.NET Framework Collections: Performance and the Right Choice

The .NET Framework 4 introduces new data structures designed to simplify thread-safe access to shared data, and to increase the performance and scalability of multi-threaded applications. To best take advantage of these data structures, it helps to understand their performance characteristics in different scenarios.

Members of the Parallel Computing Team (Emad Omara and Mike Liddell) measured the performance of four new concurrent collection types: ConcurrentQueue(T), ConcurrentStack(T), ConcurrentBag(T), and ConcurrentDictionary(TKey, TValue). They published their results, "Thread-safe collections in .NET Framework 4 and their performance characteristics", on the Parallel Computing Developer Center on MSDN at http://msdn.microsoft.com/en-us/concurrency/ee851578.aspx.

Some of the findings are summarized as follows:

| Collection | Ordering | Contiguous Storage? | Direct Access? | Lookup Efficiency | Manipulate Efficiency |
|---|---|---|---|---|---|
| Dictionary | Unordered | Yes | Via Key | Key: O(1) | O(1) |
| SortedDictionary | Sorted | No | Via Key | Key: O(log n) | O(log n) |
| SortedList | Sorted | Yes | Via Key | Key: O(log n) | O(n) |
| List | User has precise control over element ordering | Yes | Via Index | Index: O(1); Value: O(n) | O(n) |
| LinkedList | User has precise control over element ordering | No | No | Value: O(n) | O(1) |
| HashSet | Unordered | Yes | Via Key | Key: O(1) | O(1) |
| SortedSet | Sorted | No | Via Key | Key: O(log n) | O(log n) |
| Stack | LIFO | Yes | Only Top | Top: O(1) | O(1)* |
| Queue | FIFO | Yes | Only Front | Front: O(1) | O(1) |

The document has three goals:

1. to provide implementation details of the data structures that relate to performance, although these details are subject to change in future releases;

2. to provide performance measurements that compare the data structures to alternatives and to measure the scalability of certain scenarios for different numbers of threads/cores;

3. to provide best-practice guidance that will help answer questions such as “when should I use the new thread-safe collections,” and “what aspects of a scenario make a given type more scalable or better performing than others?”

Throughout the document, our best-practice guidance is called out in boxes like the following:

Use the new functionality in .NET Framework 4 to get the most out of your multi-core machines

First, we explain the performance measurements and the scenarios used for the majority of our analyses. We then discuss and analyze the performance of BlockingCollection(T), ConcurrentStack(T), ConcurrentQueue(T), ConcurrentBag(T), and ConcurrentDictionary(TKey, TValue).

Our tests were run on a specific set of machines for which the configurations are described in the appendix. Our performance analyses were based primarily on the statistics of test completion time. We expect that the completion time will vary between runs and if the tests are run on different hardware. For this reason, the test results provided are to be used purely as a starting point of performance tuning for an application.

Performance Criteria

We measured the performance of the thread-safe collections to answer two questions:

1. For an algorithm using one of the thread-safe collections on a machine with at least N cores, how much faster does it execute when using N threads as compared to just using 1 thread? This measurement is called scalability.

2. How much faster is an algorithm – if it utilizes one of the new thread-safe collections, as opposed to an equivalent algorithm that doesn’t use one of the new types? This measurement is called speedup.
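As a rough illustration of the two measurements, the sketch below expresses each as a ratio of elapsed times. The class name Metrics and its methods are hypothetical helpers, not part of the original study, and treating both measurements as simple time ratios is an assumption made for illustration.

```csharp
using System;
using System.Diagnostics;

// Hypothetical helper; names and ratio definitions are assumptions.
static class Metrics
{
    // Wall-clock time of a single run, in milliseconds.
    public static double TimeMs(Action run)
    {
        Stopwatch watch = Stopwatch.StartNew();
        run();
        watch.Stop();
        return watch.Elapsed.TotalMilliseconds;
    }

    // Scalability: same collection, time on 1 thread divided by time on N threads.
    public static double Scalability(double oneThreadMs, double nThreadsMs)
    {
        return oneThreadMs / nThreadsMs;
    }

    // Speedup: same thread count, time with the baseline collection divided by
    // time with the new thread-safe collection.
    public static double Speedup(double baselineMs, double concurrentMs)
    {
        return baselineMs / concurrentMs;
    }
}
```

With these definitions, a value above 1 means the N-thread run (or the new collection) finished sooner, and a value below 1 means it was slower.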

The scalability investigations measured how the performance of a thread-safe collection varied as more cores were utilized. The aim of the speedup investigations was quite different, as they compared two ways of solving a problem using different approaches. In these investigations we typically used the same machine configuration and ensured the programs were unchanged except for the replacement of a thread-safe collection with an alternative that was equivalent in functionality.

Each speedup experiment defined a single algorithm that required a thread-safe collection to operate correctly on multi-core machines. The experiments were run using one of the new thread-safe collections and then compared against a simple implementation of the same data structure. The simple implementations are straightforward approaches built using types available prior to .NET Framework 4; in most cases, they are a synchronization wrapper around a non-thread-safe collection. For example, the following SynchronizedDictionary(TKey, TValue) was compared to ConcurrentDictionary(TKey, TValue):

public class SynchronizedDictionary<TKey, TValue> : IDictionary<TKey, TValue>
{
    private Dictionary<TKey, TValue> m_Dictionary = new Dictionary<TKey, TValue>();
    private object _sync = new object();

    public void Add(TKey key, TValue value)
    {
        lock (_sync)
        {
            m_Dictionary.Add(key, value);
        }
    }
    // and so on for the other operations..
}

Producer-Consumer Scenarios Used for Speedup Comparisons

For our experiments, we created scenarios that we feel represent common usage. The most common scenarios that apply to the new collections are variations of the producer-consumer pattern (Stephen Toub, 2009). In particular, ConcurrentStack(T), ConcurrentQueue(T), and ConcurrentBag(T) are often appropriate for use as a buffer between producers and consumers. BlockingCollection(T) simply supports the blocking and bounding requirements of some producer-consumer scenarios.

In particular, we used two producer-consumer scenarios: a pure scenario and a mixed scenario.

In the pure scenario, we created N threads on an N-core machine where N/2 threads only produced data and N/2 threads only consumed data. This basic scenario appears whenever the generation of work items is logically separate from the processing of work items. For simplicity, in this scenario we assume an equal number of producers and consumers, and that they process each item at the same speed. In the real world, however, it is common for producers and consumers to be unbalanced.

The main loop for a producer thread typically looks like:

while (producingcondition())
{
    var item = DoDummyWork(workload); // simulates work to create the item
    collection.Add(item);
}

The main loop for a consumer thread typically looks like:

while (consumingcondition())
{
    TItem item;
    // TryTake returns true if item is removed successfully by the current
    // thread; false otherwise
    if (collection.TryTake(out item))
    {
        DoDummyWork(workload); // simulates work to process the item
    }
    else
    {
        // spin, wait, or do some other operations.
    }
}

The mixed scenario represents situations where threads both produce and consume data. Consider a generic tree traversal algorithm over a family tree where each node is defined as:

public class Person
{
    public string name;
    public int age;
    public List<Person> Children; // capitalized to match its use in Traverse()
}

To traverse the tree, we can make use of any IProducerConsumerCollection(T) to act as a holding collection for subsequent nodes that must be traversed. The following code is a multi-threaded Traverse() method.

private static void Traverse(Person root,
    IProducerConsumerCollection<Person> collection)
{
    collection.TryAdd(root);
    Task[] tasks = new Task[dop]; // typically dop=n for n-core machine
    int activeThreadsNumber = 0;
    for (int i = 0; i < tasks.Length; i++)
    {
        tasks[i] = Task.Factory.StartNew(() =>
        {
            bool lastTimeMarkedFinished = false;
            Interlocked.Increment(ref activeThreadsNumber);
            while (true)
            {
                if (lastTimeMarkedFinished)
                {
                    // this task was idle and is now active again
                    Interlocked.Increment(ref activeThreadsNumber);
                    lastTimeMarkedFinished = false;
                }
                Person parent = null;
                if (collection.TryTake(out parent))
                {
                    foreach (Person child in parent.Children)
                    {
                        collection.TryAdd(child);
                    }
                    DoDummyWork(workload); // processing per node
                }
                else
                {
                    if (!lastTimeMarkedFinished)
                    {
                        // nothing to take: mark this task as idle
                        Interlocked.Decrement(ref activeThreadsNumber);
                        lastTimeMarkedFinished = true;
                    }
                    if (activeThreadsNumber == 0) // all tasks finished
                    {
                        return;
                    }
                }
            }
        });
    }
    Task.WaitAll(tasks);
}

In this program, each thread acts as both a producer and a consumer, adding and taking items from the shared collection.

In both the pure and mixed scenarios we simulated the work required to produce and consume items. We did this via a simple DoDummyWork(int k) function that repeats a simple floating-point calculation k times. The exact details are not important, but we can assume that each loop of the dummy function corresponds to a handful of simple machine instructions. Since we were interested in measuring the costs associated with accessing the data structures, we have typically used very small work functions.
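The original work function is not shown, so the following is only a guess at its shape: a loop of k floating-point operations whose result is returned so the compiler cannot discard the loop. The class name Workload and the particular arithmetic are assumptions.

```csharp
using System;

// Hypothetical stand-in for the DoDummyWork(int k) function described above.
static class Workload
{
    public static double DoDummyWork(int k)
    {
        double x = 1.0;
        for (int i = 0; i < k; i++)
        {
            // One multiply and one add per iteration: a handful of machine instructions.
            x = x * 1.0000001 + 0.5;
        }
        // Returning the value keeps the loop from being optimized away.
        return x;
    }
}
```

A call such as Workload.DoDummyWork(500) would then approximate the "500 FLOPS" workload used in the experiments, while DoDummyWork(0) corresponds to the zero-cost case.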

BlockingCollection(T)

BlockingCollection(T) provides blocking and bounding semantics for any IProducerConsumerCollection(T) type. In this way, BlockingCollection(T) can add and remove elements using any underlying policy implemented by the backing collection while providing the mechanism to block on attempts to remove from an empty store and to block on attempts to add if a bound is specified. Prior to .NET Framework 4, this would most likely have been constructed using a Monitor with a collection type like Queue(T), as the following NaiveBlockingQueue(T):

class NaiveBlockingQueue<T>
{
    Queue<T> m_queue;
    object m_lockObj;

    public NaiveBlockingQueue()
    {
        m_queue = new Queue<T>();
        m_lockObj = new object();
    }

    public void Add(T item)
    {
        lock (m_lockObj)
        {
            m_queue.Enqueue(item);
            Monitor.Pulse(m_lockObj); // wake one waiting consumer
        }
    }

    public void Take(out T item)
    {
        lock (m_lockObj)
        {
            // block until an item is available
            while (m_queue.Count == 0)
                Monitor.Wait(m_lockObj);
            item = m_queue.Dequeue();
        }
    }
}

This NaiveBlockingQueue(T) blocks Take operations if the underlying queue is empty, but it does not support richer functionality such as bounding for additions, waking readers once all production has finished, cancellation, and pluggable underlying collections. BlockingCollection(T) supports all of these.
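Two of these features, bounding and pluggable underlying collections, can be sketched briefly. The example below wraps a ConcurrentStack(T) in a BlockingCollection(T) with a bound of two items; the class name BoundedStackExample and the chosen values are illustrative only.

```csharp
using System.Collections.Concurrent;

// Illustrative sketch; the class name and values are assumptions.
static class BoundedStackExample
{
    // Returns the item taken; thirdAdded reports whether the add past the bound succeeded.
    public static int Run(out bool thirdAdded)
    {
        // LIFO backing store, at most two items held at a time.
        using (var collection = new BlockingCollection<int>(new ConcurrentStack<int>(), 2))
        {
            collection.TryAdd(1);
            collection.TryAdd(2);

            // The bound is reached, so this non-blocking TryAdd returns false.
            thirdAdded = collection.TryAdd(3);

            // The backing stack decides the order: the most recent item comes out first.
            int top;
            collection.TryTake(out top);
            return top;
        }
    }
}
```

Calling Add() instead of TryAdd() on the full collection would block until a consumer removes an item, which is the bounding behavior the naive queue lacks.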

NaiveBlockingQueue(T) uses a simple Monitor, while BlockingCollection(T) internally uses more complex synchronization methods. From a performance point of view, NaiveBlockingQueue(T) could perform better in scenarios where the workload is zero or very small. But once more functionality is added to NaiveBlockingQueue(T), its performance starts to degrade because of the additional synchronization overhead.

To compare the performance of BlockingCollection(T) and NaiveBlockingQueue(T), we ran a test in which one thread produces all the elements and all the other threads concurrently consume these elements. We ran this test using 2, 4 and 8 threads with various workloads. With zero workload, BlockingCollection(T) performed worse than NaiveBlockingQueue(T) as expected, but as workload increased BlockingCollection(T) started to outperform NaiveBlockingQueue(T). We found the tipping point of the workload was 500 FLOPs in our test configuration, and this could vary according to different hardware. Figure 1 shows the elapsed time of this test running on an 8-core machine, for a variety of thread counts.

Figure 1: Comparison of scalability for BlockingCollection(T) and NaiveBlockingQueue(T) in the scenario with 1 producer and N-1 consumers and a workload of 500 FLOPs (x-axis: number of threads).

The real power of the BlockingCollection(T) type, however, is the rich functionality it supports, as mentioned above. Given that BlockingCollection(T) performs similarly to or better than NaiveBlockingQueue(T) and that it has much richer functionality, it is appropriate to use BlockingCollection(T) whenever blocking and bounding semantics are required.

When blocking and bounding semantics are required, BlockingCollection(T) provides both rich functionality and good performance.

Other Considerations

Given a collection type, we may also be interested to get a count of items, to enumerate all the data, or to dump the data into another data structure. When using a BlockingCollection(T), it is useful to know the following performance characteristics.

The Count property relies on the synchronization mechanism of BlockingCollection(T), specifically the CurrentCount property of the SemaphoreSlim object it keeps internally. Thus Count is an O(1) operation, and it reflects the real count of the underlying collection unless the underlying collection is modified outside the BlockingCollection(T). Modifying the underlying collection directly is a bad practice that breaks the contract supported by BlockingCollection(T), and we advise against it.

BlockingCollection(T) also provides a GetConsumingEnumerable() method for the consumer to enumerate the collection, which, unlike normal enumerables, mutates the collection by repeatedly calling Take(). This means that calling MoveNext() on this enumerator will block if the collection is empty and wait for new items to be added to the collection. To safely stop enumerating, you can call CompleteAdding() or cancel the enumeration using the GetConsumingEnumerable(CancellationToken cancellationToken) overload.
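A minimal sketch of this pattern follows, assuming one producer task and a consumer on the calling thread. The class name ConsumingEnumerableExample, the bound of 4, and the summing workload are illustrative choices, not taken from the original tests.

```csharp
using System.Collections.Concurrent;
using System.Threading.Tasks;

// Illustrative sketch; names and values are assumptions.
static class ConsumingEnumerableExample
{
    public static int SumAll(int itemCount)
    {
        using (var collection = new BlockingCollection<int>(boundedCapacity: 4))
        {
            Task producer = Task.Factory.StartNew(() =>
            {
                for (int i = 1; i <= itemCount; i++)
                {
                    collection.Add(i); // blocks while the collection is full
                }
                // Signals that no more items will arrive, which ends the consumer's loop.
                collection.CompleteAdding();
            });

            int sum = 0;
            // Each iteration removes one item; the loop blocks while the collection is empty.
            foreach (int item in collection.GetConsumingEnumerable())
            {
                sum += item;
            }

            producer.Wait();
            return sum;
        }
    }
}
```

Without the CompleteAdding() call, the foreach loop would wait indefinitely once the last item had been consumed.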

The GetEnumerator() method calls the underlying collection’s GetEnumerator() method, so its performance depends on the implementation of the underlying collection. The ToArray() method of BlockingCollection(T) wraps calls to the corresponding methods of the underlying collection with minimal overhead, so its performance also depends on the underlying collection.

ConcurrentQueue(T)

ConcurrentQueue(T) is a data structure in .NET Framework 4 that provides thread-safe access to FIFO (First-In First-Out) ordered elements. Under the hood, ConcurrentQueue(T) is implemented using a list of small arrays and lock-free operations on the head and tail arrays. It is therefore quite different from Queue(T), which is backed by an array and relies on the external use of monitors to provide synchronization. ConcurrentQueue(T) is certainly safer and more convenient than manual locking of a Queue(T), but some experimentation is required to determine the relative performance of the two schemes. In the remainder of this section, we will refer to a manually locked Queue(T) as a self-contained type called SynchronizedQueue(T).

Pure Producer-Consumer Scenario

The experiments we used for ConcurrentQueue(T) follow the producer-consumer patterns discussed earlier and we paid particular attention to the use of the simulated workload functions. The first experiment was a pure producer-consumer scenario where half of the threads were producers which simulated work by looping the simulated workload function then added an item to the queue; the other half were consumers which did the same simulation work but were instead removing items. The tests were run for various thread-counts and for differing workload sizes. We defined two workload sizes: the first is 0 FLOPS and the second is 500 FLOPS for both the producer loops and the consumer loops. These workloads are representative of workloads where contention would most likely be a dominating cost. For workloads that are significantly large, synchronization costs are potentially negligible. The exact values of elapsed times in milliseconds are not important since they vary by number of operations executed in the tests. Instead, we are interested in how elapsed time changes when a test runs on different numbers of threads, since it shows the scalability of this implementation.

Figure 2: Comparison of scalability for ConcurrentQueue(T) and SynchronizedQueue(T) in a pure producer-consumer scenario with a zero-cost workload function (x-axis: number of threads).

Figure 3: Comparison of scalability for ConcurrentQueue(T) and SynchronizedQueue(T) in a pure producer-consumer scenario with a 500 FLOPS workload function (x-axis: number of threads).

Figures 2 and 3 show the elapsed time for a pure producer-consumer scenario implemented using ConcurrentQueue(T) and SynchronizedQueue(T) with the two different workloads.

In Figure 2, we see that when the workload was very small, both ConcurrentQueue(T) and SynchronizedQueue(T) achieved their best performance when using exactly two threads and also that ConcurrentQueue(T) performed better for the two-thread case. The lack of scalability past two threads for both queues is expected as ConcurrentQueue(T) has only two access points (the head and the tail) and SynchronizedQueue(T) has only one access point because head and tail operations are serialized. So, at most, two threads can operate with little contention but more threads will necessarily suffer contention and synchronization overheads that will dominate in the absence of a significant workload function.

For scenarios with very light computation it is best to use ConcurrentQueue(T) on two threads: one pure producer, and the other pure consumer. Queues will not scale well beyond two threads for such scenarios.
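The two-thread arrangement recommended above might look like the following sketch: one dedicated producer and one dedicated consumer sharing a ConcurrentQueue(T). The class name PureQueueExample, the use of SpinWait when the queue is empty, and the LongRunning option are assumptions made for illustration.

```csharp
using System.Collections.Concurrent;
using System.Threading;
using System.Threading.Tasks;

// Illustrative sketch; names and waiting strategy are assumptions.
static class PureQueueExample
{
    // Returns the number of items consumed, or -1 if FIFO order was violated.
    public static int Run(int itemCount)
    {
        var queue = new ConcurrentQueue<int>();
        int consumed = 0;
        bool inOrder = true;

        // Pure producer: only touches the tail of the queue.
        Task producer = Task.Factory.StartNew(() =>
        {
            for (int i = 0; i < itemCount; i++)
            {
                queue.Enqueue(i);
            }
        }, TaskCreationOptions.LongRunning);

        // Pure consumer: only touches the head of the queue.
        Task consumer = Task.Factory.StartNew(() =>
        {
            var spinner = new SpinWait();
            while (consumed < itemCount)
            {
                int item;
                if (queue.TryDequeue(out item))
                {
                    if (item != consumed) inOrder = false;
                    consumed++;
                }
                else
                {
                    spinner.SpinOnce(); // queue is momentarily empty
                }
            }
        }, TaskCreationOptions.LongRunning);

        Task.WaitAll(producer, consumer);
        return inOrder ? consumed : -1;
    }
}
```

Because the producer works on the tail and the consumer on the head, the two threads rarely contend, which matches the explanation given for the two-thread result.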

On the other hand, in Figure 3 we see that ConcurrentQueue(T) does scale beyond two threads for workloads of 500 FLOPS due to minimal synchronization overhead. SynchronizedQueue(T), however, does not scale for workloads of 500 FLOPS as its costs of synchronization are significantly higher and continue to be a significant factor. We found that 500 FLOPS is representative of the largest workload that shows a difference in scalability given the hardware our tests ran on. For larger workloads, the scalability of ConcurrentQueue(T) and SynchronizedQueue(T) does not differ greatly.

For scenarios involving moderate-size work functions (such as a few hundred FLOPS), ConcurrentQueue(T) can provide substantially better scalability than SynchronizedQueue(T).

If you have a small workload that falls in between 0 and 500 FLOPS, experimentation will best determine whether using more than two threads is beneficial.

Mixed Producer-Consumer Scenario

The second experiment for queues used the tree traversal scenario, in which each thread was both a producer and a consumer. Figures 4 and 5 show the results for this scenario.

Figure 4: Comparison of scalability for ConcurrentQueue(T) and SynchronizedQueue(T) in a mixed producer-consumer scenario with a zero-cost workload function.

Figure 5: Comparison of scalability for ConcurrentQueue(T) and SynchronizedQueue(T) in a mixed producer-consumer scenario with a 500 FLOPs workload function (x-axis: number of threads).

From Figure 4, we see that neither data structure showed any scalability when the work function was very small. The loss of scalability for even the two-thread case was due precisely to this being a mixed producer-consumer scenario. The two threads were both performing operations on the head and the tail, thus introducing contention that is not present in the pure scenario. ConcurrentQueue(T) performed worse than SynchronizedQueue(T) as we increased the number of threads in execution. This is due to the fact that the ConcurrentQueue(T) implementation uses compare-and-swap (CAS) primitives which rely on spinning to gain entry to critical resources (see the MSDN article on Interlocked Operations: http://msdn.microsoft.com/en-us/library/sbhbke0y.aspx). When contentious requests are as frequent as they are in this case, CAS primitives do not perform as well as locks, like those used in SynchronizedQueue(T).

However, as shown in Figure 5, when the work function is a few hundred FLOPS or larger, the mixed scenario showed scalability. In these situations, ConcurrentQueue(T) has lower overheads and thus shows significantly better performance that is amplified as the number of threads/cores increases.

For mixed producer-consumer scenarios, scalability is only available if the work function is a few hundred FLOPS or larger. For these scenarios, ConcurrentQueue(T) provides significantly better performance than SynchronizedQueue(T).

Other Considerations

The Count, ToArray() and GetEnumerator() members take snapshots of the head and tail and, thus, the entire collection. Taking the snapshot is a lock-free operation and takes O(1) time on average. However, each of these members has its own additional costs.

Since the queue maintains an index for each item according to the order in which it was added to the queue, after the Count property takes the snapshot, it simply returns the result of subtracting the head index from the tail index. Thus the Count property overall is O(1) on average.

After the ToArray() method takes the snapshot, it copies all the items into an array, thus it is overall an O(N) operation. GetEnumerator() delegates to ToArray() and returns the enumerator of the result array, thus it takes O(N) time to return, and provides an unchanging snapshot of items.
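The snapshot behavior can be observed directly with ToArray(): items added after the call do not appear in the array that was returned. The class name QueueSnapshotExample below is illustrative.

```csharp
using System.Collections.Concurrent;

// Illustrative sketch; the class name is an assumption.
static class QueueSnapshotExample
{
    // Returns the snapshot taken before a later Enqueue; countAfter is the live count.
    public static int[] Run(out int countAfter)
    {
        var queue = new ConcurrentQueue<int>();
        queue.Enqueue(1);
        queue.Enqueue(2);
        queue.Enqueue(3);

        // O(N): copies the current contents into a new array.
        int[] snapshot = queue.ToArray();

        // This item is added after the snapshot, so the array does not contain it.
        queue.Enqueue(4);

        countAfter = queue.Count;
        return snapshot;
    }
}
```

The same reasoning applies to enumeration: code that iterates over a snapshot should not expect to see items that arrive while it is iterating.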

ConcurrentStack(T)

ConcurrentStack(T) is an implementation of the classic LIFO (Last-In First-Out) data structure that provides thread-safe access without the need for external synchronization. ConcurrentStack(T) is intended to be used in scenarios where multiple threads are managing a set of items and wish to process items in LIFO order. It is useful in scenarios for which new data should be processed in preference to older data, such as a multi-threaded depth-first search. Other examples arise in situations where there are penalties for not processing data on time. In such situations, the total penalties may be minimized by processing new items first and allowing items that have already missed their schedule to be further delayed. If so, a LIFO data structure for managing the items may be appropriate.

We compare the ConcurrentStack(T) to a simple implementation called SynchronizedStack(T) which is a thin wrapper around the non-thread-safe Stack(T) that uses a single monitor for synchronization.

Pure Producer-Consumer Scenario

Figure 6: Comparison of scalability for ConcurrentStack(T) and SynchronizedStack(T) in a pure producer-consumer scenario with a zero-cost workload function (series: ConcurrentStack workload=0, SynchronizedStack workload=0).

Figure 7: Comparison of scalability for ConcurrentStack(T) and SynchronizedStack(T) in a pure producer-consumer scenario with a 500 FLOPS workload function (x-axis: number of threads).

Figures 6 and 7 show the results for a pure producer-consumer scenario implemented using ConcurrentStack(T) and SynchronizedStack(T). The tests used here are identical to those used for ConcurrentQueue(T).

From the results, we see that ConcurrentStack(T) has largely identical performance characteristics to SynchronizedStack(T). This is the result of both implementations having a single point of contention and the lack of opportunities for ConcurrentStack(T) to do anything particular to improve raw performance.

We also see in Figures 6 and 7 that a pure producer-consumer scenario involving stacks will only exhibit scaling if the workload is a few hundred FLOPS or larger. For smaller workloads, the scalability degrades until we see that no scalability is available when the workload is tiny.

For a pure producer-consumer scenario, ConcurrentStack(T) has essentially identical performance to SynchronizedStack(T). Both show good scalability for work functions that are a few hundred FLOPS or larger.

Even though the performance characteristics are identical for this scenario, we recommend using the ConcurrentStack(T) due to it being simple and safe to use.

Mixed Producer-Consumer Scenario

Figures 8 and 9 show the results for the tree traversal scenario implemented using ConcurrentStack(T) and SynchronizedStack(T).

Figure 8: Comparison of scalability for ConcurrentStack(T) and SynchronizedStack(T) in a mixed producer-consumer scenario with a zero-cost workload function.

Figure 9: Comparison of scalability for ConcurrentStack(T) and SynchronizedStack(T) in a mixed producer-consumer scenario with a 500 FLOPS workload function (x-axis: number of threads).

In the tree traversal scenario, we actually see a divergence in the performance of the two implementations, and ConcurrentStack(T) has consistently better performance due to lower overheads when used in a mixed producer-consumer scenario.

For a mixed producer-consumer scenario, ConcurrentStack(T) has better performance than SynchronizedStack(T) and shows good scalability for work functions of a few hundred FLOPS or larger.

Other Considerations

In some scenarios, you may have multiple items to add or remove at a time. For instance, when the LIFO order is preferred but not strictly required, a thread may be able to process N items at a time rather than processing them one by one. In this case, if we call Push() or TryPop() N times, there is a synchronization cost for each operation. The PushRange() and TryPopRange() methods that ConcurrentStack(T) provides use only a single CAS operation to push or pop N items and thus significantly reduce the total synchronization cost in these scenarios.

For a scenario in which many items can be added or removed at a time for processing, use the PushRange() and TryPopRange() methods.
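A short sketch of the batch methods follows. The class name RangeExample and the values are illustrative; the ordering shown in the comments reflects how the batch behaves as if the items were pushed one after another.

```csharp
using System.Collections.Concurrent;

// Illustrative sketch; the class name and values are assumptions.
static class RangeExample
{
    // Returns the buffer filled by TryPopRange; popped is the number of items removed.
    public static int[] Run(out int popped)
    {
        var stack = new ConcurrentStack<int>();

        // One atomic operation pushes all three items; 3 ends up on top.
        stack.PushRange(new[] { 1, 2, 3 });

        // One atomic operation pops up to buffer.Length items, top of stack first.
        var buffer = new int[2];
        popped = stack.TryPopRange(buffer);

        return buffer;
    }
}
```

TryPopRange() returns how many items it actually removed, which may be fewer than the buffer length if the stack holds fewer items.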

It is worth mentioning that you can also implement PushRange() and TryPopRange() for SynchronizedStack(T) by taking the global lock to push or pop an array of items. This sometimes performs better than ConcurrentStack(T)’s PushRange() and TryPopRange(), because Stack(T), at the core of SynchronizedStack(T), is based on arrays, while ConcurrentStack(T) is implemented using a linked list, which incurs a memory allocation cost for each node. Nevertheless, we recommend using ConcurrentStack(T) because it provides out-of-the-box API support for all scenarios, thread safety, and decent overall performance.

The Count, ToArray() and GetEnumerator() members are all lock-free and begin by taking an immutable snapshot of the stack.

The Count property walks the stack to count how many items are present, and is thus an O(N) operation. Whenever possible, avoid accessing the Count property in a loop. For example, since the IsEmpty property takes only O(1) time, you should always use

while (!stack.IsEmpty)
{
    // do stuff
}

instead of

while (stack.Count > 0)
{
    // do stuff
}

The ToArray() and GetEnumerator() methods take a snapshot and then process the items in the snapshot, so they are both O(N) time operations.

ConcurrentBag(T)

ConcurrentBag(T) is a new type for .NET Framework 4 that doesn’t have a direct counterpart in previous versions of .NET Framework. Items can be added to and removed from a ConcurrentBag(T) as with ConcurrentQueue(T), ConcurrentStack(T), or any other IProducerConsumerCollection(T) type, but the items are not maintained in a specific order. This lack of ordering is acceptable in situations where the only requirement is that all data produced is eventually consumed. Any scenario that can use a bag could alternatively use an ordered data structure such as a stack or a queue, but the ordering rules demand restrictions and synchronization that can hamper scalability.

ConcurrentBag(T) is built on top of the new System.Threading.ThreadLocal(T) type such that each thread accessing the ConcurrentBag(T) has a private thread-local list of items. As a result, adding and taking items can often be performed locally by a thread, with very little synchronization overhead. However, a ConcurrentBag(T) must present a global view of all the data so, if a thread tries to take an item but finds its local list is empty, it will steal an item from another thread if other threads own items. Since ConcurrentBag(T) has very low overheads when each thread both adds and removes items, we can immediately see that the ConcurrentBag(T) should be an excellent collection type for mixed producer-consumer scenarios if ordering isn’t a concern.
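The mixed usage pattern described here, where each thread both adds and takes items, can be sketched as follows. The class name BagExample and the worker counts are illustrative, and the sketch does not attempt to show which thread-local list an item comes from.

```csharp
using System.Collections.Concurrent;
using System.Threading;
using System.Threading.Tasks;

// Illustrative sketch; names and counts are assumptions.
static class BagExample
{
    // Returns the total number of items taken from the bag.
    public static int Run(int workers, int itemsPerWorker)
    {
        var bag = new ConcurrentBag<int>();
        int processed = 0;
        var tasks = new Task[workers];

        for (int w = 0; w < workers; w++)
        {
            tasks[w] = Task.Factory.StartNew(() =>
            {
                for (int i = 0; i < itemsPerWorker; i++)
                {
                    // Each worker is both a producer and a consumer.
                    bag.Add(i);
                    int item;
                    if (bag.TryTake(out item))
                    {
                        Interlocked.Increment(ref processed);
                    }
                }
            });
        }
        Task.WaitAll(tasks);

        // Drain anything left behind so every added item is counted exactly once.
        int leftover;
        while (bag.TryTake(out leftover))
        {
            processed++;
        }
        return processed;
    }
}
```

Since no ordering is promised, the only property the caller can rely on is that every item added is eventually taken once, which is what the returned count reflects.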

The graph traversal scenario is thus an ideal scenario for the ConcurrentBag(T) if the specific traversal ordering is not important. When the tree is balanced, there is a high probability that a thread that produces a node will also consume that node, so a significant percentage of TryTake() operations will be inexpensive removal operations from the thread’s local list, as opposed to costly steal operations from other threads’ lists. The process starts by adding the root node to the ConcurrentBag(T) on the main thread and then spinning up producer-consumer threads. One of the threads will steal the root node and produce child nodes to search. From here, the other threads will race to steal nodes and then commence searching in their own sub-trees. Once the process warms up, the threads should largely operate in isolation with little need to synchronize until we start to run out of nodes to traverse.

Figure 10: Visualization of graph traversal using multiple threads and a ConcurrentBag(T).

Figure 10 shows how a graph is traversed using a ConcurrentBag(T) and four threads. The black arrows show the nodes stolen by the threads and the node colors represent the thread used to traverse them.

The primary scenario we used to test ConcurrentBag(T) is an unordered graph traversal where the work functions are string comparisons (with a maximum of 10 characters in each string). The inner loop contains the following:

if (bag.TryTake(out node))
{
    for (int i = 0; i < node.Children.Count; i++)
    {
        bag.Add(node.Children[i]);
    }
    ProcessNode(node); // e.g. a short string comparison
}

For comparison, the main contenders are thread-safe ordered collections such as a thread-safe queue and a thread-safe stack. We could have used simple implementations of the thread-safe ordered collections, but we chose to test against the new collections: ConcurrentQueue(T) and ConcurrentStack(T). Figure 11 shows the result of the tree-search scenario for different tree sizes.

Figure 11: Comparison of ConcurrentBag(T), ConcurrentStack(T), and ConcurrentQueue(T)’s performance in a mixed producer-consumer scenario for various tree sizes (x-axis: number of nodes in tree).

As expected, ConcurrentBag(T) dramatically outperformed other collections for this scenario and we can expect the results to generalize to other mixed producer-consumer scenarios.

For mixed producer-consumer scenarios that do not require item ordering, ConcurrentBag(T) can be dramatically more efficient than ConcurrentStack(T), ConcurrentQueue(T), and other synchronized collections.

To measure the scalability of ConcurrentBag(T), we ran the same scenario for a tree size of 100,000 nodes and varied only the number of threads involved in the search.

Figure 12: Scalability of ConcurrentBag(T) in a mixed producer-consumer scenario.

Figure 12 demonstrates that the scalability of ConcurrentBag(T) is excellent even when the work functions are very small. As noted previously, for scenarios that involve larger work functions, we can expect the scalability to be even closer to linear.

ConcurrentBag(T) shows excellent scalability for mixed producer-consumer scenarios

ConcurrentBag(T) may not be appropriate for pure producer-consumer scenarios.

Although ConcurrentBag(T) has excellent performance for a mixed producer-consumer scenario, it will not have the same behavior for pure producer-consumer scenarios as all the consumers will have to repeatedly perform stealing operations and this will incur significant overheads and synchronization costs.

Other Considerations

ConcurrentBag(T) is somewhat heavyweight from a memory perspective. The reason is that ConcurrentBag(T) is not disposable, yet it internally consists of disposable ThreadLocal(T) objects. These ThreadLocal(T) objects, even when they are no longer used, cannot be disposed until the ConcurrentBag(T) object is collected by the GC.

IsEmpty, Count, ToArray() and GetEnumerator() lock the entire data structure so that they can provide a snapshot view of the whole bag. As such, these operations are inherently expensive and they cause concurrent Add() and TryTake() operations to block. Note that by the time ToArray() or GetEnumerator() returns, the global lock has already been released, so the original collection may have already been modified.

 

ConcurrentDictionary(TKey,TValue)

The ConcurrentDictionary(TKey,TValue) type provides a thread-safe implementation of a strongly-typed dictionary. Prior to .NET Framework 4, the simple way to achieve thread-safe access to a strongly-typed dictionary structure was to use a lock to protect all accesses to a regular Dictionary(TKey,TValue). When using a locked Dictionary, the dictionary object itself can be used as the lock, so to safely update an element we would typically use the following:

void UpdateElement(Dictionary<int, int> dict, int key, int newValue)
{
    lock (dict)
    {
        dict[key] = newValue;
    }
}

When reading from this data structure, we must also take the lock as concurrent updaters may be making structural changes that make searching in the data structure impossible. Hence:

int GetElement(Dictionary<int, int> dict, int key)
{
    lock (dict)
    {
        return dict[key];
    }
}

Using a common lock effectively serializes all accesses to the data structure even if the bulk of the accesses are simple reads.

The ConcurrentDictionary(TKey,TValue) type provides a thread-safe dictionary which does not rely on a common lock. Rather, ConcurrentDictionary(TKey, TValue) internally manages a set of locks to provide safe concurrent updates and uses a lock-free algorithm to permit reads that do not take any locks at all.
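For comparison with the locked versions above, here is a minimal sketch of the same two operations against a ConcurrentDictionary. The class and method names are made up for illustration. No external lock appears in the calling code; the dictionary handles synchronization internally.

```csharp
using System.Collections.Concurrent;

public static class ConcurrentAccessSketch
{
    // The caller takes no lock; the dictionary synchronizes the write itself.
    public static void UpdateElement(ConcurrentDictionary<int, int> dict, int key, int newValue)
    {
        dict[key] = newValue;
    }

    // TryGetValue reports a missing key through its return value instead of
    // throwing, which matters when other threads may add or remove keys.
    public static bool TryGetElement(ConcurrentDictionary<int, int> dict, int key, out int value)
    {
        return dict.TryGetValue(key, out value);
    }
}
```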

The ConcurrentDictionary(TKey,TValue) is applicable to any problem involving concurrent access to a dictionary where updates are possible. However, for read-only access to a dictionary such as a lookup table with fixed data, a simple Dictionary(TKey,TValue) has lower overheads than ConcurrentDictionary(TKey,TValue).

If you require only concurrent reads with no updates, a regular Dictionary(TKey,TValue) or a ReadOnlyDictionary(TKey,TValue) is appropriate, even in multi-threaded scenarios.

It should also be noted that the Hashtable data structure from .NET Framework 1.1 is intrinsically thread-safe for multiple non-enumerating readers and a single writer, but not safe for multiple writers or enumerating readers. For certain situations, Hashtable is a reasonable baseline for comparison to ConcurrentDictionary(TKey,TValue), but we do not consider this further due to the more complex rules and because Hashtable is not intrinsically a strongly-typed data structure.

In the following sections, we will examine the performance of a ConcurrentDictionary(int,int) type in scenarios that involve various combinations of concurrent reads and writes. A dictionary is not typically used for producer-consumer scenarios but rather in applications with lookup tables and caches or when constructing groups of data with identical keys. As such, we will consider a different set of scenarios than those used previously.

An (Almost) Read-only Dictionary

Some problems call for a thread-safe dictionary that is mostly read-only but is occasionally updated. For example, consider a mapping of NetworkID to Status which changes when interruptions or repairs occur:

CentralNet -> OK, EternalNet -> FAULT, LocalDeviceNet -> OK

Because of the occasional updates, a multi-threaded application may only access this dictionary via completely thread-safe operations. The ConcurrentDictionary(TKey, TValue) has an opportunity to scale well due to its lock-free reads.

 

Figure 13: Comparison of scalability for ConcurrentDictionary(TKey,TValue) and LockedDictionary(TKey,TValue) in a read-heavy producer-consumer scenario (x-axis: number of threads).

Figure 13 shows scalability for a scenario where a total of M reads were made in the absence of additions, deletions, or in-place updates. We can see that the ConcurrentDictionary(TKey, TValue) can service multiple threads performing concurrent reads and still scale well. On the other hand, the performance of a locked Dictionary(TKey, TValue) degrades as more threads participate as contention on the shared lock becomes an increasingly great cost.

For read-heavy scenarios that require a thread-safe dictionary, the ConcurrentDictionary(TKey,TValue) is the best choice.

Frequent Updates

A variety of scenarios involve frequent adding and updating of values in a shared dictionary structure. For example, a dictionary might be used to accumulate item counts as data is extracted from a source. A simple thread-safe approach using a locked Dictionary(Int32,Int32) is:

void ExtractDataAndUpdateCount(Input input, Dictionary<int, int> dict)
{
    int data;
    ExtractDataItem(input, dict, out data);
    lock (dict)
    {
        if (!dict.ContainsKey(data))
            dict[data] = 1;
        else
            dict[data]++;
    }
}

If we look to use a ConcurrentDictionary(Int32,Int32), we need an approach that provides atomic updates without taking a common lock. One approach is to use a CAS loop that repeatedly reads an element and calls TryAdd() if the element does not exist, or TryUpdate() if it does, until one of them succeeds without experiencing contention. Fortunately, ConcurrentDictionary(TKey,TValue) provides an AddOrUpdate() method which takes care of the details of performing an atomic update. AddOrUpdate() takes delegate parameters so that it can re-evaluate the updated value whenever write contention occurs. The corresponding code for ConcurrentDictionary(Int32,Int32) is thus:

void ExtractDataAndUpdateCount(Input input, ConcurrentDictionary<int, int> cd)
{
    int data;
    ExtractDataItem(input, cd, out data);
    cd.AddOrUpdate(data, k => 1, (k, v) => v + 1);
}
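The CAS loop mentioned above, which AddOrUpdate() makes unnecessary, could be sketched roughly as follows. This is an illustration of the idea with hypothetical names, not the internal implementation of AddOrUpdate().

```csharp
using System.Collections.Concurrent;

public static class CasLoopSketch
{
    public static void IncrementCount(ConcurrentDictionary<int, int> cd, int key)
    {
        while (true)
        {
            int current;
            if (cd.TryGetValue(key, out current))
            {
                // Succeeds only if the stored value is still 'current'.
                if (cd.TryUpdate(key, current + 1, current)) return;
            }
            else
            {
                // Succeeds only if no other thread added the key first.
                if (cd.TryAdd(key, 1)) return;
            }
            // Another thread changed the entry in between: read again and retry.
        }
    }
}
```

Compared with this loop, the single AddOrUpdate() call is shorter and harder to get wrong, which is the main reason to prefer it.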

Figure 14: Comparison of scalability for ConcurrentDictionary(TKey,TValue) and LockedDictionary(TKey,TValue) in an update-heavy producer-consumer scenario (x-axis: number of threads).

Figure 14 shows a comparison of performance for continuous atomic update operations. We assume that the ExtractDataItem() method is essentially free, and that there are many different key values so that we are not always updating the exact same items.

For frequent updates, LockedDictionary(TKey,TValue) shows a performance profile that is very similar to its read profile, which is expected given that both situations require taking a common lock and thus serialize all work.

The ConcurrentDictionary(TKey, TValue) data indicates that for sequential scenarios (nCores=1), the AddOrUpdate() operations are on the order of 0.5x the speed of a simple locked Dictionary(TKey, TValue). However, as the number of participating cores increases, the use of multiple internal locks within ConcurrentDictionary(TKey, TValue) permits some level of scalability. In particular, the update performance increases by up to a factor of 2 but it is limited by contention on the shared locks and cache-invalidation costs.

  In a multi-threaded scenario that requires frequent updates to a shared dictionary, ConcurrentDictionary(TKey,TValue) can provide modest benefits.

Since the scalability for frequent writes is not ideal, always look for opportunities to rework an update-heavy algorithm so that the workers accumulate data independently and a merge operation combines their results at the end of processing.
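One way to sketch that rework is the Parallel.ForEach overload that takes a thread-local initializer and a final merge step. The class and method names below are hypothetical, and the item-counting workload is assumed from the earlier example.

```csharp
using System.Collections.Generic;
using System.Threading.Tasks;

public static class LocalAccumulationSketch
{
    public static Dictionary<int, int> CountItems(IEnumerable<int> items)
    {
        var merged = new Dictionary<int, int>();
        object mergeLock = new object();

        Parallel.ForEach<int, Dictionary<int, int>>(
            items,
            // Each worker task starts with its own private dictionary.
            () => new Dictionary<int, int>(),
            (item, loopState, local) =>
            {
                int count;
                local.TryGetValue(item, out count);   // count stays 0 if the key is missing
                local[item] = count + 1;
                return local;
            },
            // Runs once per worker task; this is the only place a lock is taken.
            local =>
            {
                lock (mergeLock)
                {
                    foreach (KeyValuePair<int, int> pair in local)
                    {
                        int count;
                        merged.TryGetValue(pair.Key, out count);
                        merged[pair.Key] = count + pair.Value;
                    }
                }
            });

        return merged;
    }
}
```

The shared structure is touched only during the merge, so contention no longer grows with the number of items processed.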

A scenario that entails all writes to a shared dictionary is the worst-case performance scenario for ConcurrentDictionary(TKey,TValue). A more realistic scenario may involve some significant time spent in the ExtractDataItem() method or other per-item processing. As more time is spent on local per-thread computation, scalability will naturally increase as contention on the dictionary will cease to be the primary cost. This applies equally to both LockedDictionary(TKey,TValue) and ConcurrentDictionary(TKey, TValue).

Concurrent Reading and Updating

We can also consider situations where some amount of reading and writing takes place. Recall that a simple locked Dictionary(TKey,TValue) must take locks for both reads and writes but that a ConcurrentDictionary(TKey,TValue) only requires locks for writes and that it manages multiple internal locks to provide some level of write-scalability.

The ConcurrentDictionary(TKey,TValue) is clearly the best choice whenever the level of concurrency is high, as it has better performance for both reads and writes. However, even in a dual-core scenario, we may find that ConcurrentDictionary(TKey,TValue) is the best choice if there is a significant proportion of reads.

Figure 15: Comparison of performance for ConcurrentDictionary(TKey,TValue) and LockedDictionary(TKey,TValue) for various read/update ratios (x-axis: % writes).

If your scenario includes a significant proportion of reads to writes, ConcurrentDictionary(TKey, TValue) offers performance gains for any number of cores/threads.

Other Considerations

ConcurrentDictionary(TKey,TValue)’s Count, Keys, Values, and ToArray() completely lock the data structure in order to provide an accurate snapshot. This serializes all of these calls and interferes with add and update performance. Hence, these methods and properties should be used sparingly.

The GetEnumerator() method provides an enumerator that can walk the Key/Value pairs that are stored in the dictionary. GetEnumerator() doesn’t take any locks but it nonetheless guarantees that the enumerator is safe for use even in the face of concurrent updates. This is great for performance but, because no snapshot is taken, the enumerator may provide data that is a mixture of items present when GetEnumerator() was called and some or all of the subsequent updates that may have been made. If you require an enumerable snapshot of the ConcurrentDictionary(TKey, TValue), either arrange for all updates to pause before enumeration or use the ToArray() method to capture the data whilst all the internal locks are held.
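The difference between the two ways of reading the whole dictionary can be sketched as follows. The helper names are hypothetical; the calls themselves are the standard ToArray() and foreach enumeration.

```csharp
using System.Collections.Concurrent;
using System.Collections.Generic;

public static class SnapshotSketch
{
    // ToArray() copies the contents while the internal locks are held, so the
    // returned array is unaffected by updates that happen afterwards.
    public static KeyValuePair<int, int>[] TakeSnapshot(ConcurrentDictionary<int, int> cd)
    {
        return cd.ToArray();
    }

    // Enumerating directly takes no locks. It is safe, but the sum may mix
    // values from before and after concurrent updates.
    public static int SumLive(ConcurrentDictionary<int, int> cd)
    {
        int sum = 0;
        foreach (KeyValuePair<int, int> pair in cd)
        {
            sum += pair.Value;
        }
        return sum;
    }
}
```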

ConcurrentDictionary(TKey,TValue) update operations are internally protected by fine-grain locks whose granularity can be tuned by specifying the concurrency level in the constructor:

public ConcurrentDictionary(int concurrencyLevel, int capacity)

If a concurrencyLevel is not specified, the default is four times the number of processors. Increasing the concurrency level increases the granularity of the locks, which generally decreases contention for concurrent updates. As a result, specifying a concurrencyLevel higher than the default may improve performance for frequent-update scenarios. The downside is that all the operations that lock the whole dictionary may become significantly more expensive.

The default concurrency level is appropriate for most cases. For scenarios that include significant concurrent updates, you may wish to experiment with the concurrency level to achieve maximum performance.
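A small helper makes such experiments easier to repeat. This is only a sketch: the helper name and the idea of expressing the level as a multiple of the processor count are assumptions, and any particular multiplier should be validated by measurement.

```csharp
using System;
using System.Collections.Concurrent;

public static class ConcurrencyLevelSketch
{
    // lockMultiplier is a hypothetical tuning knob: locks per logical processor.
    public static ConcurrentDictionary<int, int> Create(int lockMultiplier, int initialCapacity)
    {
        int concurrencyLevel = lockMultiplier * Environment.ProcessorCount;
        return new ConcurrentDictionary<int, int>(concurrencyLevel, initialCapacity);
    }
}
```

For example, Create(8, 1024) asks for twice as many locks as the default described above.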

 

Details of the Experimental Framework

All tests were run on identical 8-core machines with the following hardware configurations:

• Intel® Xeon® CPU E5345 @ 2.33GHz, 2 sockets x 4 cores, 8GB RAM

• .NET Framework 4

• All tests were run on both Windows 7 Ultimate 32-bit and 64-bit

Each test was executed multiple times in an environment that reduced unnecessary external noise and we computed the mean elapsed time as the standard measure. Our results had standard deviations of less than 10% of the mean. We used several approaches to reduce noise in order to obtain stable results:

• We stopped all unnecessary background services and applications and turned off network access and peripherals where possible.

• We selected test parameters such that each individual test iteration took at least 100 milliseconds. This reduced most noise.

• To reduce the impact of garbage collection, the tests kept memory use to a minimum by using simple types and small data volumes, and we forced garbage collections between runs.

• All timing measurements excluded the time required to initialize the application, warm up the .NET Framework ThreadPool, and perform other house-keeping operations.
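A measurement loop in the spirit of these points might look like the sketch below. It is not the harness the team used; the names are hypothetical and it only shows a warm-up run, forced collections between runs, and the mean of the timed runs.

```csharp
using System;
using System.Diagnostics;

public static class TimingSketch
{
    public static double MeanElapsedMilliseconds(Action test, int runs)
    {
        // Warm-up run, excluded from the measurement (JIT, thread pool start-up).
        test();

        double totalMs = 0;
        for (int i = 0; i < runs; i++)
        {
            // Force a collection between runs so garbage from one run is less
            // likely to be collected in the middle of the next.
            GC.Collect();
            GC.WaitForPendingFinalizers();
            GC.Collect();

            Stopwatch stopwatch = Stopwatch.StartNew();
            test();
            stopwatch.Stop();
            totalMs += stopwatch.Elapsed.TotalMilliseconds;
        }
        return totalMs / runs;
    }
}
```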

Due to the nature of our tests, they do not represent entire applications. Thus the data we presented should be considered only indicative of the performance of similar units of a program on similar hardware, and not entire systems or applications.

Finally, we note that experiments run on 32-bit and 64-bit platforms may show significant variance in both speedup and scalability. Many factors can cause this variance and, in some cases, the differences favor one architecture over the other. If maximum performance is crucial and an application can run on either a 32-bit or a 64-bit platform then specific performance measurements are required to select between the two alternatives.

Unit testing with .Net thoughts (and what happened next) with Private Accessors

A few thoughts about the refactoring process, from going through my initial unit testing iteration.

  • Constructor Injection can make things look a lot more testable. But I think you want to be careful about jumping into this – don’t do it as the first step in refactoring. It’s probably better to first extract bits of code that don’t have strong dependencies – and don’t need to know about a class state. Creating some beefy static methods that do a lot of work if you give them a few parameters, even if you could have gotten them from ‘this’ if you did a non-static method, can just make life easier unit testing.
  • Once you’ve extracted some dependency-lite code, what you should have left over is some dependency-heavier code where DI is a better, natural fit.
  • Sometimes what you are injecting is not a ‘runtime dependency’ it’s just a piece of policy or logic or configuration that doesn’t feel like something you want to hardcode in this particular class, or that not hardcoding can simplify the testing of. Think of a mapping of types to ID numbers or something that feels so arbitrary. Unfortunately DI isn’t a natural fit for this because DI works in terms of types.
  • Separating enumeration/grouping control flow logic from actual data processing can make code look really nice, where classes/methods have clearer responsibilities, and make tests simpler too. It can also lead to having more classes.
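The first two points could look like this in practice. Everything in the sketch is invented for illustration (OrderMath, OrderService, IClock, FixedClock); it only shows the split between a dependency-light static method and a class that receives its one real dependency through the constructor.

```csharp
using System;
using System.Collections.Generic;

// Dependency-light logic extracted into a static method: it needs no object
// state, so a test can call it with plain arguments.
public static class OrderMath
{
    public static decimal TotalWithTax(IEnumerable<decimal> lineAmounts, decimal taxRate)
    {
        decimal subtotal = 0m;
        foreach (decimal amount in lineAmounts) subtotal += amount;
        return subtotal * (1m + taxRate);
    }
}

// The dependency-heavy remainder receives its collaborator via the constructor.
public interface IClock
{
    DateTime UtcNow { get; }
}

public sealed class OrderService
{
    private readonly IClock clock;

    public OrderService(IClock clock)
    {
        if (clock == null) throw new ArgumentNullException("clock");
        this.clock = clock;
    }

    public bool IsOverdue(DateTime dueDateUtc)
    {
        return clock.UtcNow > dueDateUtc;
    }
}

// Test double that a unit test can pass in place of a real clock.
public sealed class FixedClock : IClock
{
    private readonly DateTime now;
    public FixedClock(DateTime now) { this.now = now; }
    public DateTime UtcNow { get { return now; } }
}
```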

None of this refactoring should change your block count much. And it takes time, and then you still have to write the unit tests, so it feels frustrating.

After my initial bout of refactoring I realized this was very slow going, and started looking for cheaper ways to get coverage up. Oh wait, I already have a folder of known dead code? Delete! That feels good, but wasn’t as big a % boost as I was hoping though.

After going down this road a bit, you get to a point where you think – I won’t need dedicated complicated factory classes if I switch over to using a Dependency Injection framework. This brings up another worry though. While you no longer need code coverage of a factory class, do you need to replace it with code coverage of your dependency container setup? Is there really no getting away from stupid looking tests?!

When I switch over to DI, I often see a couple nice code simplifications fall out of it. But, still coverage gets slightly worse overall by the refactoring – however, it’s now crystal clear that 55 of my 599 uncovered blocks are dependency injection container setup/binding/convenience/wrapping. For some reason that feels almost as good as actually covering that code. I think the reason is that I know this code runs every single time I run my app, so it’s going to get a lot of real-world coverage as opposed to unit test coverage. And I can feel that confidence in a way that I just don’t feel for an arbitrary factory class that is floating around.

By this point the biggest class I have with zero coverage is my DatabaseContext, which has a whole pile of helper methods on it. This brings me to the thorny question of how to unit test EF database queries… which is probably going to be an article in itself, so I’ll come back to that another day.

Aside from that there are two largish classes with lots of logic I can try to figure out how to test, that now have much nicer factored dependencies.

While I’m not sure if it’s a valid topic for discussion I’ve also noticed a couple random small things you can do to reduce your overall # of code blocks along the way.

  • Explicit default constructors increase your block count.
  • Async/await increases your block count. And rarely leads to subtle bugs. Why use it if you don’t actually need it?
  • Assertion helper classes and other ‘correctness helpers’ can have logic inside them which does annoying things to your coverage numbers if part of the project you are measuring – unless you either exclude them from coverage, which you can do with ExcludeFromCodeCoverageAttribute, or move them out to another project.
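For the last point, excluding a helper from coverage is a one-line change. The Require class below is a hypothetical hand-rolled helper; only the attribute is the real framework type.

```csharp
using System;
using System.Diagnostics.CodeAnalysis;

// The attribute asks coverage tools to leave this class out of the numbers.
[ExcludeFromCodeCoverage]
public static class Require
{
    public static void NotNull(object value, string parameterName)
    {
        if (value == null) throw new ArgumentNullException(parameterName);
    }
}
```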

The idea of creating either a .ArgumentValidation library or a .Common library really doesn’t sound good.
Everyone knows this pattern of saying Require.NotNull(foo, “foo”), or Argument.Required(foo, “foo”) right? Which means you only have a single line of code [to test code coverage of] instead of a zillion if statements, in each of your methods?
But just how many times does this particular wheel need to be reinvented?

Well it turns out, you might finally be able to stop and never write that class again. Someone made a MIT-licensed open-sourced-on-github nuget package for that.
http://www.nuget.org/packages/Argument/

Why Need to Write?

Stefan pointed out a good answer in his original blog entry.

In short: as of Visual Studio 2012, private accessors can no longer be created by the IDE.

It seems that this does not bother too many people. The ‘Bring Back Private Accessor’ suggestion on UserVoice has not found any supporters so far.

Nevertheless, I still need access to private members of classes for testing purposes. And I do not want to declare them as internal because these members should not be (mis)used from within the same assembly.

The MSDN forums post ‘How to create private accessors in VS 2012’ notes that one can also use the command line tool publicize.exe to create private accessors.

Why :

  • Type safety
  • Protect from typos
  • Easy to handle when rename is required
  • Encapsulate / hide the use of PrivateObject

Use of PrivateObject

Microsoft.VisualStudio.TestTools.UnitTesting.PrivateObject is the base class to build up our own private accessor infrastructure. Well, infrastructure is such a big word. As you will see later, this infrastructure is really lightweight.

The direct use of PrivateObject has some disadvantages. You always have to use strings as parameters to invoke a method or to get / set the value of a property, so there is a good chance that a call fails because the method name was mistyped. Some casting issues come up too. And with async / await in .NET 4.5, the direct use of PrivateObject.Invoke is not much fun either.

The direct use of PrivateObject looks like this

// Create an instance of the class to be tested.
ClassToBeTested instance = new ClassToBeTested ();

// Create a PrivateObject to access private methods of instance.
PrivateObject privateObject = new PrivateObject(instance);

// Create parameter values
object[] args = new object[] { "SomeValue" };

// Call the method to be tested via reflection, 
// and pass the parameter, and cast the result
int result = (int) privateObject.Invoke("PrivateMethod", args);

// Do some validation stuff

Create an Accessor Class

The first step to get away from this unpleasant and code inflating use of PrivateObject is to create an accessor class inside the unit test assembly.

This accessor class will hide the usage of PrivateObject. It gives type safety and helps to avoid typos on method- and property names. In case a member of the tested class is renamed, there is one single place to change in the test assembly. All other changes in the unit test can be done by Visual Studio renaming support.

A short version of the accessor to ClassToBeTested might look like this:

internal class ClassToBeTestedAccessor
{
  private PrivateObject PrivateObject { get; set; }

  internal ClassToBeTestedAccessor()
  {
    PrivateObject = new PrivateObject(new ClassToBeTested());
  }

  internal int PrivateMethod
    (
    string parameter
    )
  {
    return ((int) PrivateObject
      .Invoke("PrivateMethod", new object[] { parameter }));
  }
}

And the test method using the accessor changes to this:

// Create an instance of the accessor.
ClassToBeTestedAccessor accessor = new ClassToBeTestedAccessor ();

// Call the method to be tested
int result = accessor.PrivateMethod("SomeValue");

// Do some validation stuff

Now the test method itself already looks the way I like it.

Make Accessor Reusable

Let’s assume there is the need for more than one accessor class in the test assembly. Then it might make sense to put some functionality into a base class for reuse purposes.

The base class should keep the instance of the tested class and the PrivateObject to access it. This means it has to be generic to get a type-safe implementation.

This will be the starting point:

internal abstract class PrivateAccessorBase<T>
{
  private PrivateObject PrivateObject { get; set; }

  public T TestInstance 
    { get { return ((T)PrivateObject.Target); } }

  internal PrivateAccessorBase
      (
      T instance
      )
    {
      PrivateObject = new PrivateObject(instance);
    }
}

Please notice the public property TestInstance. This property gives access to the instance that is encapsulated by the private accessor. In case the unit test method needs to access public members of the tested class too, use this property.

What is common in my unit tests is to get or set the value of a property and to call a method. Therefore, the base class should offer the ability to do that.

Property Access

To get a property value, one might implement a method like this:

protected TReturnValue GetPropertyValue<TReturnValue>
  (
    string propertyName
  )
{
  return ((TReturnValue)PrivateObject
    .GetProperty(propertyName, null));
}

But this implementation still requires passing the name of the property as a string. This can be avoided by using the call stack (in case you are using .NET 4.5 or higher, please have a look below to avoid StackFrame usage):

protected TReturnValue GetPropertyValue<TReturnValue>()
{
  // Need to remove the "get_" prefix from the caller's name
  string propertyName 
    = new StackFrame(1, true).GetMethod().Name
      .Replace("get_", String.Empty);

  return ((TReturnValue)PrivateObject
    .GetProperty(propertyName, null));
}

Setting a property value looks quite similar:

protected void SetPropertyValue
  (
  object value
  )
{
  // Need to remove the "set_" prefix from the caller's name
  string propertyName 
    = new StackFrame(1, true).GetMethod().Name
      .Replace("set_", String.Empty);

  PrivateObject.SetProperty(propertyName, value, new object[] { });
}

Having this base class, we can implement the ClassToBeTestedAccessor like this:

internal class ClassToBeTestedAccessor
  : PrivateAccessorBase<ClassToBeTested>
{
  internal ClassToBeTestedAccessor ()
    : base(new ClassToBeTested())
  {
  }

  internal int PrivateProperty
  {
    get { return (GetPropertyValue<int>()); }
    set { SetPropertyValue(value); }
  }
}

The usage of the property by the test class does not differ from the usage of ‘normal’ properties:

// Create an instance of the accessor.
ClassToBeTestedAccessor accessor = new ClassToBeTestedAccessor ();

// Set the value
accessor.PrivateProperty = 231;

Debug.WriteLine("PrivateProperty: >{0}<", accessor.PrivateProperty);

Method Access

Based on this, it is simple to implement the access of non-public methods in the accessor base class.

// Implemented by PrivateAccessorBase
protected TReturnValue Invoke<TReturnValue>
  (
   params object[] parameter
  )
{
  return ((TReturnValue)PrivateObject.Invoke(
    new StackFrame(1, true).GetMethod().Name, parameter));
}

The accessor class uses this method like this:

// Usage by ClassToBeTestedAccessor
internal int PrivateMethod
  (
  string parameter
  )
{
  // The return type cannot be inferred, so it is stated explicitly.
  return (Invoke<int>(parameter));
}

Using the PrivateAccessorBase together with a derived accessor class, the typo danger is eliminated. We get type safety, the accessor implementation does not contain PrivateObject usage, and there is a single place to change when property or method names of the tested class change.

Preconditions

There is only one precondition: The property and method names of the accessor class have to match the corresponding members of the tested class exactly!

Avoid Inlining

One thing really important when using the StackFrame is to make sure the caller’s name will not change. How might this happen? When the JIT compiler thinks it should optimize the code by inlining methods.

This would be fatal when using the call stack to get the name of the original caller. As soon as a method is inlined, the call stack changes.

Sample: Assume ClassToBeTestedAccessor.PrivateMethod is inlined because it only contains the call to PrivateAccessorBase.Invoke. In that case, the caller’s name seen inside PrivateAccessorBase.Invoke is no longer ‘PrivateMethod’ but the name of the method that called ClassToBeTestedAccessor.PrivateMethod, say ‘TestPrivateMethod’. When that name is passed as the method name parameter of PrivateObject.Invoke, the test fails with an exception stating that the method ‘TestPrivateMethod’ was not found.

Even though unit tests normally run in debug mode (the ‘Optimize code’ option is not set in the project’s build configuration), one should be aware of this. To make sure the JIT compiler does not inline a method, apply the [MethodImpl(MethodImplOptions.NoInlining)] attribute. The sample solution shows how and where.
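A self-contained sketch of the attribute placement, without PrivateObject, might look like this. The class and method names are hypothetical. The string concatenation keeps the call to GetCallerName() out of tail position; that is a precaution in this sketch, not something the original post discusses.

```csharp
using System.Diagnostics;
using System.Runtime.CompilerServices;

public static class CallerNameSketch
{
    // Reads the name of the method one frame up the call stack.
    [MethodImpl(MethodImplOptions.NoInlining)]
    public static string GetCallerName()
    {
        return new StackFrame(1, false).GetMethod().Name;
    }

    // Without NoInlining, the JIT may fold this small method into its caller,
    // and GetCallerName() would then report that caller's name instead.
    [MethodImpl(MethodImplOptions.NoInlining)]
    public static string PrivateMethod()
    {
        string name = GetCallerName();
        return "caller=" + name;
    }
}
```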

Changes in .NET 4.5

In case you can use .NET 4.5 or above, you don’t have to care about inline optimization :-)

.NET 4.5 introduced the class CallerMemberNameAttribute. Using this class, there is no need to worry about these things. The attribute is processed by the C# compiler when it produces the IL code. Optimization is done by the JIT compiler later on (see forum post CallerMemberNameAttribute and Inlining).

For the .NET 4.5 implementation, the usage of MethodImpl and StackFrame is obsolete. The PrivateAccessorBase changes to this:

// Implemented by PrivateAccessorBase
protected TReturnValue GetPropertyValue<TReturnValue>
  (
  [CallerMemberName] string callerName = null
  )
{
  return ((TReturnValue)PrivateObject.GetProperty(callerName, null));
}

protected void SetPropertyValue
  (
  object value,
  [CallerMemberName] string callerName = null
  )
{
  PrivateObject.SetProperty(callerName, value, new object[] { });
}

protected TReturnValue Invoke<TReturnValue>
  (
  [CallerMemberName] string callerName = null,
  params object[] parameter
  )
{
  return ((TReturnValue)PrivateObject.Invoke(callerName, parameter));
}

Please notice that even the removal of the ‘get_’ / ‘set_’ prefix is not required any more.
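The behaviour of the attribute can be checked in isolation. The sketch below uses hypothetical names and leaves PrivateObject out; it only shows what the compiler substitutes for the optional parameter.

```csharp
using System.Runtime.CompilerServices;

public static class CallerMemberNameSketch
{
    // The C# compiler fills in callerName at each call site.
    public static string Resolve([CallerMemberName] string callerName = null)
    {
        return callerName;
    }

    // Inside a property accessor the compiler passes the property name,
    // without any "get_" or "set_" prefix.
    public static string PrivateProperty
    {
        get { return Resolve(); }
    }

    public static string PrivateMethod()
    {
        return Resolve();
    }
}
```

Because the name is baked into the IL at compile time, later JIT inlining has no effect on it.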

The ClassToBeTestedAccessor has to name the parameter passed to PrivateAccessorBase.Invoke to make sure the first parameter is not interpreted as the caller method name.

// Implemented by ClassToBeTestedAccessor
internal string ProtectedMethod
  (
  string parameterValue
  )
{
  // Need to name the parameter, 
  // otherwise the first parameter will be 
  // interpreted as caller's name.
  return (Invoke<string>(parameter: parameterValue));
}

Test Static Member

To access static class members, no class instance or PrivateObject is required. The PrivateAccessorBase collects all properties and methods the type defines, using the static constructor.

// Implemented by PrivateAccessorBase
private static IEnumerable<PropertyInfo> DeclaredProperties { get; set; }

private static IEnumerable<MethodInfo> DeclaredMethods { get; set; }
…
static PrivateAccessorBase()
{
  TypeInfo typeInfo = typeof(T).GetTypeInfo();

  PrivateAccessorBase<T>.DeclaredProperties 
    = typeInfo.DeclaredProperties;

  PrivateAccessorBase<T>.DeclaredMethods 
    = typeInfo.DeclaredMethods;
}

One has to search for the property / method with the matching name, and invoke it.

// Implemented by PrivateAccessorBase
protected static TReturnValue GetStaticPropertyValue<TReturnValue>
  (
  [CallerMemberName] string callerName = null
  )
{
  return ((TReturnValue)PrivateAccessorBase<T>
    .DeclaredProperties
      .Single(info => info.Name.Equals(callerName))
        .GetValue(null));
}

protected static void SetStaticPropertyValue
  (
  object value,
  [CallerMemberName] string callerName = null
  )
{
  PrivateAccessorBase<T>.DeclaredProperties
    .Single(info => info.Name.Equals(callerName))
      .SetValue(null, value);
}

protected static void InvokeStatic
  (
  [CallerMemberName] string callerName = null,
  params object[] parameter
  )
{
  PrivateAccessorBase<T>.DeclaredMethods
    .Single(info => info.Name.Equals(callerName))
      .Invoke(null, parameter);
}

Please note that this is the .NET 4.5 implementation. For .NET version < 4.5, please refer to the sample solution.
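Stripped of the base class, the lookup comes down to a few reflection calls. The sketch below runs them against a made-up Counter class with a private static property; the class names and the fixed property name are assumptions for illustration.

```csharp
using System.Linq;
using System.Reflection;

// Hypothetical class under test with a private static property.
public class Counter
{
    private static int InstanceCount { get; set; }
}

public static class StaticAccessSketch
{
    private static PropertyInfo Find(string name)
    {
        // DeclaredProperties also contains non-public and static properties.
        return typeof(Counter).GetTypeInfo().DeclaredProperties
            .Single(info => info.Name.Equals(name));
    }

    public static int GetInstanceCount()
    {
        // A null target means the member is static.
        return (int)Find("InstanceCount").GetValue(null);
    }

    public static void SetInstanceCount(int value)
    {
        Find("InstanceCount").SetValue(null, value);
    }
}
```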

Test Async Methods

Having the above in place,  it is easy to extend the base class to be ready for calling async methods.

// Implemented by PrivateAccessorBase
protected async Task<TReturnValue> InvokeAsync<TReturnValue>
  (
  [CallerMemberName] string callerName = null,
  params object[] parameter
  )
{
  TReturnValue returnValue 
    = await (Task<TReturnValue>)PrivateObject
      .Invoke(callerName, parameter);

  return (returnValue);
}

The derived class uses it like this:

// Implemented by ClassToBeTestedAccessor
internal async Task<double> PrivateMethodWithReturnValueAsync
  (
  int millisecondsDelay,
  string outputText
  )
{
  double returnValue 
    = await InvokeAsync<double>
      (parameter: new object[] { millisecondsDelay, outputText });

  return (returnValue);
}

And the testing class uses the accessor this way:

// Implemented by ClassToBeTestedTest
[TestMethod]
public async Task TestPrivateMethodWithReturnValueAsync()
{
  int millisecondsDelay = 300;
  string testValue = "method unit test";

  ClassToBeTestedAccessor accessor 
    = new ClassToBeTestedAccessor();

  double result 
    = await accessor
      .PrivateMethodWithReturnValueAsync(millisecondsDelay, testValue);

  // Do validation stuff
}

Not Complete

The intention of this post and the sample solution is to give you an idea on how to create a home-made private accessor, and what to take care of when doing so.

The intention is not to offer a complete substitute of Visual Studio’s Private Accessor Creation feature.

Accordingly, not all possible unit test scenarios are covered. Please feel free to extend the code for your needs and share your solutions and ideas with us by adding comments describing them.

Conclusion

Having this accessor base class, the creation of private accessors should be quite easy. Of course, not as easy as clicking a menu item, but better than fiddling around with PrivateObject.

The code snippets in this post might look a little bit complex and / or confusing. In this case, try to focus on the test class and accessor. Forget about the PrivateAccessorBase.

I hope you agree that both the creation of the accessor and the usage by the test class will be very comfortable and straightforward, having PrivateAccessorBase in place.

Visual Studio 2013 Update 2 – TFS changes

As Brian Harry announced, a CTP of Update 2 is already available, with the following changes for TFS:


Agile Planning

    Team Foundation Server
    • The portfolio backlogs have performance improvements during web access navigation.
    • You can query on tags in Visual Studio and through web access.
    • You can apply tags to work items in Visual Studio.
    • You can apply permissions for who can add new tags.
    • REST API is available for work item tracking tagging.
    • You can edit tags in the Excel add-in for Team Foundation Server.
    • You can configure non-working days, and these are excluded from burndown charts.
    • Cumulative Flow Diagram start dates are configurable.
    • Lightweight charts can be pinned to project or team homepages.
    • You can customize the colors in lightweight charts.
    • The look and feel of the project and team homepage is updated.

Testing Tools

  • This update gives testers and test leads the ability to export test artifacts so that these can be sent by email or as printouts and shared with stakeholders who do not have access to TFS.

Release Management

  • Tags are designed to perform the same operation across servers. If there are server-specific actions, you can always add the specific server and the corresponding actions at that level in the deployment sequence.
  • Configuring a group of servers with the same tag means that you set values for the whole group, so all the servers in the group share common values for all variables.
  • You can easily deploy to identical or clustered servers without having to repeat the deployment sequence on each server.
  • You can copy tags across stages and across templates. The deployment sequence, with all its tags and servers, is retained when it is copied to other stages or release templates under the same environment.

FAQ Enterprise Library 6.0 & Unity 3.0

Microsoft Enterprise Library is a collection of reusable software components designed to assist developers with common enterprise development challenges. It contains standalone application blocks that can be combined to address cross-cutting concerns such as DI, logging, error handling, data access, and input validation. The new version, Enterprise Library 6.0, has been released; it has enhancements for more recent technologies and integrates with ASP.NET MVC and ASP.NET Web API.

In Enterprise Library 6.0, the following application blocks are available:

  • Data Access Application Block
  • Exception Handling Application Block
  • Logging Application Block
  • Policy Injection Application Block
  • Semantic Logging Application Block
  • Transient Fault Handling Application Block
  • Unity Application Block
  • Validation Application Block

What’s New in Enterprise Library 6.0

  • All Enterprise Library 6.0 blocks have been updated to the .NET Framework 4.5. This is the global change in this version.
  • Working with Group Policy for configuration support has been removed from this version.
  • Block instrumentation is no longer available in this release.
  • In this version only the Policy Injection Application Block can take dependency for the exceptions.
    Newly implemented Blocks
    • Semantic Logging Application Block
    • Transient Fault Handling Application Block

    Blocks that have been removed

    • Caching Application Block
    • Cryptography Application Block
    • Security Application Block

    Other important changes

    • The SQL Server scripts now support SQL Server 2005 or later, which was not the case previously.
    • You no longer need a DI container to work with the Exception Handling Application Block. Object creation is much easier: you can create an object directly from configuration.
    • The rolling behavior of the Logging Application Block has changed. The trace listener now rolls the log every day at midnight, instead of depending on when the last message was written.
    • A JSON formatter has been included in the Logging Application Block.
    • You can now write messages asynchronously by using AsynchronousTraceListenerWrapper with your existing listeners.
    • Support for Windows Azure caching, storage, transient faults, and so on is provided by the updated Transient Fault Handling Application Block, which comes from the Enterprise Library Integration Pack for Windows Azure.
    • Many bug fixes, error checks, and internal implementation changes have been made across blocks, listeners, and features, such as the Email Event Listener, MSMQ Event Listener, Rolling Flat File Event Listener, Policy Injection Application Block, Semantic Logging Application Block, and Validation Application Block.

Unity

Unity is also an application block of Enterprise Library. It provides a lightweight, extensible dependency injection container with support for constructor, property, and method call injection, and it facilitates building loosely coupled applications. Unity 3.0 is also targeted at Windows 8. Microsoft patterns & practices lists the following new features of Unity 3.0.

What’s New in Unity 3.0

  • The Unity assembly is now Security Transparent.
  • Unity now supports NetCore (Windows Store apps).
  • The bug that caused a first chance exception when registering a singleton is fixed.
  • The bug that resulted in static properties not being filtered out when doing property injection is fixed.
  • Internally, the WinRT namespaces that support Unity in Windows Store apps have been renamed to NetCore.
  • Unity now supports resolving objects of type Lazy<T>
  • Unity now supports registration by convention through the new RegisterTypes method.
  • Unity now includes support for ASP.NET MVC and ASP.NET Web API.
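
To make the Lazy<T> item above more concrete, here is a minimal sketch of what deferred resolution buys you. It uses only System.Lazy<T> from the .NET Framework and wires the object up by hand; it does not call Unity itself. With Unity 3.0 the container would supply the Lazy<T> instance for a registered type. All type names here (IReportService, ReportService, ReportViewModel, LazyDemo) are hypothetical and exist only for illustration.

```csharp
using System;

// Hypothetical service; the counter only shows when construction happens.
public interface IReportService
{
    string Build();
}

public sealed class ReportService : IReportService
{
    public static int InstancesCreated;

    public ReportService()
    {
        InstancesCreated++;
    }

    public string Build()
    {
        return "report";
    }
}

// The consumer depends on Lazy<IReportService>, so the service is
// only constructed the first time Value is read.
public sealed class ReportViewModel
{
    private readonly Lazy<IReportService> _service;

    public ReportViewModel(Lazy<IReportService> service)
    {
        _service = service;
    }

    public string Show()
    {
        return _service.Value.Build();
    }
}

public static class LazyDemo
{
    public static string Run()
    {
        // Hand-wired here; a container that supports Lazy<T> would build this for you.
        var lazy = new Lazy<IReportService>(() => new ReportService());
        var viewModel = new ReportViewModel(lazy);

        // No ReportService exists yet; it is created inside Show().
        return viewModel.Show();
    }
}
```

The design point is that the consumer declares the dependency as usual but pays the construction cost only if the dependency is actually used.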

As Semantic Logging Application Block and Transient Fault Handling Application Block are the newly implemented blocks, let’s learn something more about the Semantic Logging Application Block.

Semantic Logging Application Block

The Semantic Logging Application Block provides a set of destinations (sinks) to persist application events published using a subclass of the EventSource class from the “System.Diagnostics.Tracing” namespace. It gives developers a structured way to log events, so that later analysis of the logged information can produce valuable input for the business. You can do many of the same things with the Semantic Logging Application Block as with the Logging Application Block: you can write log messages to multiple destinations, you can control the format of your log messages, and you can filter what gets written to the log.

To log an error using the Semantic Logging Application Block, we call a method on the event source:
MyEventSource.Log.UIError(ex.Message);

What exactly does semantic logging mean? It means that before you write a message, you define which log messages your application can write. How do you define and write those messages? The answer is the EventSource class, which the “System.Diagnostics.Tracing” namespace of the .NET Framework 4.5 provides. See the syntax:
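public class MyEventEventSource : EventSource
{
    public class Points
    {
        // code
    }

    public class Tasks
    {
        // code
    }

    // The Event attribute describes the event (ID, level, message, and so on).
    [Event(define.......)]
    internal void Task(string message)
    {
        // code
    }

    // Single shared instance through which the application logs events.
    public static readonly MyEventEventSource Log = new MyEventEventSource();
}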

The MyEventEventSource class, which extends the EventSource class, contains definitions for all of the events that you want to be able to log using Event Tracing for Windows (ETW).
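
As a rough illustration of how such an event source is defined and consumed, the sketch below declares one event and captures it in process. It relies only on types from System.Diagnostics.Tracing (EventSource, EventAttribute, EventListener); it does not use the Semantic Logging Application Block sinks, which ship in a separate library. The names SampleUiEventSource, CollectingListener, and SemanticLoggingDemo, the source name "Sample-UiEvents", and the event ID 1 are assumptions made for this example.

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics.Tracing;

// Hypothetical event source; name and event ID are illustrative only.
[EventSource(Name = "Sample-UiEvents")]
public sealed class SampleUiEventSource : EventSource
{
    public static readonly SampleUiEventSource Log = new SampleUiEventSource();

    private SampleUiEventSource()
    {
    }

    // The ID passed to WriteEvent must match the ID in the attribute.
    [Event(1, Level = EventLevel.Error, Message = "UI error: {0}")]
    public void UIError(string message)
    {
        if (IsEnabled())
        {
            WriteEvent(1, message);
        }
    }
}

// In-process listener that stores formatted messages in memory.
public sealed class CollectingListener : EventListener
{
    private readonly List<string> _entries = new List<string>();

    public IList<string> Entries
    {
        get { return _entries; }
    }

    protected override void OnEventWritten(EventWrittenEventArgs eventData)
    {
        // Ignore anything that is not the UIError event of our source.
        if (!(eventData.EventSource is SampleUiEventSource) || eventData.EventId != 1)
        {
            return;
        }
        if (eventData.Message == null || eventData.Payload == null || eventData.Payload.Count == 0)
        {
            return;
        }
        _entries.Add(string.Format(eventData.Message, eventData.Payload[0]));
    }
}

public static class SemanticLoggingDemo
{
    public static IList<string> Run()
    {
        using (var listener = new CollectingListener())
        {
            // Subscribe to events at Error level or more severe.
            listener.EnableEvents(SampleUiEventSource.Log, EventLevel.Error);
            SampleUiEventSource.Log.UIError("Save failed");
            return new List<string>(listener.Entries);
        }
    }
}
```

Because every event is a strongly typed method, callers cannot misspell a message template or pass the wrong number of arguments; the structure of each log entry is fixed in one place.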
[Figure: Application using the Semantic Logging Application Block]

Dumitru baby V2.0 has shipped RTM

I’m excited to announce that after a long nine-month development period, V2.0 has been released. Features include 10 fingers and 10 toes, a cute little nose, and a malfunctioning sleep mode. He inputs milk and outputs plentifully.

Here are the specs:

<baby>
  <name>Elias Matei Dumitru</name>
  <gender>male</gender>
  <weight kg="2,9" oz="5" />
  <length cm="43" />
</baby>

Software Development Processes FAQ

 

A while ago I gave a talk that presented a very general overview of many of the popular software development processes being used in the industry. Obviously this is way too much information to cover in a single talk. The point of the talk was certainly not to make everyone in the audience an expert on all the subject matter. It was just to quickly list as many terms as I could think of, briefly define each, and describe the problems each process can help solve.

Following the talk I released a large list of additional resources. The real goal of the talk was to point people in the right direction so that they could go forth and learn more about any software development process they felt might be beneficial to them or their team. Here is that list.

Naming Summary

  • Software Engineering, Agile, Lean, DevOps
  • Waterfall, Extreme Programming (XP), Scrum, Kanban
  • System Metaphor, User Story, “As a”, Epic, Cynefin
  • Backlog Management, Grooming, Story Map
  • Sprint, Planning Meeting, Planning Game/Poker, Estimate versus Commitment
  • Velocity, Sustainable Pace
  • Burn Down Chart, Kanban Board, Cumulative Flow Diagram
  • Stand Up, 3 Questions, Definition Of Done, Review Meeting
  • Retrospectives
  • Coding Standards, Code and Design Reviews, Pair Programming
  • Technical Debt, Refactoring, Unit Testing (TDD, BDD, ATDD)
  • Iterative Development, Incremental Development
  • Version Control, DVCS, Integration
  • Continuous Delivery, Continuous Release, Feature Toggle

Philosophies

Systems

Processes

Visual Studio 2013 ALM Customization and configuration

 

Using Visual Studio ALM and TFS, you gain access to a wealth of tools, many of which you can configure or customize. Common areas that teams customize include team alerts, team home page, shared queries, and test platforms. For on-premises deployments, you can also add fields and customize work item types (WITs) and Agile planning tools.

Below are some links to topics that show you how to configure or customize areas related to version control, build, work tracking, and testing using TFS.

Version control and build

You can manage source code using Team Foundation version control (TFVC) or Git.  Here is a view into the areas you can configure when working with source code and builds.  Go here for details on the differences between TFVC and Git

    Several test types use the test case WIT to plan and run tests. You can customize the test case in the same way as you customize other WITs.

    Build

    § Build process

    § Build definitions

    § Gated check-in builds (TFVC)

    § Continuous integration builds

    § Build numbers

    § Build notification email alerts

    § Build quality values

    § Restrict access/set permissions

    Work item fields

    § Modify a field or add a custom field (*)

    § Customize a pick list

    § Add rules to a field

    § Add a custom field

    § Rename a field

    § Change reporting attributes

    § Synchronize a person name field

    § Add custom controls to a field

    § Add a field that supports integration with test, build, or version control

    § Add or change how Project fields map to TFS fields (*)

    Work

    Teams track plans and work using work items. You can add fields, change workflow, and add information to forms by customizing WITs. Teams can customize many elements that support their collaboration, as well as being able to customize the Agile planning tools and experience.   Items marked with an asterisk (*) are available only with TFS on-premises deployments.

    Teams

    § Add a team, set up team hierarchies

    § Add team members

    § Schedule team sprints

    § Configure team room events

    § Specify team alerts

    § Set team admin permissions

    § Create work item templates


     
    Agile planning tools

    § Customize the Kanban board

    § Change the backlog or task board

    § Quick add panel 

    § Default backlog columns 

    § Weekend days 

    § Max # of work items on the task board

    § Color of work item types

    § Change a field used in a chart or tool

    § Map workflow states to meta states

    § Add work item types to a backlog (*)

    § Add portfolio backlogs (*)

    Use a team field to support teams

    Are there other areas that you can customize? 

    The short answer is yes. Other areas include TFS groups, permissions, and process templates. Also, new features such as Application Insights and Release Management present additional areas to configure or customize. We’ll add information about those areas later.

    Queries, filters, and reports

    § Define queries

    § Add tags

    § View progress charts

    § Create Excel reports from a work item query (*)

    § Customize a report (SQL Server Reporting Services) (*)

    § Modify or add a custom work item type (*)

     
     
    Work item types

    § Customize the states, reasons, or transitions of a workflow

    § Add text, hyperlinks, or web content  to a form

    § Add a new WIT

    § Add or remove a WIT from a backlog or task board

    § Change the color associated with a WIT

    § Rename a WIT

    § Delete a WIT

    § Add custom controls to a WIT

    § Add a WIT to the portfolio backlog (*)

    § Change the maximum attachment size for work items (on-premises only)

    Test

    With VS, Microsoft Test Manager, and Team Web Access, you can create unit, manual, exploratory, and automated system tests. These tests reference one or more of the test elements as part of their execution.  Items marked with an asterisk (*) are available only with TFS on-premises deployments.

    Test types

    § Manual tests, test plans, and test suites

    § Exploratory tests

    § Automated system tests

    § Unit tests

    § Web performance and load tests

    § System testing

    § Windows store app testing

    Lab management

    § Standard lab environments

    § SCVMM (virtual) environments

    Test elements

    § Test configurations (test platforms)

    § Test failure types (*)

    § Test resolution states (*)

    § Test settings

    § Test tool extension (API reference)

    § Custom code and plug-ins for web performance tests

    § Specify a custom work item type as default bug type (*)

    For which other configuration or customization scenarios would you like to see more content coverage?

    Microsoft Technology Developer Conferences in 2014

     

    I was surprised that it is difficult to find a list of software developer conferences in one place. In my case, I am most interested in conferences that target Microsoft technologies and are in the United States.

    For 2014, the five conferences that are most relevant to me are two conferences put on by Microsoft (Build and TechEd), and three conferences put on by non-Microsoft companies with Microsoft sponsorship (DevConnections, DevIntersection, Visual Studio Live). I’ve attended some of these events in the past, and in general, can recommend all of them.

    1. Visual Studio Live, March 10-14, Las Vegas
    2. Microsoft TechEd, May 12-15, Houston
    3. Microsoft Build, unknown date, unknown city
    4. DevConnections, September 15-19, Las Vegas
    5. DevIntersection, November 9-12(?), Las Vegas

    Visual Studio Live – This conference has been around for many years. VS Live tends to have a broad range of topics and targets a wide range of skill sets, but mostly intermediate-level developers I’d say. In 2014, VS Live will be in Las Vegas in March, Chicago in May, Redmond in August, and Orlando in November. Speakers come from both Microsoft and other companies. Visual Studio Live is put on by 1105 Media which does a lot of other kinds of conferences, and also publishes Visual Studio Magazine. Highly recommended for intermediate-level developers but beginners and advanced developers can find some interesting talks too.

    Microsoft TechEd – In 2014 TechEd swallows the Microsoft Management Summit. In the past TechEd emphasized training for developers and IT people, and MMS emphasized training and products for IT people. Each year there was more overlap so combining the events makes sense. In 2014 TechEd will be in Houston – an unusual choice of venue. Highly recommended for IT pros and enterprise developers.

    Microsoft Build – Build targets Web, system, desktop, mobile, and embedded software developers. Build is a combination of the old PDC (Professional Developer Conference) for traditional software developers and MIX (originally stood for “Meet, Interact, eXplore”) for Web developers. The dates and location of Build have not been announced but my wild guess is that Build will be in October in Las Vegas. Highly recommended for developers of all skill levels.

    DevConnections – Another long-running conference that’s been around for at least 10 years. DevConnections has gone through some management changes recently, and 2013 was the first event put on by the new team. I wasn’t there but some of my friends say the event was similar to previous years, focusing on intermediate-level developers. DevConnections is a bit broader in scope than Visual Studio Live and targets developers and SQL people and IT people. The conference is run by Penton Media, a big company that does many events and publishes magazines including Windows IT Pro and SQL Server Pro. The 2014 event is scheduled for September 15-19 in Las Vegas. Recommended for developers with intermediate and beginning level skills.

    DevIntersection – DevIntersection held its first event in 2012. DevIntersection is a spin-off from DevConnections. In spite of the similar names, the two events are no longer related; however, the conferences are similar in the sense that they target a broad audience. The people who now run DevIntersection used to run DevConnections, and I always thought those events were very nice. The Spring 2014 event will be from April 13-15 in Orlando, Florida (too far away for me). The Fall 2014 event dates and location have not been announced but my guess is early November in Las Vegas. Highly recommended for developers with intermediate and beginning level skills.

    There are many other conferences for software developers who use the MS technology stack, but I can recommend the five here on the basis of personal experience. All these conferences are a bit pricey (well to me anyway). For example, the TechEd Conference, not including hotel and travel, is about $2000. The Visual Studio Live conference is about $1600. It’s a tough sell to get your company to foot the bill for one of these conferences but maybe you can convince your management that the knowledge you’ll gain, your improved morale, and increased energy and productivity you’ll have after returning, are worth the price of one of these conferences.

    Develop Workflows 2013 – Visual Studio vs SharePoint Designer decision tree

    Generally speaking, the two authoring tools (Visual Studio and SharePoint Designer) differ somewhat, but the concept of the declarative workflow remains the same.

    Declarative workflows

    Let’s first be clear what is meant by “declarative” workflows. This term means that instead of being authored in code and then compiled into managed assemblies, the workflow is described (literally) in XAML and then executed interpretively at run time.

    The XAML is derived (or inferred) from the workflow building blocks that you manipulate in the Workflow Designer (if using Visual Studio) or SharePoint Designer workflow design surface (or Visio, but more about that later). The building blocks themselves are the visual workflow design objects in the designer toolbox—stages, conditions, actions, events, and so on. The set of tools in the respective toolboxes (Visual Studio or SharePoint Designer) differs somewhat, but the concept of the declarative workflow remains the same.

    Decision tree: SharePoint Designer vs. Visual Studio



    Feature / Requirement | SharePoint Designer | Visual Studio
    Allows rapid workflow development | Yes | Yes
    Enables reuse of workflows | A workflow can be used only by the list or library on which it was developed. However, SharePoint Designer provides reusable workflows that can be used multiple times within the same site. | A workflow can be written as a template so that after it is deployed, it can be reused and associated with any list or library.
    Allows you to include a workflow as part of a SharePoint solution or app for SharePoint | No | Yes
    Allows you to create custom actions | No. However, SharePoint Designer can consume and implement custom actions that are created and deployed by using Visual Studio. | Yes. However, be aware that in Visual Studio, the underlying activities, not their corresponding actions, are used.
    Allows you to write custom code | No | No
    Can use Visio Professional to create workflow logic | Yes | No
    Deployment | Deployed automatically to list, library, or site on which they were created. | Create a SharePoint solution package (.wsp) file and deploy the solution package to the site (SPWeb).
    One-click publishing available for workflows | Yes | Yes
    Workflows can be packaged and deployed to a remote server | Yes | Yes
    Debugging | Cannot be debugged. | Workflow can be debugged by using Visual Studio.
    Can use only actions that are approved by site administrator | Yes | Yes

    This is changed from previous versions. Previously, workflows and actions that were authored by using Visual Studio were code-based and deployed at the farm scope, so administrator approval was not required.

    Among the greatest advantages of the workflow framework in SharePoint 2013 is the ease with which information workers can use the no-code environment of SharePoint Designer to create rich and powerful workflows. Additionally, a high degree of flexibility and customization is available in a declarative authoring environment such as Visual Studio.

    Both of these workflow authoring environments—SharePoint Designer and Visual Studio—offer specific advantages and disadvantages. In this section, we explore how to determine which authoring environment best suits your workflow development needs.

    Using SharePoint Designer
    • Target users: Information workers, business analysts, SharePoint developers.

    • Difficulty level: Familiarity with SharePoint Designer, including the core workflow components, such as stages, gates, actions, conditions, and loops.

    With SharePoint Designer, users can create a workflow that is attached to a list, library, or site using a no-code, text-based designer. Or, they can use the new visual design environment in which graphical elements are arranged on a design surface to represent the logical flow of a business process. SharePoint Designer excels at enabling rapid workflow development by non-technical workers.

    Using Visual Studio
    • Target users: Intermediate or advanced software developers.

    • Difficulty level: Familiarity with Visual Studio, including software development concepts such as event receivers, packaging and deployment, and security.

    Authoring workflows in Visual Studio provides flexibility to create workflows to support virtually any business process, regardless of its complexity, and allows debugging and reuse of workflow definitions. Perhaps most important, Visual Studio lets developers include SharePoint workflows as part of a broader SharePoint solution or app for SharePoint.

    Visual Studio enables developers to create custom actions for consumption by SharePoint Designer, and provides the means to execute custom logic. With Visual Studio, developers can also create workflow templates, which can be deployed to multiple sites.

    Comparing SharePoint Designer with Visual Studio

    The table in the “Decision tree: SharePoint Designer vs. Visual Studio” section above provides a side-by-side comparison of the features and requirements for using SharePoint Designer and Visual Studio to create SharePoint workflows.

    Developing workflows using Visual Studio

    Unlike earlier versions, workflows in SharePoint 2013 are entirely declarative and are built on Windows Workflow Foundation 4. Visual Studio provides a visual workflow designer surface that lets you create custom workflows, workflow templates, forms, and custom workflow activities entirely in the designer environment. Your workflow is then packaged and deployed as a SharePoint Feature. For information about Feature packaging, see Using Features in SharePoint Foundation.

    Perhaps the most significant change for Visual Studio developers is that custom workflows are no longer compiled and deployed as .NET Framework assemblies. Furthermore, SharePoint 2013 no longer uses Microsoft InfoPath forms; instead, forms generation relies on Microsoft ASP.NET forms.

    Finally, the Visual Studio workflow project templates have changed. Whereas formerly templates for state machine and sequential workflows were provided, these distinctions are no longer meaningful. Rather, Visual Studio project templates are available in the Visual Studio build provided on your virtual machine (VM).

    Enabling on-premises workflow debugging


    To debug on-premises workflows in Visual Studio, you need to temporarily allow the Workflow Manager Tools to access your system through the firewall.
    1. In Windows Control Panel, choose System and Security, Windows Firewall.

    2. In the Control Panel Home list, choose the Advanced Settings link.

    3. In the left pane of Windows Firewall, choose Inbound Rules.

    4. In the Inbound Rules list, choose Workflow Manager Tools 1.0 for Visual Studio 2012 – Test Service Host.

    5. In the Actions list, choose Enable Rule.

    6. On the properties page of your SharePoint project, choose the SharePoint tab, and then select the Enable Workflow debugging check box.

    Debugging SharePoint Online workflows using Visual Studio

    To debug SharePoint Online workflows in Visual Studio, perform the following steps:

    1. If you’re behind a firewall, you may need to install a proxy client (such as the Forefront Threat Management Gateway (TMG) Client), depending on your company’s network topology.

    2. Register for a Windows Azure account if you haven’t already, and then sign into that account.

      For information about how to register for a Windows Azure account, see Windows Azure.

    3. Create a Windows Azure Service Bus namespace, which you can use to debug remote workflows. You can do this on the Windows Azure management portal.

      For more information about the Windows Azure Service Bus, see Messaging and Managing Service Bus Service Namespaces.

      SharePoint Online workflow debugging uses the Relay Service component of the Windows Azure Service Bus, so you’ll be charged for using the Service Bus. See Service Bus Pricing FAQ. You get free access to Windows Azure each month that you subscribe to Visual Studio Professional with MSDN, Visual Studio Premium with MSDN, or Visual Studio Ultimate with MSDN. With this access, you can use the Service Bus relay for 1,500, 3,000, or 3,000 hours, depending on your MSDN subscription. See Get some amount of Windows Azure Services each month at no additional charge.

    4. In Windows Azure, choose your service namespace, choose the Access Key link, and then copy the text in the Connection String box.

    5. On the properties page of your app for SharePoint project, choose the SharePoint tab, and then select the Enable Workflow debugging check box.

      You must enable this feature to debug workflows in SharePoint Online. This property applies to all of your SharePoint projects in Visual Studio. Visual Studio automatically turns off workflow debugging if you package your app for distribution on the Office store.

    6. Select the Enable debugging via Windows Azure Service Bus check box. Then, in the Windows Azure Service Bus connection string box, paste the connection string that you copied.

    After you enable workflow debugging and provide a valid connection string for the Windows Azure Service Bus, you can debug SharePoint Online workflows.

    If you haven’t disabled workflow debugging and don’t want to receive a notification whenever your project contains a workflow, clear the Notify me if Windows Azure Service Bus debugging is not configured check box.