How to Use TFS with Scaled Agile Framework

Use Team Foundation Server and Visual Studio to increase productivity and transparency into your application, and to increase the rate at which you can ship high-quality software throughout the application lifecycle.

This whitepaper describes how to use TFS to support epics, release trains, and multiple backlogs.

The Scaled Agile Framework, or SAFe, is popular among organizations looking to scale Agile practices to the enterprise level. SAFe is a comprehensive framework, covering practices from portfolio-level planning to release planning to coding practices.

While TFS does not provide full support for all SAFe practices, it can be used to implement many of the planning practices. This whitepaper provides practical guidance on how to implement SAFe practices using TFS.


The first two sections are conceptual and provide a quick overview of how TFS supports SAFe. The last two sections provide detailed guidance for the TFS administrator on configuring and customizing TFS to support SAFe.

Mapping SAFe concepts to TFS concepts

SAFe supports a portfolio view of multiple agile teams. It illustrates how a portfolio vision is met by a hierarchy of teams, all of which have their own specific objectives. The framework breaks down Epics into Features and Stories, which teams work on in Sprints and deliver through Program Increments (PIs) and Release Trains. The portfolio backlog can also track how deliverables map to Strategic Themes and their associated budgets.

SAFe architectural overview © Dean Leffingwell.

Image courtesy of Leffingwell, LLC.

The examples in this paper illustrate how to add the Epic WIT and backlog, configure a three-level team hierarchy, and map teams to their respective area and iteration paths. The examples build on the TFS Agile process template; however, the changes can be applied to any TFS process template.

TFS structure to support SAFe

SAFe Portfolios, Programs, and Teams map to TFS team projects and teams

Because TFS supports a hierarchical team structure, each team has its own view of its work, which rolls up to the next level within the team hierarchy.

SAFe roles to TFS teams


The section “Customize TFS process to support SAFe” details the changes to the Scrum, Agile, and CMMI process templates that enable SAFe support. The goal is not to create a SAFe process template, but to modify the existing process templates to enable SAFe practices. These changes are minimal and don’t encumber teams who choose not to use SAFe.

You have the following options for updating the templates to include these changes:

  1. You can download the standard Scrum, Agile, CMMI process templates with changes for SAFe here.
  2. If you have customized process templates, you can follow the instructions in the guidance. Additionally, this blog post shows how to automate the process with PowerShell.

This whitepaper assumes familiarity with the Scaled Agile Framework. If you’re familiar with Scrum but not with SAFe, Inbar Oren has published these great videos, which explain the basic SAFe concepts quickly.

Have fun with the process!

Building Multi-Device Applications with C++ and Visual Studio

Here are some thoughts about consuming a C++ library project from a Windows Phone Store app using a separate intermediary WinRT component. All these steps can equally be applied to a Windows Store app, and therefore to a solution that utilizes the Universal App template.

When building a Windows Phone Store app, there are occasions when you wish to consume logic within a library that you have previously written in C++. To consume this logic in a Windows Phone Store app, you need to wrap that C++ logic in a WinRT component.

The WinRT component acts as a wrapper around the C++ code and is a means of projecting the correct types across the Application Binary Interface (ABI), such that the logic may be consumed in languages other than C++. There is documentation and sample code demonstrating how to achieve this, but they tend to show the C++ logic inside the WinRT component. In the real world, you will likely have the C++ code in a separate project and therefore need to reference it from the WinRT component. This scenario is not well documented, so the objective of this article is to walk through how you can set it up.

Before getting started, the diagram in Figure 1 shows the projects that are, or will be, involved.

Figure 1

First, on the left you have the “DemoApp.WindowsPhone” project, which contains the application’s user interface. On the right you have the “DemoWin32Project” project, which contains the C++ logic that you wish to call. In this example, the project is assumed to be a Win32 DLL project type. The grey box in the middle of the diagram represents the C++ WinRT component that wraps the calls to the Win32 DLL project. It is assumed that you already have the components in blue at your disposal, whilst the component in grey needs to be created.

The first step is to create the C++ WinRT component. You can do this by adding a new project to the solution and selecting the appropriate project type, as demonstrated in Figure 2. For the purposes of this example, the project type will be “Windows Runtime Component (Windows Phone) – Visual C++” (highlighted in green), as the application calling it is a Windows Phone Store app. If this were a Windows Store app or a Universal app, you would instead select the appropriate project type shown in amber.

Figure 2

When you have created the component, you can add a project reference to it from the application project (in this case “DemoApp.WindowsPhone”). This is shown in Figure 3.

Figure 3

As the application project is going to be referencing C++ code, it is also important to add a reference to the Visual C++ runtime, as shown in Figure 4.

Figure 4

Add reference to Visual C++ runtime

The actual code logic that sits inside these projects will not be discussed here, as it is largely irrelevant.

At this point, you will want to add a reference from the Windows Runtime Component to the Win32 DLL project. It is this step that is not so well documented. Unfortunately there is a little extra work to do here to get this working, although it is extremely straightforward once you know how.

Figure 1 implies that the C++ component is a Win32 project type. This is intentional, as it is likely the project type your existing code sits inside of; however, in order to be able to reference this logic, the code needs to sit inside a subtly different project type. The next step is to create that other project type, which in this example will be “DLL (Windows Phone) – Visual C++”, highlighted in green in Figure 5. If this were a Windows Store app or a Universal app, you would instead select the appropriate project type shown in amber.

Figure 5

Creating the appropriate DLL project type

When this is done, you need to either move the C++ source files across from your Win32 DLL project, or add a link to their existing location; either way, this entails just a few manual steps depending on how many source files you have. Finally, you will need to make sure you are referencing the appropriate header files in the DLL project from the Windows Runtime Component project so that you can code against the DLL. Figure 6 shows the configuration that needs to be applied, in green.

Figure 6

Add ‘Include Directories’ to point to appropriate .h files

Now that these steps have been completed, you are in good shape to reference the DLL project from the Windows Runtime Component project. Simply go to the project properties of the Windows Runtime Component and navigate to “Common Properties” -> “References”. In this dialog you can point to the project created in Figure 5, as demonstrated in Figure 7.

It is worth pointing out that you can’t simply reference any and every Win32 DLL project, as only a subset of Win32 APIs is supported inside a Windows Runtime project.

Figure 7

Add a reference to the newly created DLL project type

At this point, you have created the chain shown in Figure 1; in other words, the application code on the left-hand side of the diagram may now call into the C++ logic held by the DLL project type by means of the Windows Runtime Component.

These steps may be followed for either a Universal App template or a Windows Store app as appropriate. Furthermore, Visual Studio allows you to right-click each of the projects represented by the first two boxes in Figure 1 and select “Add Windows 8.1” to automatically create the Universal App project structure. This can also be performed on the project created in Figure 5, which essentially gives the solution structure represented in Figure 8.

Figure 8

Universal App calling a C++ library

The only step that remains is to update the reference to point to the correct platform implementation, as illustrated by the arrows in the diagram.

Running WCF Service Stress / Load Tests

The Visual Studio ALM Rangers team has released the WCF Load Test tool. The tool has been maturing over the last three to four years.

How WCF Load Test works

WCF Load Test is a tool that auto-creates unit tests in your test projects. These unit tests can then be used to load test WCF services. It works by reading a log of the calls made to the WCF services and translating those calls into C# code. The tool can read WCF message logs which have been taken on the client or the service side. It can also handle ASMX services, even if there is no WCF client involved, which comes from the capability to process Fiddler2 traces.

You might think that generating code from a trace is just going to replay the same data over and over again in a load test. That could be a major limitation. To address this, the generated code also supports parameterization of the data passed to the WCF services so that you can include your own data randomization code for a more realistic test.

How To Generate a Unit Test

The tool just needs a WCF message log and the contract assemblies to be able to work. One way to do this is to run a client program that calls the WCF service, which creates a message log. The message log represents the scenario you want to test. The WCF Load Test project item wizard makes this process straightforward by automating a few steps, like changing the diagnostic settings in the config file and undoing the changes at the end.

The rest of this posting shows you how to generate a unit test that can be used to load test a WCF service. I am assuming you already have a WCF service and a client program which can be used to exercise the service.

The service we are using implements the following contract:

[ServiceContract]
public interface IArithmetic
{
    [OperationContract]
    int Add(int a, int b);

    [OperationContract]
    Response Subtract(Request req);
}

[DataContract]
public class Request
{
    [DataMember]
    public int A;

    [DataMember]
    public int B;
}

[DataContract]
public class Response
{
    [DataMember]
    public int Answer;
}
The Subtract operation uses DataContracts just to show how this is supported.

Step 1: Installing WCF Load Test

Just head over to and download “Version 3.0 (Beta) – Build”. Included in the ZIP file is an MSI called wcfuniten.msi; double-clicking it installs the tool. It adds a new item for test projects into Visual Studio Team System 2008 (Professional, Development Edition, Test Edition, or Team Suite) and Visual Studio 2010 Ultimate Edition.

Step 2: Create a Test Project

We start by creating a Test project. This release can only generate C# code which means a C# Test project is required.


Once the load test project has been created you can delete the UnitTest1.cs file that gets added, as you don’t really need it.

Step 3: Start the Wizard

We are now ready to start the wizard which will generate the unit test. To do this, right-click the load test project in the Solution Explorer and select Add -> New Test.


When you click OK the first screen of the wizard will appear:


Step 4: Run the Client Executable from the Wizard

Click Next to move to the next screen in the wizard. Type in or browse to the executable for the client that exercises your service.


Once you have entered a path to the executable, the wizard enables the Run button. Click the run button and the wizard edits the config file to make sure that it creates a suitable message log and starts your executable. Use your client to execute the scenario that you wish to capture in the message log and then exit the client. Remember to check that the service you want to test is already running.

If you already have a message log, which can come from either the client or the service side, you can supply the details on this screen. The usage notes document that comes with the tool describes the settings you need if you want to capture your own message log.

Whichever method you use to give the wizard the message log, the wizard parses it and then enables the Next button; click this to move to the next step.

Step 5: Set the Code Generation Options

The wizard shows you the list of distinct operations it has found in the message log. You can choose which operations you want to include in the unit test. If you choose to include transaction timers, the generated code will include a per-call timer that collects data on how long each call to the service takes. This information is collected, stored and displayed by the Visual Studio load test. You can also choose to have a separate unit test for each service call if you like.

Note that by default it will generate a single unit test that represents the entire scenario. In this example there will be a single unit test which calls Add and Subtract.


Step 6: Select the Contract Assembly

The next wizard screen is used to tell the wizard which assemblies contain the contract definitions required for the wizard to be able to generate the unit test code. If you are using a compile time proxy, you must supply the assembly containing the proxy – and any of its dependencies. If you only have a contracts assembly then that is what you need to supply. The code generated is slightly different depending on whether you have a proxy or not.


Use the Add button to add the necessary assemblies. This will enable the Finish button. Click Finish and the code will be generated and added to your Test project.

Here is a snippet from the generated code:

[TestMethod]
public void WCFTest1()
{
    // 15/09/2010 06:47:14
    this.Add();
    // 15/09/2010 06:47:14
    this.Subtract();
}

private int Add()
{
    int a = 10;
    int b = 20;
    this.CustomiseAdd(ref a, ref b);
    return arithmeticClient.Add(a, b);
}

private Contracts.Response Subtract()
{
    Contracts.Request req = new Contracts.Request();
    req.A = 10;
    req.B = 5;
    return arithmeticClient.Subtract(req);
}

You can see from this example that the main test method simply calls two methods. These two methods represent the two service calls that the client initiated. The bodies of those methods set up the parameters for the call and then call the service.

You will see that the parameters are first passed to another method, called CustomiseXXX. This method has been generated in a separate partial class file. You can put any code you like here to amend the parameter values so that you can vary the data; as generated, it does nothing and leaves your parameter values intact. You can also tie the customization methods into the data-driven testing feature to drive the parameter values from a spreadsheet or a database table.
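As a sketch, the hand-written half of that partial class might look like this. The class name GeneratedWcfTest and the randomization ranges are illustrative assumptions; only the CustomiseAdd(ref a, ref b) signature comes from the generated code above, and its visibility is relaxed to public here so the sketch can be exercised directly:

```csharp
using System;

// Hand-written half of the generated partial test class (sketch).
// The generated half calls CustomiseAdd before invoking the service;
// overriding the defaults here varies the data on each iteration so a
// load test does not replay identical values.
public partial class GeneratedWcfTest
{
    private static readonly Random random = new Random();

    // Matches the call site in the generated Add() method:
    //   this.CustomiseAdd(ref a, ref b);
    public void CustomiseAdd(ref int a, ref int b)
    {
        a = random.Next(1, 100); // 1..99, illustrative range
        b = random.Next(1, 100);
    }
}
```

The same pattern applies per operation; a data-driven variant would read the values from a test data source instead of a Random instance.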

Step 7: Configure the Service Client

The last step to getting a working unit test is to set up the test to connect to the WCF service. The code to make this happen is in the same partial class file that contains the customization methods. It contains the InitializeTest method which creates the service proxies for all the required services, but as it stands it does not have enough information. You need to add an app.config file to the Test project which contains the client configuration. Usually, the config file is just copied from the client program used to exercise the service. You can then modify InitializeTest to load the relevant configuration.

Here is an example with the change (the addition of “Basic”). The name “Basic” refers to the endpoint configuration name in the app.config file:

public void InitializeTest()
{
    arithmeticProxyTable.TryGetValue(System.Threading.Thread.CurrentThread.ManagedThreadId, out arithmeticClient);
    if (((arithmeticClient == null)
        || (((System.ServiceModel.ICommunicationObject)(arithmeticClient)).State == System.ServiceModel.CommunicationState.Faulted)))
    {
        // The following line may need to be customised to select the appropriate binding from the configuration file
        System.ServiceModel.ChannelFactory<Contracts.IArithmetic> arithmeticFactory
            = new System.ServiceModel.ChannelFactory<Contracts.IArithmetic>("Basic");

        arithmeticClient = arithmeticFactory.CreateChannel();
        arithmeticProxyTable[System.Threading.Thread.CurrentThread.ManagedThreadId] = arithmeticClient;
    }
}
The code above is slightly different if you gave the wizard a compile-time proxy to use instead of just a contract.

Step 8: Run Some Tests!

The unit test that the tool generated is now ready to be tested. Just run it as a unit test first to make sure it runs OK. Remember to check that the service you are testing is up and running. Once the unit test is running you can then go ahead and add it to a Visual Studio Load Test with the standard procedure for running load tests.


We have seen how to create load tests for WCF services by capturing message logs and turning them into executable code which can be run by the Visual Studio Load Test feature. If you want to explore the tool further, there are other features, such as support for testing ASMX services. The tool is not limited to one WCF service and can also generate tests that invoke multiple services.

Visual Studio ALM Rangers love to get your feedback. Please use the discussions tab on the CodePlex site to submit your feedback.

User Stories – the most misused tool in the Software Development Universe

Originally posted on Control Your Chaos:

User stories may be by far the most abused and misused XP practice ever conceived. First, look at the name. It’s a USER story. So you need an actual live human being to even meet the naming criteria.

As a <certain type of user>

I want/need <some functionality>

So that <business reason>

The Green Part:

This is a description of a certain user. Be as specific as you can be. Ideally there will be the first and last name of an actual user of the functionality. If you write “As a user I want …” this doesn’t really count as a user story.

The Red Part:

This is the part where you actually write the functionality you think will work in this case. Easy-peasy.

The Blue Part:

This is the hardest part. You actually need to know why your user needs that piece of functionality. Heavens, you will have…


How To Use TFS 2013 with SharePoint 2013 SP1 and SQL 2012 SP1 on Windows 2012 R2


This also works with SQL 2014, with TFS 2013 Update 2.

Originally posted on Cosmin's Hooking testify:

In new deployment scenarios you may need TFS 2013 or 2012 on a Windows Server 2012 R2 machine. Windows Server 2012 R2 will never support SharePoint 2010, so you need SharePoint 2013 SP1, which does support Windows Server 2012 R2.

Run all Windows Updates before installing SharePoint 2013, and get the cumulative updates for SQL 2012 SP1 and SharePoint 2013 SP1.

If the box already has TFS 2013 on Windows Server 2012 R2, installing updates is the key step that will prevent tantrums from SharePoint. Always install the required updates, and ideally the optional ones as well.

Installation of SharePoint 2013 with SP1

The SharePoint team has really slicked up the installation process for SharePoint.

Use the auto-run that comes from running the DVD directly, or just run the “prerequisiteinstaller” from the root first.


When the prerequisites are complete you can start the installation…


.NET (4.0 and up) Enterprise Caching Strategies & Tips

As of version 6.0, the Enterprise Library Caching Block is obsolete; it has been replaced by MemoryCache or AppFabric.

We frequently get asked about best practices for using Windows Azure Cache / AppFabric Cache. The compilation below is an attempt at putting together an initial list of best practices. I’ll publish an update to this in the future if needed.

I’m breaking down the best practices by topic.

Using Cache APIs

1.    Have Retry wrappers around cache API calls

Calls into the cache client can occasionally fail for a number of reasons, such as transient network errors, cache servers being unavailable due to maintenance/upgrades, or cache servers being low on memory. In these cases the cache client raises a DataCacheException with an error code that indicates the reason for the failure. There is a good overview of how an application should handle these exceptions on MSDN.

It is a good practice for the application to implement a retry policy. You can implement a custom policy or consider using a framework like the Transient Fault Handling Application Block.
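A minimal sketch of such a retry policy follows. It is library-independent: real code would catch DataCacheException and inspect its error code, whereas here the transient check is a caller-supplied delegate; the names, attempt counts, and delays are illustrative.

```csharp
using System;
using System.Threading;

// Minimal retry wrapper sketch for cache calls. The transient check is a
// delegate so the sketch compiles without the cache client library; in real
// code it would inspect DataCacheException error codes (e.g. "retry later").
public static class CacheRetry
{
    public static T Execute<T>(Func<T> cacheCall,
                               Func<Exception, bool> isTransient,
                               int maxAttempts = 3,
                               int delayMilliseconds = 100)
    {
        for (int attempt = 1; ; attempt++)
        {
            try
            {
                return cacheCall();
            }
            catch (Exception ex) when (isTransient(ex) && attempt < maxAttempts)
            {
                // Simple linear back-off before retrying the transient failure.
                Thread.Sleep(delayMilliseconds * attempt);
            }
        }
    }
}
```

A call such as CacheRetry.Execute(() => cache.Get("key"), ex => ex is TimeoutException) retries only while the failure is classified as transient; any other exception propagates immediately.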

2.    Keep static instances of DataCache/DataCacheFactory

Instances of DataCacheFactory (and hence DataCache instances, indirectly) maintain TCP connections to the cache servers. These objects are expensive to create and destroy. In addition, you want to have as few of them as needed to ensure cache servers are not overwhelmed with too many connections from clients.

You can find more details on connection management here. Please note that the ability to share connections across factories is currently available only in the November 2011 release of the Windows Azure SDK (and higher versions). Windows Server AppFabric 1.1 does not have this capability yet.

The overhead of creating new factory instances is lower if connection pooling is enabled. In general, though, it is a good practice to pre-create an instance of DataCacheFactory/DataCache and use it for all subsequent calls to the APIs. Avoid creating an instance of DataCacheFactory/DataCache on each of your request-processing paths.
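A sketch of the pre-created, shared instance pattern is below. The CacheFactory/CacheClient classes are hypothetical stand-ins for DataCacheFactory/DataCache so the sketch is self-contained; the Lazy&lt;T&gt; singleton shape is the point.

```csharp
using System;

// Keep one lazily created, process-wide factory instead of creating one per
// request. Lazy<T> gives thread-safe, once-only initialization.
public static class CacheClientProvider
{
    private static readonly Lazy<CacheFactory> factory =
        new Lazy<CacheFactory>(() => new CacheFactory());

    public static CacheClient GetCache(string name)
    {
        return factory.Value.GetCache(name);
    }
}

// Hypothetical stand-ins so the sketch compiles on its own; the real
// DataCacheFactory is expensive because it maintains TCP connections.
public sealed class CacheFactory
{
    public static int InstancesCreated; // instrumentation for the sketch only
    public CacheFactory() { InstancesCreated++; }
    public CacheClient GetCache(string name) { return new CacheClient(name); }
}

public sealed class CacheClient
{
    public string Name { get; private set; }
    public CacheClient(string name) { Name = name; }
}
```

Every request path then calls CacheClientProvider.GetCache(...) and shares the single underlying factory and its connections.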

3.    WCF Services using Cache Client

It is a common practice for WCF services to use a cache to improve their performance. However, unlike web applications, WCF services are susceptible to I/O-thread starvation issues when making blocking calls (such as cache API calls) that further require I/O threads to receive responses (such as responses from cache servers).

This issue is described in detail in the following KB article – The typical symptom that surfaces if you run into this is that when you have a sudden burst of load, cache API calls time out. You can confirm whether you are running into this situation by plotting the thread count values against incoming requests/second, as shown in the KB article.

4.    If your app uses the Lock APIs – handle ObjectLocked and ObjectNotLocked exceptions

If you are using the lock-related APIs, please ensure you are handling exceptions such as ObjectLocked (object being referred to is currently locked) and ObjectNotLocked (object being referred to is not locked by any client).

GetAndLock can fail with “<ERRCA0011>:SubStatus<ES0001>:Object being referred to is currently locked, and cannot be accessed until it is unlocked by the locking client. Please retry later.” error if another caller has acquired a lock on the object.

The code should handle this error and implement an appropriate retry policy.

PutAndUnlock can fail with “<ERRCA0012>:SubStatus<ES0001>:Object being referred to is not locked by any client” error.

This typically means that the lock timeout specified when the lock was acquired was not long enough because the application request took longer to process. Hence the lock expired before the call to PutAndUnlock and the cache server returns this error code.

The typical fix here is to both review your request processing time as well as set a higher lock timeout when acquiring a lock.

You can also run into this error when using the session state provider for cache. If you are running into this error from session state provider, the typical solution is to set a higher executionTimeout for your web app.

Session State Provider Usage

You can find more info about session state providers for AppFabric cache here and for Azure cache here.

The session state provider has an option to store the entire session as one blob (useBlobMode=”true”, which is the default), or to store the session as individual key/value pairs.

useBlobMode=”true” incurs fewer round trips to cache servers and works well for most applications.

If you have a mix of small and large objects in the session, useBlobMode=”false” (a.k.a. granular mode) might work better, since it avoids fetching the entire (large) session object for all requests. The cache should also be marked as a non-evictable cache if the useBlobMode=”false” option is being used. Because Azure Shared Cache does not give you the ability to mark a cache as non-evictable, please note that useBlobMode=”true” is the only supported option against Windows Azure Shared Cache.

Performance Tuning and Monitoring

            Tune MaxConnectionsToServer

Connection management between cache clients and servers is described in more detail here. Consider tuning the MaxConnectionsToServer setting. This setting controls the number of connections from a client to the cache servers. (MaxConnectionsToServer * Number of DataCacheFactory instances * Number of application processes) is a rough value for the number of connections that will be opened to each of the cache servers. So, if you have 2 instances of your web role with 1 cache factory and MaxConnectionsToServer set to 3, there will be 3*1*2 = 6 connections opened to each of the cache servers.
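The estimate above as a tiny worked calculation (a sketch of the rule of thumb from this paragraph, not an API):

```csharp
using System;

public static class ConnectionEstimate
{
    // Rough number of connections opened to EACH cache server:
    // MaxConnectionsToServer * DataCacheFactory instances * application processes.
    public static int PerCacheServer(int maxConnectionsToServer,
                                     int factoryInstances,
                                     int applicationProcesses)
    {
        return maxConnectionsToServer * factoryInstances * applicationProcesses;
    }
}
```

With the example values (MaxConnectionsToServer = 3, 1 factory, 2 web role instances), PerCacheServer(3, 1, 2) yields 6 connections to each cache server.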

Setting this to number of cores (of the application machine) is a good place to start. If you set this too high, a large number of connections can get opened to each of the cache servers and can impact throughput.

If you are using Azure Cache SDK 1.7, maxConnectionsToServer defaults to the number of cores of the application machine. The on-premise AppFabric cache (v1.0/v1.1) had a default of one, so that value might need to be tuned.

            Adjust Security Settings

The default security setting for the on-premise AppFabric cache is to run with security on, at the EncryptAndSign protection level. If you are running in a trusted environment and don’t need this capability, you can turn it off by explicitly setting security to off.

The security model for Azure cache is different, and the above adjustment is not needed for Azure cache.


There is also a good set of performance counters on the cache servers that you can monitor to get a better understanding of cache performance issues. Some of the counters that are typically useful for troubleshooting include:

1)     %cpu used up by cache service

2)     %time spent in GC by cache service

3)     Total cache misses/sec – A high value here can indicate that your application performance might suffer because it is not able to fetch data from the cache. Possible causes include eviction and/or expiry of items from the cache.

4)     Total object count – Gives an idea of how many items are in the cache. A big drop in object count could mean eviction or expiry is taking place.

5)     Total client reqs/sec – This counter is useful in giving an idea of how much load is being generated on the cache servers by the application. A low value here usually means some sort of bottleneck outside of the cache server (perhaps in the application or network), and hence very little load is being placed on the cache servers.

6)     Total Evicted Objects – If cache servers are constantly evicting items to make room for newer objects in the cache, it is usually a good indication that you will need more memory on the cache servers to hold the dataset for your application.

7)     Total failure exceptions/sec and Total Retry exceptions/sec

Lead host vs Offloading

This applies only to on-premise AppFabric cache deployments. There is a good discussion of the tradeoffs/options in this blog – As noted in the blog, with v1.1 you can use SQL to store just the config info and use the lead-host model for the cluster runtime. This option is attractive if setting up a highly available SQL Server for offloading purposes is hard.

Other Links

Here are a set of blogs/articles that provide more info on some of the topics covered above.

1)     Jason Roth and Jaime Alva have written an article providing additional guidance to developers using Windows Azure Caching.

2)     Jaime Alva’s blog post on logging/counters for On-premise appfabric cache.

3)     MSDN article about connection management between cache client & servers.

4)     Amit Yadav and Kalyan Chakravarthy’s blog on lead host vs offloading options for cache clusters.

5)     MSDN article on common cache exceptions and Transient Fault Handling Application Block

.NET Caching Strategies and Patterns – Distributed Cache


Distributed caches are hosted and maintained by the Distributed Cache Service, itself a thin wrapper over a Windows Server AppFabric cluster. Understanding the Distributed Cache Service therefore requires an understanding of AppFabric, together with a few details on an implementation such as SharePoint’s.

AppFabric Logical Infrastructure

Details regarding the logical infrastructure of an AppFabric cluster are provided here and in this diagram (from the same location):

The basic logical entity in an AppFabric Cache Cluster is a Named Cache (frequently just called a Cache). A Named Cache is a container for cached objects. The ten SharePoint caches listed before are each AppFabric Named Caches. As illustrated here, Named Caches span all hosts in the cluster, distributing items for storage across allocated memory on all servers; however, cached items within a Named Cache are stored only once (by default). This is an important consideration when planning cache infrastructure, so let’s spell it out again: by default (and in SharePoint), cached items in an AppFabric Named Cache are stored only once across the entire cluster. If the cache host storing that cached item crashes or is shutdown non-gracefully, that item is no longer available in the cache.

In the previous paragraph we introduced cache items. Items are often collected and stored in a Region, which is a sub-collection of cached items within a specific Named Cache. Storing items in a shared region can make retrieval of the entire related collection easier. Like individual cached items, though, regions exist on only a single host in the cluster by default (and in SharePoint). So if the server hosting the region is lost, all items in the region are lost with it. Also, note that all items in the Region, and the Region itself, are stored on a single cache host.

Both individual cached items and regions can co-exist in the same named cache. I believe that in SharePoint all cached items are contained within regions.

To list all named caches and regions in a cluster, run Get-AFCache | Format-Table -AutoSize. For a list of all caches, run Get-AFCache | Format-Table CacheName.

With the infrastructure described, let’s dive more deeply into configuration of caches and cache hosts.

AppFabric Physical Architecture

Details of AppFabric’s physical architecture are described here and represented in this diagram (from the same location).

In SharePoint’s implementation, web and service applications are the “Cache-enabled application servers (cache clients).” The “Cache Servers” are SharePoint servers where the Distributed Cache Service Instance has been installed and enabled, and the “Cluster configuration storage location” is the CacheClusterConfig table in the SharePoint Configuration Database.

The CacheClusterConfig table in the Configuration Database stores configuration items as typed Key/Value pairs using a custom ICustomProvider implementation. The original values are XML snippets describing caches, cache hosts, and other properties. They are serialized and converted into byte arrays for storage, but most data can be deserialized and viewed using the Export-AFCacheClusterConfiguration cmdlet.

AppFabric Cache Hosts in SharePoint

Management of Distributed Cache Service Instances (AppFabric Cache Hosts) in SharePoint is different than management of most SharePoint service instances. Most service instances always remain installed on servers in the farm, whether online or not. These service instances are like Windows services, which are always installed on a server whether they’ve been enabled or not. For example, the User Profile Sync Service Instance is typically only online and running on one server in the farm, but it’s installed – and disabled – on all servers. To see a list of all service instances installed on a given server, both online and disabled, run the Get-SPServiceInstance cmdlet, using the –Server parameter to limit results to a particular server. The Services on Server page in Central Administration displays the same information.

Unlike other service instances, though, the Distributed Cache Service Instance should either be installed *and* online on a SharePoint server, or not installed at all. If the service instance is stopped (disabled) but not uninstalled, details about the associated Cache Host stay in the Cache Cluster Config store, which can cause problems.

For this reason, the Distributed Cache Service Instance should never be stopped via the Services on Server page in Central Administration or via Stop-SPServiceInstance in PowerShell. A special cmdlet, Remove-SPDistributedCacheServiceInstance, is available to stop *and* uninstall the local Distributed Cache Service Instance from a SharePoint server. This cmdlet, and its complement Add-SPDistributedCacheServiceInstance, should be used instead of Stop- and Start-SPServiceInstance for managing the local Distributed Cache Service Instance.

By default, the Distributed Cache Service Instance is installed on every SharePoint server when it’s joined to a farm. If you prefer to not install the Distributed Cache Service Instance at join time, you can specify -SkipRegisterAsDistributedCacheHost when running the Connect-SPConfigurationDatabase or New-SPConfigurationDatabase cmdlets. Note that at least one server must be running the Distributed Cache Service Instance for the farm to function properly.

For all AppFabric clusters, a simple command for listing all known cache hosts in the cluster is Get-AFCacheHostStatus. Don’t forget to run Connect-AFCacheClusterConfiguration before running other AppFabric cmdlets. To retrieve a list of SharePoint servers running the Distributed Cache Service Instance, you can run the following PowerShell command:

PS:> Get-SPServer | ? {($_.ServiceInstances | % TypeName) -contains 'Distributed Cache'} | % Address

We'll discuss more details about cache hosts soon, but first let's look at how individual caches are configured.

Cache Configuration Details

As we consider configuration details for caches and cache hosts, our conversation will be dominated by memory management, resiliency, and availability issues. We’ll discuss expiration and eviction of items, throttling of requests, and redundant storage. As before, we’ll discuss concepts in general and use SharePoint as a specific example.

Later, we’ll discuss configuration of cache hosts, but first, let’s begin by discussing configuration of individual caches.

To view common configuration details for individual caches, run the following command:

PS:> Get-AFCache | % {Get-AFCacheConfiguration -CacheName $_.CacheName}

Output for the ViewState cache is displayed here:

CacheName                : DistributedViewStateCache_f3bd4763-f482-4bb8-a5a5-f40806460bdd
TimeToLive               : 10 mins
CacheType                : Partitioned
Secondaries              : 0
MinSecondaries           : 0
IsExpirable              : True
EvictionType             : LRU
NotificationsEnabled     : False
WriteBehindEnabled       : False
WriteBehindInterval      : 300
WriteBehindRetryInterval : 60
WriteBehindRetryCount    : -1
ReadThroughEnabled       : False
ProviderType             :
ProviderSettings         : {}

The meaning of each of these properties is as follows.

  • CacheName: The internal name of the cache.
  • TimeToLive: The default time span until expiry for cached items. Note that this can be overridden for any individual item, and in SharePoint different standard TTLs are used for items in each named cache, as shown in the table below. Time until expiry has an important impact on eviction and memory management, which will be discussed in the section on cache host configuration.
  • CacheType: How data is stored in the cache’s storage medium. Partitioned is the only option.
  • Secondaries: How many additional replicas of cached data are to be stored. Additional replicas provide redundancy and resiliency for cached items, as replicas are always stored on a different storage node. This is always 0 in SharePoint, where high availability is not currently supported.
  • MinSecondaries: The minimum number of secondaries which must be online to allow writing to the cache. By default, it’s the same as the number of secondaries configured for the cache. Always 0 in SharePoint, where there are no secondaries.
  • IsExpirable: Whether items in the cache are to be evicted after their TTL passes. Always True for SharePoint caches.
  • EvictionType: The algorithm used to evict non-expired items when a cache’s high watermark is passed. Can be set to LRU (Least Recently Used) or None. For most caches in SharePoint, this is set to LRU. For the ActivityFeedLMT cache, this is set to None. See the section on cache host configuration for details on eviction.
  • NotificationsEnabled: Whether the cache will notify subscribers when cached items are changed or deleted. Always False in SharePoint caches.
  • Read-Through and Write-Behind properties: The remaining properties specify details on Read-Through and Write-Behind for the cache. For details on RTWB concepts, see this MSDN article. SharePoint caches don’t utilize RTWB.

Most SharePoint caches have the same configuration. However, the ActivityFeedLMT cache has an EvictionType of None, as running this command will show:

PS:> Get-AFCache | % {Get-AFCacheConfiguration -CacheName $_.CacheName} | Format-Table CacheName, EvictionType

Note that there are no quota-related properties specified at the cache level for SharePoint caches, or by default for any caches.

You may want to know how many items are stored and how much memory is in use for individual caches. For specific stats about each cache, run this command:

PS:> Get-AFCache | % {
    $CacheName = $_.CacheName
    Get-AFCacheStatistics -CacheName $CacheName | Add-Member -MemberType NoteProperty -Name 'CacheName' -Value $CacheName -PassThru
} | Format-List -Property *

I’ve added a little formatting and cleanup to get detailed information about each cache together with its name. You could pipe the output from this command to Export-Csv to create a short report. Typical output is shown here:

CacheName         : DistributedLogonTokenCache_f3bd4763-f482-4bb8-a5a5-f40806460bdd
Size              : 36864
ItemCount         : 6
RegionCount       : 6
RequestCount      : 55
ReadRequestCount  : 26
WriteRequestCount : 14
MissCount         : 32
IncomingBandwidth : 96924
OutgoingBandwidth : 880

Having illustrated the key elements of configuring individual caches, let's move on to configuring cache hosts.

Cache Host Configuration Details

To retrieve configuration details for cache hosts, run the following command, which retrieves information about each host currently in the cluster:

PS:> Get-AFCacheHostStatus | % {
    $Status = $_.Status
    Get-AFCacheHostConfiguration -ComputerName $_.HostName -CachePort $_.PortNo |
        Add-Member -MemberType NoteProperty -Name 'Status' -Value $Status -PassThru
} | Format-List -Property *

I start this command with Get-AFCacheHostStatus since it returns all hosts in the cluster without further parameters necessary, unlike Get-AFCacheHostConfiguration. Note however that Get-AFCacheHostStatus attempts to ping each host in the cluster in order to report on status, and the timeout for this ping is 10 seconds. For a faster version of this command, at least for SharePoint servers, try this:

PS:> $SPDCServers = Get-SPServer | ? {($_.ServiceInstances | % TypeName) -contains 'Distributed Cache'} | % Address
PS:> $SPDCServers | % {Get-AFCacheHostConfiguration -ComputerName $_ -CachePort 22233}

Of course, you won’t get a Status without Get-AFCacheHostStatus.

Output from the first command will look like the following:

Status          : Up
HostName        : SERVER09.gavant.local
ClusterPort     : 22234
CachePort       : 22233
ArbitrationPort : 22235
ReplicationPort : 22236
Size            : 600
ServiceName     : AppFabricCachingService
HighWatermark   : 99
LowWatermark    : 90
IsLeadHost      : True

Let’s describe the meaning of each of these properties:

  • Status: If the host responds to a standard ICMP ping within 10 seconds, this reports the status of the AppFabric service on that server. May be: {Up, Down, Starting, Stopping, ShuttingDown, Unknown}. This is the output from Get-AFCacheHostStatus.
  • HostName and ServiceName: Host and service name.
  • CachePort: The main port for public (external) communication with the cache host and cluster. Must be open to incoming client traffic.
  • ClusterPort, ArbitrationPort, and ReplicationPort: Used for internal data management communication amongst the hosts in the cluster. Must be open between servers.
  • Size: The amount of memory in MB to be allocated for live cached items. Note that the actual memory used by the process will be significantly greater than this amount, as discussed later.
  • LowWatermark: Percentage of memory usage (from Size) when *expired* items are removed (evicted) from cache if expiration is enabled.
  • HighWatermark: Percentage of memory usage (from Size) when *all* items may be removed (evicted) from cache if eviction is enabled.
  • IsLeadHost: Whether this host is a lead host for cluster management. Not used in SharePoint.

Now that we’ve briefly discussed each of these properties, let’s dive deeper into their implications.

Eviction, Expiration, and Watermarks

The time has come to explain eviction, expiration, and watermarks. In a nutshell, the goal of the AppFabric service is to keep the memory a cache host uses for cached items between the low watermark and high watermark configured for that cache host. Every second (by default), current memory usage for the host is checked against the Size and the computed high and low watermark values for the host; based on the results, the following memory management algorithm is applied.

  • Low Watermark not yet reached. No items are removed from the cache, even if expired.
  • Low Watermark reached, High Watermark not reached. Expired items are evicted, but non-expired items are not.
  • High Watermark reached. Expired and non-expired items are evicted until low watermark is reached.

This is shown graphically here:

If and when less than 15% (by default) of server memory remains, an eviction run is initiated regardless of the local cache host’s watermark and size settings. That is, even though the host is not using all of its allowed memory, if available memory on the server is below 15% of the total physical memory, a full eviction run will begin, as if the high watermark had been passed.
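The decision logic described above, including the low-available-memory override, can be sketched as follows. This is an illustrative sketch in Python, not AppFabric's actual implementation; the function name and parameters are made up, and the 15% threshold and watermark behavior mirror the text.

```python
# Hedged sketch (not AppFabric source): the periodic per-host memory check
# described above, run against the host's Size and watermark settings.
def eviction_action(used_mb, size_mb, low_pct, high_pct, free_server_pct):
    """Return which items may be evicted on this check pass."""
    # Low available physical memory on the server (default threshold 15%)
    # forces a full eviction run regardless of the host's own watermarks.
    if free_server_pct < 15:
        return "evict expired and non-expired items"
    usage_pct = 100 * used_mb / size_mb
    if usage_pct < low_pct:
        return "evict nothing (even expired items stay)"
    if usage_pct < high_pct:
        # Expired items only; also requires the cache to have IsExpirable=True.
        return "evict expired items only"
    return "evict expired and non-expired items"

# Example with the SharePoint defaults seen earlier (low=90, high=99):
print(eviction_action(500, 600, 90, 99, 40))  # ~83% used, plenty of server RAM
```

Note that the per-cache IsExpirable and EvictionType settings (discussed in the previous section) still gate whether a given cache actually participates in each kind of eviction.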

Note that caches specify whether they will be subject to expiration and eviction via the IsExpirable and EvictionType properties, as discussed in the previous section. If either of these excludes the cache from eviction, corresponding cached items will not be removed. For example, if IsExpirable is True and EvictionType is None (as it is for the Activity Feed LMT cache in SharePoint), expired items will be removed once the cache reaches its low watermark, but non-expired items will never be removed. As a result, if no items in the cache are expired, nothing will be removed.

If throttling is enabled (see below), the cache would eventually be write-throttled and no further items would be added until some items expired or were removed. However, throttling is not enabled by default in AppFabric for Windows Server, and *I believe* that in this case the cache will continue to grow beyond its allotted size, governed only by the algorithm described above. Test and consider this when planning your own cache and host configurations.

Cache Statistics

Since the current amount of memory in use by the cache is so important, you'll be interested in the commands that return a snapshot of current usage. As before, you have a couple of options for retrieving information about all servers in the cluster: one using Get-AFCacheHostStatus to iterate through all hosts, and one finding all configured service instances in SharePoint. They are as follows:

PS:> Get-AFCacheHostStatus | % {
    $ServerName = $_.HostName
    Get-AFCacheStatistics -ComputerName $_.HostName -CachePort $_.PortNo | Add-Member -MemberType NoteProperty -Name 'ServerName' -Value $ServerName -PassThru
} | Format-List -Property *

PS:> $SPDCServers = Get-SPServer | ? {($_.ServiceInstances | % TypeName) -contains 'Distributed Cache'} | % Address
PS:> $SPDCServers | % {
    $ServerName = $_
    Get-AFCacheStatistics -ComputerName $_ -CachePort 22233 | Add-Member -MemberType NoteProperty -Name 'ServerName' -Value $ServerName -PassThru
} | Format-List -Property *

And typical output looks like this:

ServerName      : SERVER09
Size            : 51200
ItemCount       : 18
RegionCount     : 13
NamedCacheCount : 10
RequestCount    : 393
MissCount       : 66

Here, the SERVER09 cache host has allocated 51200 bytes (about 51K) to cached items. Since the allowed size for this cache host is much higher than 51K, neither expiration nor eviction is necessary on this server, assuming at least 15% of the server's physical memory is free.


According to this MSDN article, AppFabric for Windows Server is subject to throttling based on the percentage of server memory in use and the percentage of AppFabric service memory in use. However, in my investigations I've found that throttling is disabled by default in AppFabric for Windows Server, and SharePoint does not change the defaults. This means that even if no memory from the cache host's allowed size remains, or even if no memory on the entire server remains, the system will continue to try to serve requests and allocate memory for cached items.

I’ll continue to investigate Throttling and update this section if I find other information. Please let me know if your testing reveals different behavior than I’ve described.

With all this discussion of memory algorithms and calculations, you won’t be surprised to learn that memory overcommitment schemes employed by virtualization hypervisors, such as Hyper-V’s dynamic memory, are not supported with SharePoint, and not recommended with AppFabric in general.

SharePoint Cache Service Details

Now that we've explained most aspects of AppFabric cache and cache host configuration, let's explore some of the defaults used for SharePoint cache hosts, along with some recommendations.

Cache Host Size for SharePoint Hosts

There are two points during setup of a Distributed Cache Service Instance on a SharePoint server when the local cache host configuration comes into play: at service installation (e.g. during Farm Join or when running Add-SPDistributedCacheServiceInstance) and at service provisioning (e.g. immediately following service installation, or when calling Start-SPServiceInstance).

At service installation time, the Size property for the local cache host is set to 5% of the total physical memory of the host. For example, if 16GB of physical RAM are installed on the host at installation time, the size of the local cache host will be set to 800MB. Note that this value could be different on each SharePoint server if they have different amounts of physical RAM at installation time. Note also that this value won’t automatically change if the amount of physical RAM allocated to the server changes.

At service provisioning time, SharePoint checks that the amount of available physical memory in the server is at least 100MB more than the allowed size for the cache host. So if, as in the above example, the cache host size is set to 800MB, at least 900MB of physical RAM must be available at provisioning time, or the service will fail to start.
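The two defaults just described can be sketched as a quick calculation. This is an illustrative Python sketch of the rules as stated in the text (not SharePoint's actual code), and it assumes the text's convention of 16 GB -> 800 MB, i.e. 1 GB = 1000 MB:

```python
# Sketch of SharePoint's default cache-host sizing, as described above.
def default_cache_size_mb(physical_ram_mb):
    # At service installation: Size = 5% of total physical memory.
    return physical_ram_mb * 5 // 100

def can_provision(available_ram_mb, cache_size_mb):
    # At provisioning: at least Size + 100 MB of physical RAM must be free,
    # or the service will fail to start.
    return available_ram_mb >= cache_size_mb + 100

size = default_cache_size_mb(16_000)   # 16 GB host -> 800 MB cache size
print(size, can_provision(950, size))  # 950 MB free >= 900 MB required
```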

These default values may not be appropriate for your environment. In the next section we’ll discuss options for changing them.

Changing Cache Host Size for SharePoint

There are two options for modifying cache host size for SharePoint AppFabric servers. Both require shutting down the entire cluster (all cache hosts).

The first utilizes AppFabric’s own PowerShell cmdlets. First, run Stop-AFCacheCluster to shut down all hosts in the cluster, then on each cache host run Set-AFCacheHostConfiguration -CacheSize <NewSizeInMB> to specify a new cache size. Don’t forget to run Start-AFCacheCluster to restart all hosts in the cluster. You can also specify different high and low watermarks with the Set-AFCacheHostConfiguration cmdlet.

Advantages of the native AppFabric cmdlet approach are that 1) all hosts in the cluster are stopped immediately (not gracefully), 2) you can specify different cache sizes for each host, and 3) you can configure low and high watermarks if necessary.

The second approach is to use SharePoint’s Update-SPDistributedCacheSize cmdlet. This takes only one parameter, -SizeInMB. It shuts down all hosts in the cache cluster, updates the cache size for each of them, then restarts them all.

Disadvantages of the SharePoint cmdlet are that it 1) shuts down all but the last host in the cluster gracefully and 2) sets all cache hosts to the same size. Graceful shutdown takes much longer than immediate shutdown, since all cached items must first be moved to a different host. Yet since the entire cluster is to be shut down, graceful shutdown is not helpful here.

Hopefully, Microsoft will address the graceful shutdown issue in the future, and setting all servers to the same memory size may be appropriate in your farm. If these items aren't concerns for you, I'd recommend using the SharePoint-specific cmdlet, as sticking to the supported SharePoint approach will serve you better if and when you need support.

Whether you use AppFabric or SharePoint cmdlets to modify cache host size, note that if you uninstall and reinstall the Distributed Cache Service Instance on a server (i.e. by running Remove-SPDistributedCacheServiceInstance and then Add-SPDistributedCacheServiceInstance) the cache host size will be reset to the default (5% of physical memory at time of installation). If removing and adding the cache service instance is part of your maintenance cycles, make sure to also modify the cache size afterwards if needed.

We’ve discussed many concepts relevant to AppFabric infrastructure and planning. Now let’s focus on a couple specific applied details.

Planning Server Memory and Cache Host Size

A key detail in AppFabric Cache Service planning is that the actual memory usage of the DistributedCacheService process will be significantly larger than the size allocated in the cache host configuration. This MSDN article states that at least twice the amount of memory specified for the cache host will be used by the process due to memory management algorithms. Run this command for a quick review of how much memory the process is actually consuming:

PS:> Get-Process DistributedCacheService | fl *64*

At the time I ran this command, Get-AFCacheStatistics reported my cache host size as 9216 bytes. The process’s Working Set, however, was 584118272 bytes (almost 600MB). This is of course much more than twice the used amount of RAM; my assumption is that there is always a base level of memory overhead no matter how small the actual caches may be.

The key takeaway here is that if you have a certain amount of memory to allocate for AppFabric, the size specified for the host configuration should be half of that. For example, if I intend to allocate 16GB of physical memory for AppFabric, the size specified in the host configuration should be 8GB.

Also in that article, it’s recommended not to allocate more than 16GB for the AppFabric server (and corresponding 8GB for the cache host configuration). If the cache host’s size is larger than 16GB/8GB, garbage collection could take long enough to cause a noticeable interruption for clients.

A common recommendation is to spec AppFabric servers with 16GB of physical RAM, and set the cache host size to 7GB. With this arrangement, you can expect about 14GB to be used by the AppFabric process, leaving 2GB for other server processes on the host.
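The planning arithmetic above can be made explicit with a small sketch. This is illustrative only (the function name is made up); it encodes the rule of thumb from the MSDN guidance that the process uses roughly twice the configured cache host size:

```python
# Sketch of the rule of thumb above: the AppFabric process uses roughly
# twice the configured cache host size, so plan physical RAM accordingly.
def plan_host(physical_ram_gb, cache_host_size_gb):
    expected_process_gb = 2 * cache_host_size_gb
    left_for_other_processes_gb = physical_ram_gb - expected_process_gb
    return expected_process_gb, left_for_other_processes_gb

# The common recommendation: 16 GB server, cache host size set to 7 GB.
print(plan_host(16, 7))  # about 14 GB for AppFabric, 2 GB left over
```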

Let’s wrap up by discussing a couple additional considerations specifically relevant to SharePoint’s AppFabric implementation.

Other Considerations for AppFabric in SharePoint

High Availability

This one is easy – SharePoint (as of March 2013) does not provide any high availability for its caches. As briefly discussed above, this means that each item and region in SharePoint's named caches exists only once across all the memory in the cluster. If the server where that item has been stored in memory is lost or shut down ungracefully, that cached item will be lost. As discussed at the very beginning of this post, this is generally not a problem for cached items because they are authoritatively stored elsewhere. Nevertheless, there are a couple of things to keep in mind.

First, retrieving cached items all over again involves a performance hit, the very hit the caches are intended to help avoid. There could be interruptions and delays while the caches are being refilled. For example, if the ActivityFeed cache is lost, users may not see all recent updates in their Newsfeed, or may see the “We’re still gathering the news” message as the cache is repopulated.

For the ActivityFeed and ActivityFeedLMT cache, there are two PowerShell cmdlets to manually begin repopulation of the caches before users actually request data. These are Update-SPRepopulateMicroblogLMTCache and Update-SPRepopulateMicroblogFeedCache. In situations where maintenance leads to loss of these caches, plan to run these cmdlets immediately afterwards to repopulate data manually.

A second concern when cached data in SharePoint is lost is that some items in SharePoint are *only* stored in the cache; specifically, updates regarding followed documents are only stored in the cache (as of March 2013). If these cached items are lost they won’t be able to be regenerated and will no longer appear in users’ feeds.

To avoid losing items from the cache and/or having to retrieve them again, you can use the Stop-SPDistributedCacheServiceInstance cmdlet with the -Graceful switch. This will move all cached items from the local cache host to other cache hosts in the cluster. For this to be effective, there must be space on the other servers to accommodate these items. Also note that if shutting down the entire cluster, such as to change the cache host size, there’s no way to avoid losing all of the caches and items. Plan accordingly.

Caches in SharePoint’s Deployment

One last detail is that Microsoft has stated that additional named caches should not be deployed to the SharePoint AppFabric cluster (i.e. by using the New-AFCache cmdlet). If you need a cache for a custom solution, you’ll need to deploy a separate AppFabric cluster (or server) and create the cache there. Then point your solution at the external AppFabric cluster. There’s also no supported way to add your own cached items to SharePoint’s named caches.


This concludes our presentation on AppFabric and SharePoint’s Distributed Cache Service. I hope it provides both SharePoint and AppFabric administrators with a deeper understanding and greater ability to manage distributed cache clusters.

Migration from File Shares to Document Management (SharePoint 2013) using OneDrive (Skydrive) for Business

Some tips on how to migrate file shares to SharePoint and use OneDrive (SkyDrive) for Business (ODFB), intended for anyone planning to migrate file share content into SharePoint who wants to use ODFB to synchronize the SharePoint content offline.

Note: these steps are valid for both SharePoint 2013 on-premises and SharePoint Online (SPO).

Info about SharePoint Limits

First Step – Analyze your File Shares

As a first step, try to understand the data that resides on the file shares. Ask yourself the following questions:

  • What is the total size of the file share data that the customer wants to migrate?
  • How many files are there in total?
  • What are the largest file sizes?
  • How deep are the folder structures nested?
  • Is there any content that is not being used anymore?
  • What file types are there?

Let me try to explain why you should ask yourself these questions.

Total Size

If the total size of the file shares is more than the storage capacity you have in SharePoint, you need to buy additional storage (SPO) or increase your disk capacity (on-prem). To determine how much storage you will have in SPO, please check the total available tenant storage in the tables in this article. Another issue that may arise is reaching the capacity limit per site collection. For SPO that is 1000 gigabytes (changed from 100 GB to 1 TB); for on-premises the recommended size per site collection is still around 200 gigabytes.

What if we have more than 1000 Gigabyte?

  • Try to divide the file share content over multiple site collections when it concerns content which needs to be shared with others.
  • If certain content is just for personal use, try to migrate that specific content into the personal site of the user.

How Many Files

The total number of files on the file shares is important, as there are limits in both SharePoint and ODFB that can leave a library or list in SharePoint in an unusable state, and you might also end up with missing files when using the ODFB client.

First, in SPO there is a fixed limit of 5000 items per view, folder or query. The reasoning behind this 5000 limit boils all the way down to how SQL Server works under the hood. If you would like to know more about it, please read this article. On-prem there is a way to raise this limit, but it is not something we recommend, as performance can decrease significantly when you increase it.

Secondly, for ODFB there is also a limit of 5000 items for synchronizing team sites and 20000 for synchronizing personal sites. This means that if you have a document library that contains more than 5000 items, the remaining items will not be synchronized locally.

There is also a limit of 5 million items within a document library, but I guess that most customers in SMB won't reach that limit very easily.

What should I do if the data I want to migrate to a document library contains more than 5000 items in one folder?

  • Try to divide that amount over multiple subfolders, or create additional views that limit the number of documents displayed.

But wait! If I already have 5000 items in one folder, doesn't that mean that the rest of the documents won't get synchronized when I use ODFB?

Yes, that is correct. So if you would like to use ODFB to synchronize documents offline, make sure that the total number of documents per library in a team site does not exceed 5000.

How do I work around that limit?

  • Look at the folder structure of the file share content and see if you can divide the data across multiple sites and/or libraries. If there is a Marketing folder, for example, it might make more sense to migrate that data into a separate site anyway, as this department probably wants to store additional information besides documents (e.g. a calendar, general info about the marketing team, a site mailbox, etc.). An additional benefit of spreading the data over multiple sites/libraries is that it gives ODFB users more granularity over what data they take offline. If you migrated everything into one big document library (not recommended), all users would need to synchronize everything, which can have a severe impact on your network bandwidth.
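To gauge how many subfolders, libraries, or sites you would need to stay under the threshold, the arithmetic is trivial. An illustrative sketch (the function name and the 12,500-item example are made up):

```python
import math

# Illustrative: how many containers (subfolders, libraries, or sites) are
# needed to keep each one under SharePoint's 5000-item threshold.
def containers_needed(item_count, limit=5000):
    return math.ceil(item_count / limit)

print(containers_needed(12_500))  # 12,500 items -> 3 containers
```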

Largest File Sizes

Another limit that exists in both SPO and on-prem is the maximum file size. For both, the maximum size per file is 2 GB. On-prem the default is 250 MB, but it can be increased to a maximum of 2 GB.

So, what if I have files that exceed this size?

  • Well, they won't fit in SharePoint, so you can't migrate them. See what type of files they are and determine what they are used for in the organization. Examples could be software distribution images, large media files, training courses or other materials. If these are still being used and not highly confidential, it is not a bad idea to keep them on alternative storage like a SAN, NAS or DVDs. If the data just needs to be kept for legal reasons and doesn't need to be retrieved instantly, you might put it on DVD or an external hard drive and store it in a safe, for example.

Folder Structures

Another important aspect to look at on your file shares is the depth of nested folders and the length of file names. The recommended total length of a URL in SharePoint is around 260 characters. You might think that 260 characters is pretty lengthy, but remember that URLs in SharePoint often have encoding applied to them, which takes up additional space. E.g. a space is one character, but URL-encoded it becomes %20, which takes up three characters. The problem is that you can run into issues when the URL becomes too long. More details about the exact limits can be found here, but as a best practice try to keep the URL length of a document under 260 characters.
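You can estimate the encoded length of a document URL before migrating. A quick illustrative sketch (the path below is made up, and this is a rough check, not SharePoint's exact encoding rules):

```python
# Sketch: estimate the URL-encoded length of a document path, since the
# 260-character guideline applies to the encoded form (space -> %20).
from urllib.parse import quote

path = "/sites/Human Resources/Shared Documents/Annual Review 2013 final.docx"
encoded = quote(path)  # "/" is left unencoded by default
print(len(path), len(encoded), len(encoded) <= 260)
```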

What if I have files that will have more than 260 characters in total URL length?

  • Make sure you keep your site URLs short (the site title can be long, though). E.g. don't call the URL Human Resources; call it HR. If you land on the site, you would still see the full name Human Resources, as Site Title and URL are separate things in SharePoint.
  • Shorten the document name (e.g. strip off …v.1.2, or …modified by Andre), as SharePoint has versioning built in. More information about versioning can be found here.

Idle Content

Migrating file shares into SharePoint is often also a good opportunity to clean up some of the information the organization has been collecting over the years. If there is a lot of content that has not been accessed for a couple of years, what would be the point of migrating that data to SharePoint?

So, what should I do when I come across such content?

  • Discuss this with the customer and determine if it is really necessary to keep this data.
  • If the data cannot be purged, you might consider storing it on a DVD or external hard drive and keep it in a safe.
  • If the content has multiple versions, such as proposal 1.0.docx, proposal 1.1.docx, proposal final.docx, proposal modified by Andre.docx, you might consider migrating just the latest version instead of all of them. This manual process might be time consuming, but can save you lots of storage space in SharePoint. Versioning is also built into SharePoint and is optimized for storing multiple versions of the same document; for example, SharePoint only stores the delta of the next version, saving storage space that way. This functionality is called Shredded Storage.

Types of Files

Determine what kinds of files the customer has. Are they mainly Office documents? If so, then SharePoint is the best place to store such content. However, if you come across developer code, for example, it is not a good idea to move that into SharePoint. There are also other file extensions that are not allowed in SPO and/or on-prem. A complete list of blocked file types for both SPO and on-prem can be found here.

What if I come across such blocked file extensions?

  • Well, you can’t move them into SharePoint, so ask yourself: do I still need these files? And if so, is there an alternative storage facility, such as a NAS, I can store these files on? If it concerns developer code, you might want to store such code on a Team Foundation Server instead.

Tools for analyzing and fixing file share data

In order to determine whether you have large files or exceed the 5000 item limit, for example, you need some kind of tooling. There are a couple of approaches here.

  • There is a PowerShell script that has been pimped up by Hans Brender, which checks for blocked file types, bad characters in files and folders, and the maximum URL length. The script will even fix invalid characters and file extensions for you. It is a great script, but it requires some knowledge of PowerShell. Another alternative I was pointed at is a tool called SharePrep, which scans for URL length and invalid characters.
  • There are other 3rd party tools that can scan your file share content, such as TreeSize. Such tools do not necessarily check for the SharePoint limitations we talked about in the earlier paragraphs, but at least they will give you a lot more insight into the size of the file share content.
  • Finally, there are actual 3rd party migration tools that will move the file share content into SharePoint and check for invalid characters, extensions and URL length upfront. We will dig into these tools in Step 2 – Migrating your data.
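To illustrate the kind of checks such tooling performs, here is a minimal C# sketch. It is not one of the tools above; the path-length limit and the blocked-extension list are placeholder assumptions, so consult the SharePoint documentation for the actual limits of your version.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

static class ShareScanner
{
    // Placeholder values; the real limits depend on the SharePoint version.
    const int MaxPathLength = 256;
    static readonly HashSet<string> BlockedExtensions =
        new HashSet<string>(StringComparer.OrdinalIgnoreCase) { ".exe", ".dll", ".ashx" };

    // Walks the file share and reports files that would be problematic in SharePoint.
    public static List<string> FindProblemFiles(string rootPath)
    {
        var problems = new List<string>();
        foreach (var file in Directory.EnumerateFiles(rootPath, "*", SearchOption.AllDirectories))
        {
            if (file.Length > MaxPathLength)
                problems.Add(file + " (path too long)");
            if (BlockedExtensions.Contains(Path.GetExtension(file)))
                problems.Add(file + " (blocked file type)");
        }
        return problems;
    }
}
```

A 3rd party migration tool performs these checks (and the fixes) at a much larger scale, but the principle is the same.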

Second Step – Migrating your data

So, now that we have analyzed our file share content, it is time to move it into SharePoint. There are a couple of approaches here.

Document Library Open with Explorer

If you are in a document library, you can open up the library in Windows Explorer. That way you can simply copy and paste files into SharePoint.


There are some drawbacks to this approach. First of all, I’ve seen lots of issues trying to open up the library in Windows Explorer. Secondly, the technology used for copying the data into SharePoint is not very reliable, so keep that in mind when copying larger chunks of data. Finally, there is also drag & drop, but this is limited to files only (no folders) and a maximum of 100 files per drag. This would mean that if you have 1000 files, you need to drag them in 10 chunks. More information can be found in this article. Checking upfront for invalid characters, extensions and URL length is also not addressed when using the Open with Explorer method.

Pros: Free, easy to use, works fine for smaller amounts of data

Cons: Not always reliable, no metadata preservations, no detection upfront for things like invalid characters, file type restrictions, path lengths etc.

OneDrive (formerly SkyDrive) for Business

You could also use ODFB to upload the data into a library. This is fine as long as you don’t sync more than 5000 items per library. Remember, though, that ODFB is not a migration tool but a sync tool, so it is not optimized for copying large chunks of data into SharePoint. Things like character and file type restrictions and path length are on the ODFB team’s list to address, but they are not there yet.

The main drawback of using either the Open in Explorer option or ODFB is that these tools do not preserve the metadata of the files and folders on the file shares. By this I mean that things like the modified date or owner field are not migrated into SharePoint. The owner will become the user that is copying the data, and the modified date will be the timestamp of when the copy operation was executed. So if this metadata on the file shares is important, don’t use any of the methods mentioned earlier, but use one of the third party tools below.

Pros: Free, easy to use, works fine for smaller amounts of data (max 5000 per team site library or 20000 per personal site)

Cons: No metadata preservations, no detection upfront for things like invalid characters, file type restrictions, path lengths etc.

3rd party tools

Here are some of the 3rd party tools that will provide additional detection, fixing and migration capabilities that we mentioned earlier:

Some of these have a focus on the SMB segment, while others are more focused on the enterprise segment. We can’t express a preference for one tool or the other, but most of the tools have a free trial version available, so you can try them out yourself.


When should I use what approach?

Here is a short summary of capabilities:

                                 Open in Explorer    OneDrive for Business    3rd party
                                                     (with latest update)
  Amount of data                 Relatively small    No more than 5000        Larger data sets
                                                     items per library
  Invalid character detection    No                  No                       Mostly yes1
  URL length detection           No                  No                       Mostly yes1
  Metadata preservation          No                  No                       Mostly yes1
  Blocked file types detection   No                  No                       Mostly yes1

1 This depends on the capabilities of the 3rd party tool.


ODFB gives me issues when synchronizing data
Please check that you have the latest version of ODFB installed. There have been stability issues in earlier builds of the tool, but most of the issues should be fixed by now. You can check whether you are running the latest version by opening Word -> File -> Account and clicking Update Options -> View Updates. If your installed version number is lower than the latest available one, click the Disable Updates button (click Yes if prompted), then click Enable Updates (click Yes if prompted). This will force downloading the latest version of Office and thus the latest version of the ODFB tool.


If you are running the stand-alone version of ODFB, make sure you have downloaded the latest version from here.

Why is the upload process taking so long?
This really depends on a lot of things. It can depend on:

  • The method or tool that is used to upload the data
  • The available bandwidth for uploading the data. Tips:
  • Check your upload speed and run a test against your nearest Office 365 data center. This will give you an indication of the maximum upload speed.
  • Companies often have less upload bandwidth available than people at home. If you have the chance, uploading from a home location might be faster.
  • Schedule the upload at times when there is much more bandwidth for uploading the data (usually at night)
  • Test your upload speed upfront by uploading, say, 1% of the data. Multiply the measured time by 100 and you have a rough estimate of the total upload time.
  • The computers used for uploading the data. A slow laptop can become a bottleneck while uploading the data.
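The 1%-sample tip above is simple arithmetic; as an illustrative C# sketch (this ignores throttling and time-of-day effects, so treat the result as a rough estimate only):

```csharp
using System;

static class UploadEstimator
{
    // Upload a known fraction of the data, time it, and scale the duration up linearly.
    public static TimeSpan EstimateTotal(TimeSpan sampleDuration, double sampleFraction)
    {
        if (sampleFraction <= 0 || sampleFraction > 1)
            throw new ArgumentOutOfRangeException("sampleFraction");
        return TimeSpan.FromTicks((long)(sampleDuration.Ticks / sampleFraction));
    }
}
```

For example, if uploading 1% of the data takes 6 minutes, the estimate is 6 / 0.01 = 600 minutes, i.e. about 10 hours.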

New versions of TFS guidance are available (TFS Planning, DR avoidance and TFS on Azure IaaS)

What’s new?

We now include practical, scenario-based guidance for the implementation of Team Foundation Server (TFS) on Azure Infrastructure as a Service (IaaS). We also guide you through the planning and setup, based on a real-world proof-of-concept production deployment and experience from the ALM Rangers in the field.


Where can I find the guides?


Why BETA?

The guides are feature and content complete, technically reviewed, and therefore consumable as a BETA.

.NET Framework Collections: Performance and the Right Choice

.NET Framework 4 introduces new data structures designed to simplify thread-safe access to shared data and to increase the performance and scalability of multi-threaded applications. To best take advantage of these data structures, it helps to understand their performance characteristics in different scenarios.

Some of the Parallel Computing Team members (Emad Omara and Mike Liddell) measured the performance of four new concurrent collection types: ConcurrentQueue(T), ConcurrentStack(T), ConcurrentBag(T), and ConcurrentDictionary(TKey, TValue), and published their results on the Parallel Computing Developer Center on MSDN.

Some of their findings are summarized below, starting with the general characteristics of the core .NET collection types:



  Collection         Ordering          Contiguous   Direct       Lookup Efficiency
                                       Storage?     Access?
  Dictionary         Hashed            Yes          Via Key      Key: O(1)
  SortedDictionary   Sorted            No           Via Key      Key: O(log n)
  SortedList         Sorted            Yes          Via Key      Key: O(log n)
  List               User controlled   Yes          Via Index    Index: O(1); Value: O(n)
  LinkedList         User controlled   No           No           Value: O(n)
  Stack              LIFO              Yes          Only Top     Top: O(1)
  Queue              FIFO              Yes          Only Front   Front: O(1)

(For List(T) and LinkedList(T), the user has precise control over element ordering.)

The purposes of the published document are:
1. To explain implementation details of the data structures that relate to performance, although these details are subject to change in future releases.

2. To provide performance measurements that compare the data structures to alternatives and to measure the scalability of certain scenarios for different numbers of threads/cores.

3. To provide best-practice guidance that will help answer questions such as “when should I use the new thread-safe collections?” and “what aspects of a scenario make a given type more scalable or better performing than others?”

Throughout the document, our best-practice guidance is called out in boxes like the following:

Use the new functionality in .NET Framework 4 to get the most out of your multi-core machines

First, we explain the performance measurements and the scenarios used for the majority of our analyses. We then discuss and analyze the performance of BlockingCollection(T), ConcurrentStack(T), ConcurrentQueue(T), ConcurrentBag(T), and ConcurrentDictionary(TKey, TValue).

Our tests were run on a specific set of machines for which the configurations are described in the appendix. Our performance analyses were based primarily on the statistics of test completion time. We expect that the completion time will vary between runs and if the tests are run on different hardware. For this reason, the test results provided are to be used purely as a starting point of performance tuning for an application.

Performance Criteria

We measured the performance of the thread-safe collections to answer two questions:

1. For an algorithm using one of the thread-safe collections on a machine with at least N cores, how much faster does it execute when using N threads as compared to just using 1 thread? This measurement is called scalability.

2. How much faster is an algorithm that utilizes one of the new thread-safe collections, as opposed to an equivalent algorithm that doesn’t use one of the new types? This measurement is called speedup.

The scalability investigations measured how the performance of a thread-safe collection varied as more cores were utilized. The aim of the speedup investigations was quite different, as they compare two ways of solving a problem using different approaches. In these investigations we typically used the same machine configuration and ensured the programs were unchanged except for the replacement of a thread-safe collection with an alternative that was equivalent in functionality. Each speedup experiment defined a single algorithm that required a thread-safe collection to operate correctly on multi-core machines. The experiments were run using one of the new thread-safe collections and compared against a simple implementation of the same data structure. The simple implementations are straightforward approaches built using types available prior to .NET Framework 4; in most cases, they are a synchronization wrapper around a non-thread-safe collection. For example, the following SynchronizedDictionary(TKey, TValue) was compared to ConcurrentDictionary(TKey, TValue):

public class SynchronizedDictionary<TKey, TValue> : IDictionary<TKey, TValue>
{
    private Dictionary<TKey, TValue> m_Dictionary = new Dictionary<TKey, TValue>();
    private object _sync = new object();

    public void Add(TKey key, TValue value)
    {
        lock (_sync) { m_Dictionary.Add(key, value); }
    }
    // and so on for the other operations...
}

Producer-Consumer Scenarios Used for Speedup Comparisons

For our experiments, we created scenarios that we feel represent common usage. The most common scenarios that apply to the new collections are variations of the producer-consumer pattern (Stephen Toub, 2009). In particular, ConcurrentStack(T), ConcurrentQueue(T), and ConcurrentBag(T) are often appropriate for use as a buffer between producers and consumers. BlockingCollection(T) simply supports the blocking and bounding requirements of some producer-consumer scenarios.

In particular, we used two producer-consumer scenarios: a pure scenario and a mixed scenario.

In the pure scenario, we created N threads on an N-core machine, where N/2 threads only produced data and N/2 threads only consumed data. This basic scenario appears whenever the generation of work items is logically separate from the processing of work items. For simplicity, in this scenario we assume an equal number of producers and consumers and that they process each item at the same speed. In the real world, however, it is common for producers and consumers to be unbalanced.

The main loop for a producer thread typically looks like:

while (producingcondition())
{
    var item = DoDummyWork(workload); // simulates work to create the item
    collection.TryAdd(item);          // adds the item to the shared collection
}

The main loop for a consumer thread typically looks like:

while (consumingcondition())
{
    TItem item;
    // TryTake returns true if an item is removed successfully by the current
    // thread; false otherwise
    if (collection.TryTake(out item))
        DoDummyWork(workload); // simulates work to process the item
    // else: spin, wait, or do some other operations
}

The mixed scenario represents situations where threads both produce and consume data. Consider a generic tree traversal algorithm over a family tree where each node is defined as:

public class Person
{
    public string Name;
    public int Age;
    public List<Person> Children;
}
To traverse the tree, we can make use of any IProducerConsumerCollection(T) to act as a holding collection for subsequent nodes that must be traversed. The following code is a multi-threaded Traverse() method.

private static void Traverse(Person root,
    IProducerConsumerCollection<Person> collection)
{
    collection.TryAdd(root);
    Task[] tasks = new Task[dop]; // typically dop = n for an n-core machine
    int activeThreadsNumber = 0;
    for (int i = 0; i < tasks.Length; i++)
    {
        tasks[i] = Task.Factory.StartNew(() =>
        {
            bool lastTimeMarkedFinished = false;
            Interlocked.Increment(ref activeThreadsNumber);
            while (true)
            {
                if (lastTimeMarkedFinished)
                {
                    Interlocked.Increment(ref activeThreadsNumber);
                    lastTimeMarkedFinished = false;
                }
                Person parent = null;
                if (collection.TryTake(out parent))
                {
                    foreach (Person child in parent.Children)
                        collection.TryAdd(child);
                    DoDummyWork(workload); // processing per node
                }
                else
                {
                    if (!lastTimeMarkedFinished)
                    {
                        Interlocked.Decrement(ref activeThreadsNumber);
                        lastTimeMarkedFinished = true;
                    }
                    if (activeThreadsNumber == 0) // all tasks finished
                        return;
                }
            }
        });
    }
    Task.WaitAll(tasks);
}
In this program, each thread acts as both a producer and a consumer, adding and taking items from the shared collection.

In both the pure and mixed scenarios we simulated the work required to produce and consume items. We did this via a simple DoDummyWork(int k) function that repeats a simple floating point calculation k times. The exact details are not important, but we can assume that each loop of the dummy function corresponds to a handful of simple machine instructions. Since we were interested in measuring the costs associated with accessing the data structures, we typically used very small work functions.
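For reference, a stand-in work function along these lines (the exact body is our assumption; the paper does not list it) might look like:

```csharp
// Simulates roughly k "FLOPs" of work by repeating a simple floating-point calculation.
static double DoDummyWork(int k)
{
    double result = 1.0;
    for (int i = 0; i < k; i++)
        result = result * 1.0000001 + 0.0000001; // a handful of machine instructions per loop
    return result;
}
```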


BlockingCollection(T) provides blocking and bounding semantics for any IProducerConsumerCollection(T) type. In this way, BlockingCollection(T) can add and remove elements using any underlying policy implemented by the backing collection while providing the mechanism to block on attempts to remove from an empty store and to block on attempts to add if a bound is specified. Prior to .NET Framework 4, this would most likely have been constructed using a Monitor with a collection type like Queue(T), as the following NaiveBlockingQueue(T):

class NaiveBlockingQueue<T>
{
    Queue<T> m_queue = new Queue<T>();
    object m_lockObj = new object();

    public void Add(T item)
    {
        lock (m_lockObj) { m_queue.Enqueue(item); Monitor.Pulse(m_lockObj); }
    }

    public void Take(out T item)
    {
        lock (m_lockObj)
        {
            while (m_queue.Count == 0) Monitor.Wait(m_lockObj);
            item = m_queue.Dequeue();
        }
    }
}
This NaiveBlockingQueue(T) blocks Take operations if the underlying queue is empty, but it does not support richer functionality such as bounding on additions, waking readers once all production has finished, cancellation, and pluggable underlying collections. BlockingCollection(T) supports all of these.

NaiveBlockingQueue(T) uses a simple Monitor while BlockingCollection(T) internally uses more complex synchronization methods. From a performance point of view, NaiveBlockingQueue(T) could perform better in scenarios where the workload is zero or very small. But once more functionality is added to NaiveBlockingQueue(T), its performance starts to degrade with the additional synchronization overhead.

To compare the performance of BlockingCollection(T) and NaiveBlockingQueue(T), we ran a test in which one thread produces all the elements and all the other threads concurrently consume them. We ran this test using 2, 4 and 8 threads with various workloads. With zero workload, BlockingCollection(T) performed worse than NaiveBlockingQueue(T), as expected, but as the workload increased, BlockingCollection(T) started to outperform NaiveBlockingQueue(T). We found the tipping point of the workload was 500 FLOPs in our test configuration; this could vary on different hardware. Figure 1 shows the elapsed time of this test running on an 8-core machine for a variety of thread counts.


Number of Threads

Figure 1: Comparison of scalability for BlockingCollection(T) and NaiveBlockingQueue(T) in the scenario with 1 producer and N-1 consumers and workload of 500FLOPs

The real power of the BlockingCollection(T) type, however, is the rich functionality it supports, as mentioned above. Given that BlockingCollection(T) performs similarly to or better than NaiveBlockingQueue(T), and that it has much richer functionality, it is appropriate to use BlockingCollection(T) whenever blocking and bounding semantics are required.

When blocking and bounding semantics are required, BlockingCollection(T) provides both rich functionality and good performance.

Other Considerations

Given a collection type, we may also be interested to get a count of items, to enumerate all the data, or to dump the data into another data structure. When using a BlockingCollection(T), it is useful to know the following performance characteristics.

The Count property relies on the synchronization mechanism of BlockingCollection(T), specifically the CurrentCount property of the SemaphoreSlim object it keeps internally. Count is therefore an O(1) operation, and it reflects the real count of the underlying collection, unless the underlying collection is modified outside the BlockingCollection(T). Doing so is a bad practice that breaks the contract supported by BlockingCollection(T), and we advise against it.

BlockingCollection(T) also provides a GetConsumingEnumerable() method for the consumer to enumerate the collection, which, unlike normal enumerables, mutates the collection by repeatedly calling Take(). This means that calling MoveNext() on this enumerator will block if the collection is empty and wait for new items to be added to the collection. To stop enumerating safely, you can call CompleteAdding() or cancel the enumeration using the GetConsumingEnumerable(CancellationToken cancellationToken) overload.
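Putting these pieces together, here is a minimal producer-consumer sketch using CompleteAdding() and GetConsumingEnumerable(); the names and the bound of 100 are illustrative, not from the paper:

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading.Tasks;

static class BlockingCollectionDemo
{
    public static int Run(int itemCount)
    {
        var buffer = new BlockingCollection<int>(boundedCapacity: 100);
        var producer = Task.Run(() =>
        {
            for (int i = 0; i < itemCount; i++)
                buffer.Add(i);            // blocks if the bound of 100 is reached
            buffer.CompleteAdding();      // lets the consumer's enumeration terminate
        });
        int consumed = 0;
        foreach (var item in buffer.GetConsumingEnumerable())
            consumed++;                   // blocks while the buffer is empty
        producer.Wait();
        return consumed;
    }
}
```

Without the CompleteAdding() call, the foreach loop would block forever once the producer finishes.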

The GetEnumerator() method calls the underlying collection’s GetEnumerator() method, so its performance depends on the implementation of the underlying collection. The ToArray() method of BlockingCollection(T) wraps calls to the corresponding methods of the underlying collection with minimal overhead, so its performance also depends on the underlying collection.


ConcurrentQueue(T) is a data structure in .NET Framework 4 that provides thread-safe access to FIFO (First-In First-Out) ordered elements. Under the hood, ConcurrentQueue(T) is implemented using a list of small arrays and lock-free operations on the head and tail arrays; hence it is quite different from Queue(T), which is backed by an array and relies on the external use of monitors to provide synchronization. ConcurrentQueue(T) is certainly safer and more convenient than manual locking of a Queue(T), but some experimentation is required to determine the relative performance of the two schemes. In the remainder of this section, we will refer to a manually locked Queue(T) as a self-contained type called SynchronizedQueue(T).

Pure Producer-Consumer Scenario

The experiments we used for ConcurrentQueue(T) follow the producer-consumer patterns discussed earlier and we paid particular attention to the use of the simulated workload functions. The first experiment was a pure producer-consumer scenario where half of the threads were producers which simulated work by looping the simulated workload function then added an item to the queue; the other half were consumers which did the same simulation work but were instead removing items. The tests were run for various thread-counts and for differing workload sizes. We defined two workload sizes: the first is 0 FLOPS and the second is 500 FLOPS for both the producer loops and the consumer loops. These workloads are representative of workloads where contention would most likely be a dominating cost. For workloads that are significantly large, synchronization costs are potentially negligible. The exact values of elapsed times in milliseconds are not important since they vary by number of operations executed in the tests. Instead, we are interested in how elapsed time changes when a test runs on different numbers of threads, since it shows the scalability of this implementation.


Number of Threads

Figure 2: Comparison of scalability for ConcurrentQueue(T) and SynchronizedQueue(T) in a pure producer-consumer scenario with a zero-cost workload function.


Number of Threads

Figure 3: Comparison of scalability for ConcurrentQueue(T) and SynchronizedQueue(T) in a pure producer-consumer scenario with a 500 FLOPS workload function.

Figures 2 and 3 show the elapsed time for a pure producer-consumer scenario implemented using ConcurrentQueue(T) and SynchronizedQueue(T) with the two different workloads.

In Figure 2, we see that when the workload was very small, both ConcurrentQueue(T) and SynchronizedQueue(T) achieved their best performance when using exactly two threads and also that ConcurrentQueue(T) performed better for the two-thread case. The lack of scalability past two threads for both queues is expected as ConcurrentQueue(T) has only two access points (the head and the tail) and SynchronizedQueue(T) has only one access point because head and tail operations are serialized. So, at most, two threads can operate with little contention but more threads will necessarily suffer contention and synchronization overheads that will dominate in the absence of a significant workload function.

For scenarios with very light computation it is best to use ConcurrentQueue(T) on two threads: one pure producer, and the other pure consumer. Queues will not scale well beyond two threads for such scenarios.

On the other hand, in Figure 3 we see that ConcurrentQueue(T) does scale beyond two threads for workloads of 500 FLOPS, due to its minimal synchronization overhead. SynchronizedQueue(T), however, does not scale for workloads of 500 FLOPS, as its costs of synchronization are significantly higher and continue to be a significant factor. We found that 500 FLOPS is representative of the largest workload that shows a difference in scalability on the hardware our tests ran on. For larger workloads, the scalability of ConcurrentQueue(T) and SynchronizedQueue(T) does not differ greatly.

For scenarios involving moderate-size work functions (such as a few hundred FLOPS), ConcurrentQueue(T) can provide substantially better scalability than SynchronizedQueue(T).

If you have a small workload that falls in between 0 and 500 FLOPS, experimentation will best determine whether using more than two threads is beneficial.
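Such an experiment can be sketched with Stopwatch. This is not the paper's test harness, just an illustrative shape for your own measurements; absolute numbers will vary by hardware:

```csharp
using System;
using System.Collections.Concurrent;
using System.Diagnostics;
using System.Threading.Tasks;

static class QueueExperiment
{
    // Times one pure producer and one pure consumer moving 'items' elements
    // through a ConcurrentQueue(T). Swap in a locked Queue(T) to compare schemes.
    public static long Measure(int items)
    {
        var queue = new ConcurrentQueue<int>();
        var sw = Stopwatch.StartNew();
        var producer = Task.Run(() =>
        {
            for (int i = 0; i < items; i++) queue.Enqueue(i);
        });
        var consumer = Task.Run(() =>
        {
            int taken = 0, item;
            while (taken < items)
                if (queue.TryDequeue(out item)) taken++; // spin when the queue is empty
        });
        Task.WaitAll(producer, consumer);
        sw.Stop();
        return sw.ElapsedMilliseconds;
    }
}
```

Run it several times and for several item counts before drawing conclusions, as single runs are noisy.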

Mixed Producer-Consumer Scenario

The second experiment for queues used the tree traversal scenario, in which each thread was both a producer and a consumer. Figure 4 and 5 show the results for this scenario.


Figure 4: Comparison of scalability for ConcurrentQueue(T) and SynchronizedQueue(T) in a mixed producer-consumer scenario with a zero cost workload function.


Number of Threads

Figure 5: Comparison of scalability for ConcurrentQueue(T) and SynchronizedQueue(T) in a mixed producer-consumer scenario with a 500 FLOPs workload function.

From Figure 4, we see that neither data structure showed any scalability when the work function was very small. The loss of scalability for even the two-thread case was due precisely to this being a mixed producer-consumer scenario: the two threads were both performing operations on the head and the tail, thus introducing contention that is not present in the pure scenario. ConcurrentQueue(T) performed worse than SynchronizedQueue(T) as we increased the number of threads in execution. This is due to the fact that the ConcurrentQueue(T) implementation uses compare-and-swap (CAS) primitives, which rely on spinning to gain entry to critical resources (see the MSDN article on Interlocked Operations). When contentious requests are as frequent as they are in this case, CAS primitives do not perform as well as locks, like those used in SynchronizedQueue(T).

However, as shown in Figure 5, when the work function is a few hundred FLOPS or larger, the mixed scenario showed scalability. In these situations, ConcurrentQueue(T) has lower overheads and thus shows significantly better performance that is amplified as the number of threads/cores increases.

For mixed producer-consumer scenarios, scalability is only available if the work function is a few hundred FLOPS or larger. For these scenarios, ConcurrentQueue(T) provides significantly better performance than SynchronizedQueue(T).

Other Considerations

The Count, ToArray() and GetEnumerator() members take snapshots of the head and tail and, thus, of the entire collection. Taking the snapshot is a lock-free operation and takes O(1) time on average; however, each of these members has its own additional costs.

Since the queue maintains an index for each item according to the order it is added to the queue, after the Count property takes the snapshot, it simply returns the result of subtracting the head index from the tail index. Thus the Count property is O(1) on average overall.

After the ToArray() method takes the snapshot, it copies all the items into an array, thus it is overall an O(N) operation. GetEnumerator() delegates to ToArray() and returns the enumerator of the result array, thus it takes O(N) time to return, and provides an unchanging snapshot of items.


ConcurrentStack(T) is an implementation of the classic LIFO (Last-In First-Out) data structure that provides thread-safe access without the need for external synchronization. ConcurrentStack(T) is intended to be used in scenarios where multiple threads are managing a set of items and wish to process items in LIFO order. It is useful in scenarios for which new data should be processed in preference to older data, such as a multi-threaded depth-first search. Other examples arise in situations where there are penalties for not processing data on time. In such situations, the total penalties may be minimized by processing new items first and allowing items that have already missed their schedule to be further delayed; if so, a LIFO data structure for managing the items may be appropriate.

We compare the ConcurrentStack(T) to a simple implementation called SynchronizedStack(T) which is a thin wrapper around the non-thread-safe Stack(T) that uses a single monitor for synchronization.

Pure Producer-Consumer Scenario


ConcurrentStack workload=0

SynchronizedStack workload=0

Figure 6: Comparison of scalability for ConcurrentStack(T) and SynchronizedStack(T) in a pure producer-consumer scenario with a zero-cost workload function.


Number of Threads

Figure 7: Comparison of scalability for ConcurrentStack(T) and SynchronizedStack(T) in a pure producer-consumer scenario with a 500 FLOPS workload function.

Figures 6 and 7 show the results for a pure producer-consumer scenario implemented using ConcurrentStack(T) and SynchronizedStack(T). The tests used here are identical to those used for ConcurrentQueue(T).

From the results, we see that ConcurrentStack(T) has largely identical performance characteristics to SynchronizedStack(T). This is the result of both implementations having a single point of contention and the lack of opportunities for ConcurrentStack(T) to do anything particular to improve raw performance.

We also see in Figures 6 and 7 that a pure producer-consumer scenario involving stacks will only exhibit scaling if the workload is a few hundred FLOPS or larger. For smaller workloads, the scalability degrades until we see that no scalability is available when the workload is tiny.

For a pure producer-consumer scenario, ConcurrentStack(T) has essentially identical performance to SynchronizedStack(T). Both show good scalability for work functions that are a few hundred FLOPS or larger.

Even though the performance characteristics are identical for this scenario, we recommend using ConcurrentStack(T) because it is simple and safe to use.

Mixed Producer-Consumer Scenario

Figures 8 and 9 show the results for the tree traversal scenario implemented using ConcurrentStack(T) and SynchronizedStack(T).


Figure 8: Comparison of scalability for ConcurrentStack(T) and SynchronizedStack(T) in a mixed producer-consumer scenario with a zero-cost workload function.


Number of Threads

Figure 9: Comparison of scalability for ConcurrentStack(T) and SynchronizedStack(T) in a mixed producer-consumer scenario with a 500 FLOPS workload function.

In the tree traversal scenario, we actually see a divergence in the performance of the two implementations: ConcurrentStack(T) has consistently better performance due to lower overheads when used in a mixed producer-consumer scenario.

For a mixed producer-consumer scenario, ConcurrentStack(T) has better performance than SynchronizedStack(T) and shows good scalability for work functions of a few hundred FLOPS or larger.

Other Considerations

In some scenarios, you may have multiple items to add or remove at a time. For instance, when the LIFO order is preferred but not strictly required, a thread may be able to process N items at a time rather than processing them one-by-one. In this case, if we call Push() or TryPop() repeatedly N times, there is a synchronization cost for each operation. The PushRange() and TryPopRange() methods that ConcurrentStack(T) provides use only a single CAS operation to push or pop N items and thus significantly reduce the total synchronization cost in these scenarios.

For scenarios in which many items can be added or removed at a time, use the PushRange() and TryPopRange() methods.

It is worth mentioning that you can also implement PushRange() and TryPopRange() for SynchronizedStack(T) by taking the global lock to push or pop an array of items. It sometimes performs better than ConcurrentStack’s PushRange() and TryPopRange(). This is because Stack(T), at the core of SynchronizedStack(T), is based on arrays, while ConcurrentStack(T) is implemented as a linked list, which incurs a memory allocation cost for each node. Nevertheless, we recommend using ConcurrentStack(T) because it provides out-of-the-box API support for all scenarios, thread safety, and decent overall performance.
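The batched calls look like this in practice (the item values and batch sizes are illustrative only):

```csharp
using System;
using System.Collections.Concurrent;

static class RangeDemo
{
    // Pushes five items with one call, then pops up to three with one call.
    public static int Run()
    {
        var stack = new ConcurrentStack<int>();
        stack.PushRange(new[] { 1, 2, 3, 4, 5 }); // a single synchronized push for five items

        var buffer = new int[3];
        int popped = stack.TryPopRange(buffer);   // a single synchronized pop of up to three items
        return popped;                            // number of items actually removed
    }
}
```

Compared with five Push() calls followed by three TryPop() calls, this pays the synchronization cost only twice.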

The Count, ToArray() and GetEnumerator() members are all lock-free and begin by taking an immutable snapshot of the stack.

The Count property walks the stack to count how many items are present, and is thus an O(N) operation. Whenever possible, avoid accessing the Count property in a loop. For example, since the IsEmpty property takes only O(1) time, you should always use

while (!stack.IsEmpty)
{
    // do work
}

instead of

while (stack.Count > 0)
{
    // do work
}

The ToArray() and GetEnumerator() methods take a snapshot and then process the items in the snapshot, so they are both O(N) time operations.


ConcurrentBag(T) is a new type for .NET Framework 4 that doesn’t have a direct counterpart in previous versions of .NET Framework. Items can be added and removed from a ConcurrentBag(T) as with ConcurrentQueue(T), ConcurrentStack(T), or any other IProducerConsumerCollection type, but the items are not maintained in a specific order. This lack of ordering is acceptable in situations where the only requirement is that all data produced is eventually consumed. Any scenario that can use a bag could alternatively use an ordered data structure such as a stack or a queue, but the ordering guarantees impose restrictions and synchronization that can hamper scalability.

ConcurrentBag(T) is built on top of the new System.Threading.ThreadLocal(T) type such that each thread accessing the ConcurrentBag(T) has a private thread-local list of items. As a result, adding and taking items can often be performed locally by a thread, with very little synchronization overhead. However, a ConcurrentBag(T) must present a global view of all the data so, if a thread tries to take an item but finds its local list is empty, it will steal an item from another thread if other threads own items. Since ConcurrentBag(T) has very low overheads when each thread both adds and removes items, we can immediately see that the ConcurrentBag(T) should be an excellent collection type for mixed producer-consumer scenarios if ordering isn’t a concern.

The graph traversal scenario is thus an ideal scenario for the ConcurrentBag(T) if the specific traversal ordering is not important. When the tree is balanced, there is a high probability that a thread that produces a node will also consume that node, so a significant percentage of TryTake() operations will be inexpensive removal operations from the thread’s local list, as opposed to costly steal operations from other threads’ lists. The process starts by adding the root node to the ConcurrentBag(T) on the main thread and then spinning up producer-consumer threads. One of the threads will steal the root node and produce child nodes to search. From here, the other threads will race to steal nodes and then commence searching in their own sub-trees. Once the process warms up, the threads should largely operate in isolation with little need to synchronize until we start to run out of nodes to traverse.
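A minimal runnable sketch of this traversal pattern, assuming a simple Node type with a Children list (the Node type, the termination counter, and the thread count here are illustrative, not taken from the original benchmark):

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Threading;
using System.Threading.Tasks;

class Node
{
    public List<Node> Children { get; } = new List<Node>();
}

class BagTraversal
{
    // Counts the nodes reachable from root using several threads that
    // share a ConcurrentBag; the traversal order is unspecified.
    public static int Traverse(Node root, int threadCount)
    {
        var bag = new ConcurrentBag<Node>();
        bag.Add(root);
        int visited = 0;
        int pending = 1;  // nodes added to the bag but not yet processed

        var workers = new Task[threadCount];
        for (int t = 0; t < threadCount; t++)
        {
            workers[t] = Task.Run(() =>
            {
                while (Volatile.Read(ref pending) > 0)
                {
                    if (bag.TryTake(out Node node))
                    {
                        // Produce the children before marking this node done,
                        // so pending never drops to zero prematurely.
                        foreach (var child in node.Children)
                        {
                            Interlocked.Increment(ref pending);
                            bag.Add(child);
                        }
                        Interlocked.Increment(ref visited);  // "process" the node
                        Interlocked.Decrement(ref pending);
                    }
                }
            });
        }
        Task.WaitAll(workers);
        return visited;
    }

    static void Main()
    {
        // A small tree: root, 3 children, 3 grandchildren = 7 nodes.
        var root = new Node();
        for (int i = 0; i < 3; i++)
        {
            var child = new Node();
            child.Children.Add(new Node());
            root.Children.Add(child);
        }
        Console.WriteLine(Traverse(root, threadCount: 4));  // 7
    }
}
```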


Figure 10: Visualization of graph-traversal using multiple-threads and a ConcurrentBag(T).

Figure 10 shows how a graph is traversed using a ConcurrentBag(T) and four threads. The black arrows show the nodes stolen by the threads and the node colors represent the thread used to traverse them.

The primary scenario we used to test ConcurrentBag(T) is an unordered graph-traversal where the work functions are string comparison (with a maximum of 10 characters in each string). The inner loop contains the following:

if (bag.TryTake(out node))
{
    for (int i = 0; i < node.Children.Count; i++)
        bag.Add(node.Children[i]);
    ProcessNode(node); // e.g. a short string comparison
}

For comparison, the main contenders are thread-safe ordered collections such as a thread-safe queue and a thread-safe stack. We could use simple implementations of these thread-safe ordered collections, but we chose to test against the new collections: ConcurrentQueue(T) and ConcurrentStack(T). Figure 11 shows the result of the tree-search scenario for different tree sizes.



Figure 11: Comparison of ConcurrentBag(T), ConcurrentStack(T), and ConcurrentQueue(T)’s performance in a mixed producer-consumer scenario for various tree-sizes.

As expected, ConcurrentBag(T) dramatically outperformed other collections for this scenario and we can expect the results to generalize to other mixed producer-consumer scenarios.

For mixed producer-consumer scenarios that do not require item ordering, ConcurrentBag(T) can be dramatically more efficient than ConcurrentStack(T), ConcurrentQueue(T), and other synchronized collections.

To measure the scalability of ConcurrentBag(T), we ran the same scenario for a tree size of 100,000 nodes and varied only the number of threads involved in the search.


Figure 12: Scalability of ConcurrentBag(T) in a mixed producer-consumer scenario.

Figure 12 demonstrates that the scalability of ConcurrentBag(T) is excellent even when the work functions are very small. As noted previously, for scenarios that involve larger work functions, we can expect the scalability to be even closer to linear.

ConcurrentBag(T) shows excellent scalability for mixed producer-consumer scenarios.

ConcurrentBag(T) may not be appropriate for pure producer-consumer scenarios.

Although ConcurrentBag(T) has excellent performance for a mixed producer-consumer scenario, it will not have the same behavior for pure producer-consumer scenarios as all the consumers will have to repeatedly perform stealing operations and this will incur significant overheads and synchronization costs.

Other Considerations

ConcurrentBag(T) is somewhat heavyweight from a memory perspective: ConcurrentBag(T) itself is not disposable, but it internally consists of disposable ThreadLocal(T) objects. These ThreadLocal(T) objects, even when they are no longer used, cannot be disposed until the ConcurrentBag(T) object is collected by the GC. IsEmpty, Count, ToArray() and GetEnumerator() lock the entire data structure so that they can provide a snapshot view of the whole bag. As such, these operations are inherently expensive and they cause concurrent Add() and Take() operations to block. Note that by the time ToArray() or GetEnumerator() returns, the global lock has already been released, so the original collection may already have been modified.



The ConcurrentDictionary(TKey,TValue) type provides a thread-safe implementation of a strongly-typed dictionary. Prior to .NET Framework 4, the simple way to achieve thread-safe access to a strongly-typed dictionary structure was to use a lock to protect all accesses to a regular Dictionary(TKey,TValue). When using a locked Dictionary, the dictionary object itself can be used as the lock, so to safely update an element we would typically use the following:

void UpdateElement(Dictionary<int, int> dict, int key, int newValue)
{
    lock (dict)
    {
        dict[key] = newValue;
    }
}

When reading from this data structure, we must also take the lock as concurrent updaters may be making structural changes that make searching in the data structure impossible. Hence:

int GetElement(Dictionary<int, int> dict, int key)
{
    lock (dict)
    {
        return dict[key];
    }
}

Using a common lock effectively serializes all accesses to the data structure even if the bulk of the accesses are simple reads.

The ConcurrentDictionary(TKey,TValue) type provides a thread-safe dictionary which does not rely on a common lock. Rather, ConcurrentDictionary(TKey, TValue) internally manages a set of locks to provide safe concurrent updates and uses a lock-free algorithm to permit reads that do not take any locks at all.

The ConcurrentDictionary(TKey,TValue) is applicable to any problem involving concurrent access to a dictionary where updates are possible. However, for read-only access to a dictionary such as a lookup table with fixed data, a simple Dictionary(TKey,TValue) has lower overheads than ConcurrentDictionary(TKey,TValue).

If you require only concurrent reads with no updates, a regular Dictionary(TKey,TValue) or a ReadOnlyDictionary(TKey,TValue) is appropriate, even in multi-threaded scenarios.

It should also be noted that the Hashtable data structure from .NET Framework 1.1 is intrinsically thread-safe for multiple non-enumerating readers and a single writer, but not safe for multiple writers or enumerating readers. For certain situations, Hashtable is a reasonable baseline for comparison to ConcurrentDictionary(TKey,TValue), but we do not consider it further due to its more complex rules and because Hashtable is not intrinsically a strongly-typed data structure.

In the following sections, we will examine the performance of a ConcurrentDictionary(int,int) type in scenarios that involve various combinations of concurrent reads and writes. A dictionary is not typically used for producer-consumer scenarios but rather in applications with lookup tables and caches or when constructing groups of data with identical keys. As such, we will consider a different set of scenarios than those used previously.

An (Almost) Read-only Dictionary

Some problems call for a thread-safe dictionary that is mostly read-only but is occasionally updated. For example, consider a mapping of NetworkID to Status which changes when interruptions or repairs occur:

CentralNet  OK  ,EternalNet  FAULT  LocalDeviceNet  OK

Because of the occasional updates, a multi-threaded application may only access this dictionary via completely thread-safe operations. The ConcurrentDictionary(TKey, TValue) has an opportunity to scale well due to its lock-free reads.
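A sketch of this pattern (the network names and status strings are illustrative):

```csharp
using System;
using System.Collections.Concurrent;

class StatusMap
{
    static void Main()
    {
        var status = new ConcurrentDictionary<string, string>();
        status["CentralNet"] = "OK";
        status["ExternalNet"] = "FAULT";
        status["LocalDeviceNet"] = "OK";

        // Readers are lock-free: TryGetValue never blocks a writer.
        status.TryGetValue("CentralNet", out string s);

        // An occasional writer repairs a fault; TryUpdate succeeds only if
        // the current value still matches the expected comparison value.
        bool repaired = status.TryUpdate("ExternalNet", "OK", "FAULT");

        Console.WriteLine($"{s}, repaired={repaired}");  // OK, repaired=True
    }
}
```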



Figure 13: Comparison of scalability for ConcurrentDictionary(TKey,TValue) and a locked Dictionary(TKey,TValue) in a read-heavy scenario.

Figure 13 shows scalability for a scenario where a total of M reads were made in the absence of additions, deletions, or in-place updates. We can see that the ConcurrentDictionary(TKey, TValue) can service multiple threads performing concurrent reads and still scale well. On the other hand, the performance of a locked Dictionary(TKey, TValue) degrades as more threads participate, because contention on the shared lock becomes an increasingly significant cost.

For read-heavy scenarios that require a thread-safe dictionary, the ConcurrentDictionary(TKey,TValue) is the best choice.

Frequent Updates

A variety of scenarios involve frequent adding and updating of values in a shared dictionary structure. For example, a dictionary might be used to accumulate item counts as data is extracted from a source. A simple thread-safe approach using a locked Dictionary(Int32,Int32) is:

void ExtractDataAndUpdateCount(Input input, Dictionary<int, int> dict)
{
    int data;
    ExtractDataItem(input, out data);
    lock (dict)
    {
        if (!dict.ContainsKey(data))
            dict[data] = 1;
        else
            dict[data]++;
    }
}

If we look to use a ConcurrentDictionary(Int32,Int32), we need an approach that provides atomic updates without taking a common lock. One approach is a CAS loop that repeatedly reads an element and calls TryAdd() if the element does not exist, or TryUpdate() until the update succeeds without experiencing contention. Fortunately, ConcurrentDictionary(TKey,TValue) provides an AddOrUpdate() method which takes care of the details of performing an atomic update. AddOrUpdate() takes delegate parameters so that it can re-evaluate the updated value whenever write contention occurs. The corresponding code for ConcurrentDictionary(Int32,Int32) is thus:

void ExtractDataAndUpdateCount(Input input, ConcurrentDictionary<int, int> cd)
{
    int data;
    ExtractDataItem(input, out data);
    cd.AddOrUpdate(data, k => 1, (k, v) => v + 1);
}
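For illustration, the manual retry loop that AddOrUpdate() encapsulates might look like the following sketch (IncrementCount is a hypothetical helper, not part of the library):

```csharp
using System.Collections.Concurrent;

static class CountHelper
{
    // Atomically increments the count for key, inserting 1 if absent.
    // This is the CAS-loop pattern that AddOrUpdate() performs internally.
    public static int IncrementCount(ConcurrentDictionary<int, int> cd, int key)
    {
        while (true)
        {
            if (cd.TryGetValue(key, out int current))
            {
                // Succeeds only if no other thread changed the value meanwhile.
                if (cd.TryUpdate(key, current + 1, current))
                    return current + 1;
            }
            else if (cd.TryAdd(key, 1))
            {
                return 1;
            }
            // Contention detected: re-read and retry.
        }
    }
}
```

In practice, prefer AddOrUpdate(); the helper above is useful mainly for understanding what the library is doing on your behalf.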



Figure 14: Comparison of scalability for ConcurrentDictionary(TKey,TValue) and a locked Dictionary(TKey,TValue) in an update-heavy scenario.

Figure 14 shows a comparison of performance for continuous atomic update operations. We assume that the ExtractDataItem() method is essentially free, and that there are many different key values so that we are not always updating the exact same items.

For frequent updates, the locked Dictionary(TKey,TValue) shows a performance profile that is very similar to its read profile, which is expected given that both situations require taking a common lock and thus serialize all work.

The ConcurrentDictionary(TKey, TValue) data indicates that for sequential scenarios (nCores=1), the AddOrUpdate() operations are on the order of 0.5x the speed of a simple locked Dictionary(TKey, TValue). However, as the number of participating cores increases, the use of multiple internal locks within ConcurrentDictionary(TKey, TValue) permits some level of scalability. In particular, the update performance increases by up to a factor of 2, but it is limited by contention on the shared locks and cache-invalidation costs.

  In a multi-threaded scenario that requires frequent updates to a shared dictionary, ConcurrentDictionary(TKey,TValue) can provide modest benefits.

Since the scalability for frequent writes is not ideal, always look for opportunities to rework an update-heavy algorithm such that the workers can accumulate data independently, with a merge operation to combine their results at the end of processing.
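One way to sketch that rework is with the localInit/localFinally overload of Parallel.ForEach, so each worker counts into a private Dictionary and performs a single synchronized merge at the end (the names and sample data are illustrative):

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Threading.Tasks;

class LocalAccumulation
{
    public static ConcurrentDictionary<int, int> CountItems(IEnumerable<int> data)
    {
        var global = new ConcurrentDictionary<int, int>();
        Parallel.ForEach(
            data,
            () => new Dictionary<int, int>(),        // per-worker accumulator
            (item, state, local) =>
            {
                local.TryGetValue(item, out int c);  // no locks on the hot path
                local[item] = c + 1;
                return local;
            },
            local =>
            {
                // One merge per worker, at the end of the loop.
                foreach (var kv in local)
                    global.AddOrUpdate(kv.Key, kv.Value, (k, v) => v + kv.Value);
            });
        return global;
    }

    static void Main()
    {
        var counts = CountItems(new[] { 1, 1, 2, 3, 3, 3 });
        Console.WriteLine($"{counts[1]} {counts[2]} {counts[3]}");  // 2 1 3
    }
}
```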

A scenario that entails all writes to a shared dictionary is the worst-case performance scenario for ConcurrentDictionary(TKey,TValue). A more realistic scenario may involve significant time spent in the ExtractDataItem() method or other per-item processing. As more time is spent on local per-thread computation, scalability will naturally increase because contention on the dictionary will cease to be the primary cost. This applies equally to both the locked Dictionary(TKey,TValue) and ConcurrentDictionary(TKey,TValue).

Concurrent Reading and Updating

We can also consider situations where some amount of both reading and writing takes place. Recall that a simple locked Dictionary(TKey,TValue) must take locks for both reads and writes, but a ConcurrentDictionary(TKey,TValue) only requires locks for writes, and it manages multiple internal locks to provide some level of write-scalability.

The ConcurrentDictionary(TKey,TValue) is clearly the best choice whenever the level of concurrency is high, as it has better performance for both reads and writes. However, even in a dual-core scenario, we may find that ConcurrentDictionary(TKey,TValue) is the best choice if there is a significant proportion of reads.



Figure 15: Comparison of performance for ConcurrentDictionary(TKey,TValue) and a locked Dictionary(TKey,TValue) for various read/update ratios.

If your scenario includes a significant proportion of reads to writes, ConcurrentDictionary(TKey, TValue) offers performance gains for any number of cores/threads.

Other Considerations

ConcurrentDictionary(TKey,TValue)’s Count, Keys, Values, and ToArray() completely lock the data structure in order to provide an accurate snapshot. This serializes all of these calls and interferes with add and update performance. Hence, these methods and properties should be used sparingly.

The GetEnumerator() method provides an enumerator that can walk the key/value pairs stored in the dictionary. GetEnumerator() doesn’t take any locks, but it nonetheless guarantees that the enumerator is safe for use even in the face of concurrent updates. This is great for performance but, because no snapshot is taken, the enumerator may provide data that is a mixture of items present when GetEnumerator() was called and some or all of the subsequent updates that may have been made. If you require an enumerable snapshot of the ConcurrentDictionary(TKey, TValue), either arrange for all updates to pause before enumeration or use the ToArray() method to capture the data whilst all the internal locks are held.
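A small sketch of the difference between a locked snapshot and lock-free enumeration (the values are illustrative):

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;

class SnapshotVersusLive
{
    static void Main()
    {
        var cd = new ConcurrentDictionary<int, int>();
        for (int i = 0; i < 3; i++) cd[i] = i;

        // ToArray() takes the internal locks and returns a consistent snapshot.
        KeyValuePair<int, int>[] snapshot = cd.ToArray();

        // Enumeration is lock-free and reflects the live dictionary,
        // so an add completed before (or during) enumeration may be visible.
        cd[99] = 99;
        int seen = 0;
        foreach (var kv in cd) seen++;

        Console.WriteLine($"{snapshot.Length} {seen}");  // 3 4
    }
}
```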

ConcurrentDictionary(TKey,TValue) update operations are internally protected by fine-grain locks whose granularity can be tuned by specifying the concurrency level in the constructor:

public ConcurrentDictionary(int concurrencyLevel, int capacity)

If a concurrencyLevel is not specified, the default is four times the number of processors. Increasing the concurrency level makes the locks finer-grained, which generally decreases contention for concurrent updates. As a result, specifying a concurrencyLevel higher than the default may improve performance for frequent-update scenarios. The downside is that all the operations that lock the whole dictionary may become significantly more expensive.

The default concurrency level is appropriate for most cases. For scenarios that include significant concurrent updates, you may wish to experiment with the concurrency level to achieve maximum performance.
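For example, a constructor call that raises the level above the default might look like this (the multiplier and capacity are purely illustrative; only measurement can justify a specific value):

```csharp
using System;
using System.Collections.Concurrent;

class TunedConstruction
{
    static void Main()
    {
        // Default level is 4 * processor count; more internal locks can
        // reduce update contention, but whole-dictionary operations
        // (Count, ToArray(), Keys, Values) become more expensive.
        int concurrencyLevel = Environment.ProcessorCount * 8;
        int initialCapacity = 1000;
        var cd = new ConcurrentDictionary<int, int>(concurrencyLevel, initialCapacity);

        cd.TryAdd(1, 42);
        Console.WriteLine(cd[1]);  // 42
    }
}
```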


Details of the Experimental Framework

All tests were run on identical 8-core machines with the following hardware configurations:

• Intel® Xeon® CPU E5345 @ 2.33GHz, 2 sockets x 4-core, 8GB RAM

• .NET Framework 4

• Windows 7 Ultimate, both 32-bit and 64-bit

Each test was executed multiple times in an environment that reduced unnecessary external noise, and we computed the mean elapsed time as the standard measure. Our results had standard deviations of less than 10% of the mean. We used several approaches to reduce noise in order to obtain stable results:

• We stopped all unnecessary background services and applications and turned off network access and peripherals where possible.

• We selected test parameters such that each individual test iteration took at least 100 milliseconds. This reduced most noise.

• To reduce the impact of garbage collection, the tests kept memory use to a minimum by using simple types and small data volumes, and we forced garbage collections between runs.

• All timing measurements excluded the time required to initialize the application, warm up the .NET Framework ThreadPool, and perform other house-keeping operations.

Due to the nature of our tests, they do not represent entire applications. Thus the data we presented should be considered only indicative of the performance of similar units of a program on similar hardware, and not entire systems or applications.

Finally, we note that experiments run on 32-bit and 64-bit platforms may show significant variance in both speedup and scalability. Many factors can cause this variance and, in some cases, the differences favor one architecture over the other. If maximum performance is crucial and an application can run on either a 32-bit or a 64-bit platform, then specific performance measurements are required to select between the two alternatives.