Why TFS 2012 rocks and what you need to know to scale out

 

Experiences from Microsoft Developer Division , Developer Division is running on TFS 2012 RC!

Back in the beginning, the DevDiv server was dogfood server for Microsoft Developer Division .  Then as all of the folks shipping products in Visual Studio, there were too many critical deadlines to be able to put early, sometimes raw, builds on the server.   So was dogfood TFS on a server called pioneer, as described here.  Pioneer is used mostly by the teams in ALM (TFS, Test & Lab Management, and Ultimate), and has been running TFS 2012 on it since February 2011, which was a full year before beta. Never before have we been able to use TFS so early in the product cycle, and our ability to get that much usage on early TFS 2012 really showed in the successful upgrade of the DevDiv server.

DevDivision also  run TFS 2012 in the cloud at http://tfspreview.com, and that’s been running for a year now.  While that’s not a dogfood effort, it’s helped us improve TFS 2012 significantly. The other dogfooding effort leading up to this upgrade was Microsoft IT.  They upgraded a TFS server to TFS 2012 Beta, and we learned from that as well.

The scale of the DevDiv server is huge, being used by 3,659 users in the last 14 days.  Nearly all of those users are working in a single team project for the delivery of Visual Studio 2012.  Our branches and workspaces are huge (a full branch has about 5M files, and a typical dev workspace 250K files).  For the TFS 2010 product cycle, was not upgrade the server until after RTM.  Having been able to do this upgrade with TFS 2012 RC, the issues found will be fixed in the RTM release of TFS 2012!

Here’s the topology of the DevDiv TFS deployment, which I’ve copied from Grant Holliday’s blog post on the upgrade to TFS 2010 RTM two years ago.  I’ll call out the major features.

  • We use two application tiers behind an F5 load balancer.  The ATs will each handle the DevDiv load by themselves, in case we have to take one offline (e.g., hardware issues).
  • There are two SQL Server 2008 R2 servers in a failover configuration.  We are running SP1 CU1.  TFS 2012 requires an updated SQL 2008 for critical bug fixes.
  • SharePoint and SQL Analysis Services are running on separate computer in order balance the load (cube processing is particularly intensive).
  • We use version control caching proxy servers both in Redmond and for remote offices.

These statistics will give you a sense of the size of the server.  There are two collections, one that is in use now and has been used since the beginning of the 2012 product cycle (collection A) and the original collection which was used by everyone up through the 2010 product cycle (collection B).  The 2010 collection had grown in uncontrolled ways, and there were more than a few hacks in it from the early days of scaling to meet demand.  Since moving to a new collection, has been able pare back the old collection, and the result of those efforts has been a set of tools that we use on both collections (will be eventually release them).  Both collections were upgraded.  The third column is a server we call pioneer.

Grant posted the queries to get the stats on your own server (some need a little tweaking because of schema changes, and we need to add build).  Also, the file size is now all of the files, including version control, work item attachments, and test attachments, as they are all stored in the same set of tables now.

 

Coll.A                      Coll. B                          Pioneer
Recent Users 3659 1516 1,143
Build agents and controllers 2,636 284 528
Files 16,855,771 21,799,596 11,380,950
Uncompressed File Size (MB) 14,972,584 10,461,147 6,105,303
Compressed File Size (MB) 2,688,950 3,090,832 2,578,826
Checkins 681,004 2,294,794 133,703
Shelvesets 62,521 12,967 14,829
Merge History 1,512,494,436 2,501,626,195 162,511,653
Workspaces 22,392 6,595 5,562
Files in workspaces 4,668,528,736 366,677,504 406,375,313
Work Items 426,443 953,575 910,168
Areas & Iterations 4,255 12,151 7,823
Work Item Versions 4,325,740 9,107,659 9,466,640
Work Item Attachments 144,022 486,363 331,932
Work Item Queries 54,371 134,668 28,875

The biggest issue  faced after the upgrade was getting the builds going again.  DevDiv (collection B) has 2,636 build agents and controllers, with about 1,600 being used at any given time.  On pioneer, didn’t have nearly that many running.  The result was that was hit a connection limit, and the controllers and agents would randomly go online and offline.

The upgrade to TFS 2012 RC was a huge success, and it was a very collaborative effort across TFS, central engineering, and IT.  As a result of this experience and  experience on pioneer, TFS 2012 is not only a great release with an incredible set of features, but it’s also running at high scale on a mission critical server!

Advertisements

One thought on “Why TFS 2012 rocks and what you need to know to scale out

  1. Great post finally something on what TFS looks like at Microsoft. I hope that you continue to write more about TFS infrastructure. Can you write more about the Tools you are developing to deal with collections? I hope that sounds like we will be able to move projects between collections soon. What is the size on disk of your collections?

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s