Experiences from Microsoft Developer Division: Developer Division is running on TFS 2012 RC!
In the beginning, the DevDiv server was the dogfood server for Microsoft Developer Division. Then, as more of the teams shipping products in Visual Studio came to depend on it, there were too many critical deadlines for us to put early, sometimes raw, builds on the server. So we dogfood TFS on a server called pioneer, as described here. Pioneer is used mostly by the teams in ALM (TFS, Test & Lab Management, and Ultimate), and it has been running TFS 2012 since February 2011, a full year before beta. Never before have we been able to use TFS so early in the product cycle. Getting that much usage on early TFS 2012 really showed in the successful upgrade of the DevDiv server.
Developer Division also runs TFS 2012 in the cloud at http://tfspreview.com, and that service has been running for a year now. While it is not a dogfood effort, it has helped us improve TFS 2012 significantly. The other dogfooding effort leading up to this upgrade was Microsoft IT. They upgraded a TFS server to TFS 2012 Beta, and we learned from that as well.
The scale of the DevDiv server is huge: 3,659 users have used it in the last 14 days. Nearly all of those users work in a single team project for the delivery of Visual Studio 2012. Our branches and workspaces are also huge. A full branch has about 5M files, and a typical dev workspace has about 250K files. In the TFS 2010 product cycle, we did not upgrade the server until after RTM. Because we did this upgrade with TFS 2012 RC, the issues we found will be fixed in the RTM release of TFS 2012!
Here’s the topology of the DevDiv TFS deployment, which I’ve copied from Grant Holliday’s blog post on the upgrade to TFS 2010 RTM two years ago. I’ll call out the major features.
- We use two application tiers (ATs) behind an F5 load balancer. Each AT can handle the full DevDiv load by itself, so we can take one offline if we need to (e.g., for hardware issues).
- There are two SQL Server 2008 R2 servers in a failover configuration, running SP1 CU1. TFS 2012 requires an updated SQL Server 2008 R2 because of critical bug fixes.
- SharePoint and SQL Server Analysis Services run on separate computers to balance the load (cube processing is particularly intensive).
- We use version control caching proxy servers both in Redmond and for remote offices.
These statistics will give you a sense of the size of the server. There are two collections. Collection A is the one in use now; it has been used since the beginning of the 2012 product cycle. Collection B is the original collection, which everyone used through the 2010 product cycle. The 2010 collection had grown in uncontrolled ways, and it contained more than a few hacks from the early days of scaling to meet demand. Since moving to the new collection, we have been able to pare back the old one. Those efforts produced a set of tools that we now use on both collections, and we will eventually release them. Both collections were upgraded. The third column is the pioneer server.
Grant posted the queries to get these stats on your own server. Some of them need a little tweaking because of schema changes, and we still need to add one for build. Also, the file size now covers all files, including version control files, work item attachments, and test attachments, because they are all stored in the same set of tables now.
| | Coll. A | Coll. B | Pioneer |
|---|---|---|---|
| Build agents and controllers | 2,636 | 284 | 528 |
| Uncompressed File Size (MB) | 14,972,584 | 10,461,147 | 6,105,303 |
| Compressed File Size (MB) | 2,688,950 | 3,090,832 | 2,578,826 |
| Files in workspaces | 4,668,528,736 | 366,677,504 | 406,375,313 |
| Areas & Iterations | 4,255 | 12,151 | 7,823 |
| Work Item Versions | 4,325,740 | 9,107,659 | 9,466,640 |
| Work Item Attachments | 144,022 | 486,363 | 331,932 |
| Work Item Queries | 54,371 | 134,668 | 28,875 |
The biggest issue we faced after the upgrade was getting the builds going again. DevDiv (collection A) has 2,636 build agents and controllers, and about 1,600 of them are in use at any given time. On pioneer, we didn't have nearly that many running. As a result, we hit a connection limit, and the controllers and agents would randomly go online and offline.
The upgrade to TFS 2012 RC was a huge success, and it was a very collaborative effort across TFS, central engineering, and IT. Thanks to this upgrade and our experience on pioneer, TFS 2012 is not only a great release with an incredible set of features, but it is also running at high scale on a mission-critical server!