Wikipedia Moving Data Centers
The Wikimedia Foundation is moving its data center today away from their original home in Tampa, Florida to Virginia. According to the Wikimedia technical team, even though all reasonable precautions have been taken, it is still prudent to expect major disruptions in service during the transition. The move today culminates a plan that began nearly four years ago.
According to the announcement:
Wikimedia sites have been hosted in our main data center in Tampa, Florida, since 2004; before that, the couple of servers powering Wikipedia were in San Diego, California. Ashburn is the third and newest primary data center to host Wikimedia sites.
Looking into Wikipedia’s technical operations is fascinating. Wikipedia is as open in their engineering practices as they are with their source code, documenting the internal workings of the operations team in Wikitech. The Wikimedia Foundation offers up scores of internal documents, bug reports, and even core dumps for public scrutiny.
Unfortunately, Wikipedia’s public information on some of their pages is outdated. The latest details on this page date back to 2010 and cite four main colocation facilities, two in Amsterdam and two in Florida. Wikipedia’s technical FAQ also notes a South Korean facility.
An interesting note in the older documentation refers to how Wikipedia uses MySQL:
Our structured data is stored in MySQL. We group wikis into clusters, and each cluster is served by several MySQL servers, replicated in a single-master configuration.
Having been in several database replication discussions over the past few years, it is good to see that Wikipedia, one of the biggest sites on the Internet, uses a simple master-slave replication scheme for their databases. My recommendation in any network engineering endeavor is to always choose the simplest, most battle-hardened configuration possible, and then build out from there. It is important not to “over-engineer” a solution when building a web application. Too often the resulting system is fragile, and subject to the high-availability or replication itself being the cause of downtime.
I also think it is important to understand that the cloud is not as amorphous as some would believe. No matter which cloud service is used, the data resides on a server somewhere, and how that data is replicated and how the service provides high availability is reflected in how resilient the service is to the end user. As I type this, Wikipedia is in the middle of their move, but I personally have yet to experience even a slight delay in service. Evidence of a well built service.
After the transition is complete, the data center in Florida will become a hot standby, keeping in sync, but not serving traffic unless there is a problem with the data center in Virginia.
The legacy data center in Tampa will continue to be maintained, and will serve as a secondary “hot failover” data center: servers will be in standby mode to take over, should the primary site experience an outage. Server configuration and data will be synchronized between the two locations to ensure a transition as smooth as possible in case of technical difficulties in Ashburn.
Wikipedia continues to grow and mature, and provides a reasonable technical roadmap for smaller companies looking to sensibly grow their web applications.