A Look Inside Tumblr's Architecture

by Ostatic Staff - May. 22, 2013

Yahoo recently purchased Tumblr for a cool $1.1 billion. Tumblr pushes some surprisingly high numbers through their service, so an inside look at the architecture that Yahoo bought is well worth the read. The portion I found most interesting is the detail on the MySQL database setup: how Tumblr uses MySQL to scale massively while keeping the service available.

High Scalability interviewed Blake Matheny, Distributed Systems Engineer at Tumblr, who summed up their view on MySQL best:

MySQL (plus sharding) scales, apps don’t.

Nowhere in the discussion is there any mention of master-master replication, DRBD, or database clustering. Data is replicated using standard master-slave MySQL replication and, as quoted above, split between servers using sharding. Sharding, to grossly oversimplify, is the splitting of data across physical machines. In one example, database A might hold all the user accounts with last names starting with the letters A-M, and database B might hold all the accounts with last names starting with N-Z. The application then looks up in an index which database server holds the account it needs. To scale this setup out horizontally from two servers to four, the data would be split four ways instead of two: A-G, H-M, N-T, U-Z, or something similar.
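To make that lookup concrete, here is a minimal sketch of last-name range routing in Scala (the language the article says Tumblr adopted). The host names and letter ranges are illustrative assumptions of mine, not Tumblr's actual topology:

```scala
// A toy shard router: each shard owns an inclusive range of first letters
// of the last name. Hosts and ranges are hypothetical examples.
object ShardRouter {
  case class Shard(host: String, from: Char, to: Char)

  val shards = Seq(
    Shard("db-a.example.com", 'A', 'M'),
    Shard("db-b.example.com", 'N', 'Z')
  )

  // Pick the shard whose letter range contains the user's last name.
  def shardFor(lastName: String): Shard = {
    val key = lastName.head.toUpper
    shards.find(s => key >= s.from && key <= s.to)
      .getOrElse(shards.last) // crude fallback for non-alphabetic names
  }

  def main(args: Array[String]): Unit = {
    println(shardFor("Matheny").host) // db-a.example.com
    println(shardFor("Smith").host)   // db-b.example.com
  }
}
```

Scaling from two shards to four is then a matter of swapping in a table with four narrower ranges and migrating the rows whose home server changes.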

Sharding provides a form of high availability: even if one database server goes down, the application as a whole stays up, and only the users whose accounts reside on the broken server are affected. If you couple sharding with MySQL replication slaves to serve reads, it is even more likely that the application will “appear” to be up when a single database server is down. Appropriate error handling in the application can then convey a message to the affected users indicating that something is wrong, and that the hosting company is working on it.
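As a rough illustration of that failure mode, the sketch below serves a read from a replica, falls back to the master, and only surfaces a user-facing message when the whole shard is unreachable. The Shard type, the fetchUser call, and the host names are all stand-ins invented for the example, not Tumblr's code:

```scala
import scala.util.{Try, Success, Failure}

object ReplicaReads {
  case class Shard(master: String, replica: String)

  // Hypothetical query against a single host; a real version would issue a
  // JDBC call. Hosts whose names contain "down" simulate a dead server.
  def fetchUser(host: String, userId: Long): Try[String] = Try {
    if (host.contains("down")) throw new RuntimeException(s"$host unreachable")
    s"user:$userId@$host"
  }

  // Read from the replica, fall back to the master, and return a friendly
  // error message only when both copies of the shard are down.
  def readUser(shard: Shard, userId: Long): Either[String, String] =
    fetchUser(shard.replica, userId)
      .orElse(fetchUser(shard.master, userId)) match {
        case Success(row) => Right(row)
        case Failure(_)   => Left("Something went wrong; we're working on it.")
      }

  def main(args: Array[String]): Unit = {
    val healthy = Shard(master = "db-a-master", replica = "db-a-replica")
    val broken  = Shard(master = "db-b-master-down", replica = "db-b-replica-down")
    println(readUser(healthy, 42)) // Right(user:42@db-a-replica)
    println(readUser(broken, 42))  // Left(Something went wrong; ...)
  }
}
```

The point of the pattern is that a single dead server degrades one slice of users to an apologetic error page while everyone else never notices.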

Tumblr’s setup has survived more traffic than most sites will ever have to endure, and to a very high standard. When discussing Tumblr’s decision to move to the Java Virtual Machine and the Scala language, Blake said:

…they target 5ms response times, 4 9s HA, 40K requests per second and some at 400K requests per second.

The stats section towards the top of the article says the rest:

  • 500 million page views a day
  • 15B+ page views a month
  • ~20 engineers
  • Peak rate of ~40k requests per second
  • 1+ TB/day into Hadoop cluster
  • Many TB/day into MySQL/HBase/Redis/Memcache
  • Growing at 30% a month
  • ~1000 hardware nodes in production
  • Billions of page visits per month per engineer
  • Posts are about 50GB a day. Follower list updates are about 2.7TB a day.
  • Dashboard runs at a million writes a second, 50K reads a second, and it is growing.

For many of us in smaller data centers, these numbers are far out of reach. Dumping “many TB/day” into anything would cause a major problem with disk space alone. What fascinates me about Tumblr’s architecture is that 1) it is built entirely on open source software, and 2) if Tumblr can run their infrastructure to four-nines availability requirements and 500 million page views per day on MySQL, I think most of us can too.