Guest Post: Yahoo's Cloud Team Open Sources Traffic Server
Today, Yahoo moved its open source cloud computing initiatives up a notch with the donation of its Traffic Server product to the Apache Software Foundation. Traffic Server is used in-house at Yahoo to manage its own traffic and it enables session management, authentication, configuration management, load balancing, and routing for entire cloud computing stacks. We asked the cloud computing team at Yahoo for a series of guest posts about Traffic Server, and you'll find the first one here.
Introducing Traffic Server
By The Yahoo Cloud Computing Team
Today, Yahoo is excited to open source Traffic Server, software that we rely on extensively. An Apache Incubator project, Traffic Server is an extremely high-performance Web proxy-caching server with a robust plugin API that lets you modify and extend its behavior and capabilities.
Traffic Server is fast. It was designed from the start as a multi-threaded, event-driven server, and thus scales very well on modern multi-core servers. With a quad-core 1.86GHz processor, it can handle more than 30,000 requests/second for certain traffic patterns. In contrast, some of the other caching proxy servers we've used max out at around 8,000 requests/second on the same hardware.
It's extensible. It has native support for dynamically loading shared objects that can interact with the core engine. Yahoo! has internal plugins that remap URLs, route requests to different services based on cookies, allow caching of OAuth-authenticated requests, and modify behaviors based on Cache-Control header extensions. We've replaced the default memory cache with a plugin. It's even possible to write plugins to handle other protocols such as FTP, SMTP, SOCKS, and RTSP, or to modify the response body. Documentation for the plugin APIs and sample plugin code are available today.
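To make the cookie-based routing idea concrete, here is a minimal sketch of the decision such a plugin has to make. It is written in Python for readability and does not use the Traffic Server plugin API; the cookie name `cohort`, the back-end URLs, and the function `pick_backend` are all hypothetical names chosen for illustration.

```python
from http.cookies import SimpleCookie

# Hypothetical routing table: value of the "cohort" cookie -> back-end service.
BACKENDS = {
    "beta": "http://beta.internal.example:8080",
    "stable": "http://stable.internal.example:8080",
}
DEFAULT_BACKEND = BACKENDS["stable"]


def pick_backend(cookie_header):
    """Return the back-end URL for a request, based on its Cookie header."""
    cookie = SimpleCookie()
    cookie.load(cookie_header or "")  # tolerate a missing Cookie header
    morsel = cookie.get("cohort")
    if morsel is None:
        return DEFAULT_BACKEND
    # Unknown cookie values fall back to the default service.
    return BACKENDS.get(morsel.value, DEFAULT_BACKEND)


print(pick_backend("session=abc123; cohort=beta"))
print(pick_backend("session=abc123"))
```

A real plugin would make the same kind of lookup inside the proxy's request-handling hooks, then rewrite the request's destination before it is forwarded.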
Traffic Server is serving more than 30 billion Web objects a day across the Yahoo! network, delivering more than 400 terabytes of data per day. It's in use as a proxy or cache (or both) by services like the Yahoo! Front Page, Mail, Sports, Search, News, and Finance. We continue to find new uses for Traffic Server, and it gets more and more ingrained into our infrastructure each day.
At its heart, Traffic Server is a general-purpose implementation that can be used to proxy and cache a variety of workloads, from single site acceleration to CDN deployment and very large ISP proxy caching. It has all the major features you'd expect from such a server, including behavior like cache partitioning. You can dedicate different cache volumes to selected origin servers, allowing you to serve multiple sites from the same cache without worrying about one of them being "pushed" out of the cache by the others.
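The effect of cache partitioning can be illustrated with a small sketch. This is not how Traffic Server implements its cache; it is a toy model in Python that assumes each volume is a simple least-recently-used store, and the class names, host names, and sizes are hypothetical.

```python
from collections import OrderedDict


class LruVolume:
    """A fixed-capacity cache volume that evicts the least recently used entry."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.entries = OrderedDict()

    def get(self, key):
        if key not in self.entries:
            return None
        self.entries.move_to_end(key)  # mark as most recently used
        return self.entries[key]

    def put(self, key, value):
        self.entries[key] = value
        self.entries.move_to_end(key)
        if len(self.entries) > self.capacity:
            self.entries.popitem(last=False)  # evict the oldest entry


class PartitionedCache:
    """Gives each listed origin host its own volume; other hosts share a default."""

    def __init__(self, volume_sizes, default_size):
        self.volumes = {host: LruVolume(size) for host, size in volume_sizes.items()}
        self.default = LruVolume(default_size)

    def _volume_for(self, host):
        return self.volumes.get(host, self.default)

    def get(self, host, path):
        return self._volume_for(host).get(path)

    def put(self, host, path, body):
        self._volume_for(host).put(path, body)


cache = PartitionedCache({"news.example.com": 2, "mail.example.com": 2}, default_size=2)
cache.put("news.example.com", "/front", "news front page")

# A burst of mail traffic churns only the mail volume.
for i in range(100):
    cache.put("mail.example.com", f"/msg/{i}", f"message {i}")

print(cache.get("news.example.com", "/front"))  # the news entry is still cached
```

Because evictions happen per volume, heavy traffic for one origin can only displace that origin's own objects.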
The current version of Traffic Server is the product of literally hundreds of developer-years. It originated at Inktomi as the Inktomi Traffic Server, and was sold commercially for several years. Chuck Neerdaels, one of the co-authors of Harvest, which became the popular open source Squid proxy caching server, has been integral to Traffic Server's history, managing the early development team and leading the group today. Yahoo! acquired Inktomi in 2003, and has a full-time development team working on the server. We plan to continue active development. For example, we are planning to add support for IPv6 and 64-bit, and to improve performance when dealing with very large files. We'd love to work with the community on these and other efforts.
Of course, the server is neither perfect nor complete. Internally, Yahoo! uses Squid for some caching use cases where we need more fine-grained cache controls like refresh_patterns, stale-if-error, and stale-while-revalidate. By open sourcing the server, we hope that you, the community, can help add the features you need more quickly than Yahoo! can by itself. In exchange, the public gets access to a server that Yahoo! has found incredibly useful for speeding up page downloads and saving back-end resources through caching.
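For readers unfamiliar with stale-while-revalidate and stale-if-error, the following sketch shows the kind of decision a cache makes when it honors them. It is a simplified model of the two Cache-Control extensions, not code from Squid or Traffic Server; the function name, its parameters, and the returned labels are hypothetical, and all times are in seconds.

```python
def cache_decision(age, max_age, stale_while_revalidate=0, stale_if_error=0,
                   origin_ok=True):
    """Decide how to answer a request for a cached object of the given age."""
    if age < max_age:
        return "serve_fresh"

    staleness = age - max_age  # how long the object has been stale

    # Within this window the stale copy may be served right away while the
    # cache refreshes it in the background.
    if staleness <= stale_while_revalidate:
        return "serve_stale_and_revalidate"

    # If the origin is failing, a stale copy may still be served for a while.
    if not origin_ok:
        if staleness <= stale_if_error:
            return "serve_stale"
        return "return_error"

    return "revalidate_with_origin"


print(cache_decision(age=70, max_age=60, stale_while_revalidate=30))
print(cache_decision(age=200, max_age=60, stale_if_error=300, origin_ok=False))
```

The first call falls inside the revalidation window, so the stale copy is served immediately; the second models an origin outage in which the stale copy is preferred over an error.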
Traffic Server is now an Apache Incubator project, and we hope it will graduate to a full Apache top-level project. We chose the Apache Software Foundation because of our experience with the Hadoop project, its great infrastructure for supporting long-running projects, and its long history of delivering enterprise-class free software that supports large communities of users and developers alike.
Over the next few weeks, look for more detailed posts on plugins, how to get started with the code, the roadmap, and how to get involved in the project. In the meantime, grab the source, browse the documentation, send feedback, and help make the project even better.