Cassandra: Facebook Shares More of its Secret Sauce

by Ostatic Staff - Jul. 15, 2008

Facebook has stepped up its contribution to the open source community with the release of the Cassandra project on Google Code. The location is mildly ironic: Cassandra is roughly an open source alternative to Google's internal "BigTable" storage system. (BigTable itself isn't open source, though you can gain access to it via the App Engine Datastore API.) Cassandra isn't for everyone, but if you're dealing with scalability issues arising from database concurrency bottlenecks, it's something that you should know about.

Apart from reading the Java source code, you can get a rough overview of Cassandra's capabilities from this SIGMOD presentation. What you'll find is a system that is carefully designed to get around the problems of high-traffic, highly-normalized relational databases. Features of Cassandra include denormalizing everything into one huge table with column groups, replication among multiple nodes, predictably low write times, delayed (eventual) consistency, and built-in health monitoring.
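
To make the feature list a little more concrete, here is a toy Python sketch of the general idea: one big table addressed by row key, column group, and column, with writes acknowledged by a single replica and copied to the others later. This is not Cassandra's code or API. The names Replica, ToyCluster, and propagate are made up for illustration, and resolving conflicts by "newest timestamp wins" is an assumption of the sketch rather than something taken from the release.

```python
from collections import defaultdict


class Replica:
    """One node's copy: row key -> column group -> column -> (timestamp, value)."""

    def __init__(self):
        self.rows = defaultdict(lambda: defaultdict(dict))

    def apply(self, row, group, column, timestamp, value):
        current = self.rows[row][group].get(column)
        # Assumed conflict rule: keep the cell with the newest timestamp.
        if current is None or timestamp > current[0]:
            self.rows[row][group][column] = (timestamp, value)

    def read(self, row, group, column):
        cell = self.rows[row][group].get(column)
        return None if cell is None else cell[1]


class ToyCluster:
    """Hypothetical cluster: fast local write, delayed copy to other replicas."""

    def __init__(self, replica_count=3):
        self.replicas = [Replica() for _ in range(replica_count)]
        self.pending = []  # writes not yet copied to the other replicas
        self.clock = 0     # logical clock standing in for real timestamps

    def write(self, row, group, column, value):
        self.clock += 1
        update = (row, group, column, self.clock, value)
        self.replicas[0].apply(*update)  # acknowledged after one replica
        self.pending.append(update)

    def propagate(self):
        # Delayed consistency: the remaining replicas catch up here.
        for update in self.pending:
            for replica in self.replicas[1:]:
                replica.apply(*update)
        self.pending.clear()


cluster = ToyCluster()
cluster.write("user:42", "profile", "name", "Ada")
before = cluster.replicas[2].read("user:42", "profile", "name")  # stale: None
cluster.propagate()
after = cluster.replicas[2].read("user:42", "profile", "name")   # now "Ada"
```

Between the write and the call to propagate, a reader hitting the third replica sees nothing for that cell. That window is the tradeoff the design accepts in exchange for cheap, predictable writes.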

Relational database purists may feel queasy at some of the tradeoffs that this design involves - such as the loss of atomicity and the fact that consistency between cluster members is statistical rather than deterministic. But it's hard to argue with success: Facebook has used Cassandra to scale out a tremendous amount of data without apparent major issues.

The Cassandra code is under the Apache 2.0 license. Best news for implementers: the release includes pre-built interfaces for Cocoa, C++, C#, Java, Perl, Python, Ruby, and several other languages.