Apache Software Foundation's President on Open Source and the Cloud

by OStatic Staff - Oct. 19, 2010

ApacheCon, one of the biggest open source conferences of the year, is coming up in Atlanta, November 1st through 5th, sponsored by the Apache Software Foundation (ASF). If you haven't followed the growing influence of Apache-driven platforms, ranging from Hadoop to Cassandra to web servers, you're missing a big part of the heart and soul of open source. Jim Jagielski, Director and President of the Apache Software Foundation, will of course be a key figure at the conference. He provided OStatic with a guest post (one of a series we're doing in conjunction with ApacheCon) on the cloud and its relationship to Apache projects and open source. Here it is.

The Cloud is Open Source

By Jim Jagielski, Director and President, Apache Software Foundation (ASF)

The concept of “the cloud” has caught fire with developers, architects and analysts. The reason is as simple as the concept itself: the cloud allows for dynamic, organic and seamless allocation of data, resources and services on an on-demand basis, resulting in a better-performing and more consistent user experience. The basic idea is that behind the scenes, instead of sysadmins frantically adding and removing infrastructure to keep up with user demand, the cloud itself does that automatically.
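To make "the cloud itself does that automatically" a little more concrete, here is a minimal sketch of the kind of decision an autoscaler repeats every few seconds. It assumes a simple, hypothetical policy in which each instance can handle a fixed number of requests per second (`capacity_per_instance` is an assumed figure, not a property of any real service) and the instance count is clamped between a floor and a ceiling.

```python
import math


def desired_instances(requests_per_sec: float,
                      capacity_per_instance: float,
                      min_instances: int = 1,
                      max_instances: int = 100) -> int:
    """Hypothetical policy: enough instances to absorb the current load."""
    needed = math.ceil(requests_per_sec / capacity_per_instance)
    # Never drop below the floor or exceed the ceiling.
    return max(min_instances, min(max_instances, needed))


def scaling_action(current: int, desired: int) -> str:
    """Describe what the autoscaler would do instead of a sysadmin."""
    if desired > current:
        return f"add {desired - current}"
    if desired < current:
        return f"remove {current - desired}"
    return "hold"


if __name__ == "__main__":
    running = 3
    for load in (950, 400, 0):
        target = desired_instances(load, capacity_per_instance=200)
        print(load, scaling_action(running, target))
        running = target
```

Real platforms add smoothing, cooldown periods and richer metrics; the point here is only that the add/remove decision is made by a loop rather than by a person.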

Of course, as with most successful Internet paradigms, the user is (and should be) completely unaware of what capability is required to perform this magic (“Pay no attention to the man behind the curtain”). The cloud and service architect, unfortunately, has no such luxury. For the cloud to work, it must be designed and implemented with the necessary components, and today there is a plethora of “cloud services” that implementers can choose from. For the most part, these best-of-breed cloud components are open source, developed and distributed by the Apache Software Foundation.

The cloud has been instrumental in pushing the NoSQL movement: traditional RDBMSs have trouble in the distributed, dynamic cloud, and even if that were not the case, NoSQL data models are a more natural fit for cloud development. The two best-known NoSQL implementations are Apache CouchDB and Apache Cassandra. CouchDB takes a document-oriented approach and is written in Erlang, a language noted for its performance in distributed and concurrent systems. By using a fully RESTful API, CouchDB is also able to leverage (and enforce) the intrinsic scalability of the REST architecture itself.
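Because CouchDB speaks plain HTTP, every database operation is just a request against a URL. The sketch below only builds the requests (it does not send them) so it runs without a server; it assumes CouchDB on its default address `http://localhost:5984`, and the database name `talks` and document ID `cloud-2010` are hypothetical.

```python
import json
import urllib.request

COUCH = "http://localhost:5984"  # assumed: CouchDB's default host and port


def create_db(name: str) -> urllib.request.Request:
    # PUT /{db} creates a database.
    return urllib.request.Request(f"{COUCH}/{name}", method="PUT")


def put_doc(db: str, doc_id: str, doc: dict) -> urllib.request.Request:
    # PUT /{db}/{doc_id} with a JSON body creates (or updates) a document.
    body = json.dumps(doc).encode("utf-8")
    return urllib.request.Request(
        f"{COUCH}/{db}/{doc_id}",
        data=body,
        method="PUT",
        headers={"Content-Type": "application/json"},
    )


def get_doc(db: str, doc_id: str) -> urllib.request.Request:
    # GET /{db}/{doc_id} fetches the document as JSON.
    return urllib.request.Request(f"{COUCH}/{db}/{doc_id}", method="GET")


if __name__ == "__main__":
    for req in (create_db("talks"),
                put_doc("talks", "cloud-2010", {"speaker": "Jim"}),
                get_doc("talks", "cloud-2010")):
        print(req.get_method(), req.full_url)
        # Against a live server: urllib.request.urlopen(req)
```

Since each call is an ordinary stateless HTTP request, standard HTTP caches, proxies and load balancers can sit in front of the database unchanged. Note that updating an existing CouchDB document also requires sending its current `_rev`, which this sketch omits.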

Apache Cassandra takes the column-oriented approach based on the BigTable concept. Because it is fully distributed, decentralized and dynamic, it also provides an ideal data store for the cloud, and it is used by a large number of well-known envelope-pushers such as Twitter, Facebook and Rackspace.
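Much of what makes a store "fully distributed, decentralized and dynamic" is the way keys are mapped to nodes without a central directory. The sketch below shows consistent hashing on a token ring, the general technique Cassandra's partitioning is built around; the MD5 tokens echo the spirit of its RandomPartitioner, but deriving tokens from node names, the node names themselves and the replication factor of 3 are simplifying assumptions for illustration.

```python
import bisect
import hashlib
from typing import Dict, List


def token(key: str) -> int:
    """Map any string onto the ring with an MD5 hash."""
    return int(hashlib.md5(key.encode("utf-8")).hexdigest(), 16)


class Ring:
    def __init__(self, replication_factor: int = 3):
        self.rf = replication_factor
        self._tokens: List[int] = []      # sorted node positions on the ring
        self._nodes: Dict[int, str] = {}  # token -> node name

    def add_node(self, name: str) -> None:
        t = token(name)  # simplification: position derived from the name
        bisect.insort(self._tokens, t)
        self._nodes[t] = name

    def remove_node(self, name: str) -> None:
        t = token(name)
        self._tokens.remove(t)
        del self._nodes[t]

    def replicas(self, key: str) -> List[str]:
        """Walk clockwise from the key's token and take the next rf nodes."""
        if not self._tokens:
            return []
        n = len(self._tokens)
        start = bisect.bisect_left(self._tokens, token(key)) % n
        return [self._nodes[self._tokens[(start + i) % n]]
                for i in range(min(self.rf, n))]


if __name__ == "__main__":
    ring = Ring()
    for node in ("node-a", "node-b", "node-c", "node-d", "node-e"):
        ring.add_node(node)
    print(ring.replicas("user:42"))
```

Adding or removing a node only moves the keys adjacent to it on the ring, which is why a cluster built this way can grow and shrink without a global reshuffle.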

Moving forward from the database/datastore tier, we need some mechanism to coordinate and manage the transfer of state information within the cloud, and Apache ActiveMQ fits that bill handsomely. ActiveMQ supports both REST and STOMP, which allows for flexibility between clients, as well as offering full support for clients written in multiple languages. ActiveMQ also supports AMQP, which makes it possible to leverage Apache Qpid as well. Apache Qpid excels at implementing AMQP and thus provides distribution, security, management and clustering with multi-platform support.
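Part of STOMP's appeal for multi-language clients is how simple its wire format is: a command line, `name:value` header lines, a blank line, the body, and a terminating NUL byte. The sketch below builds and parses such frames; the queue name `/queue/state-updates` is hypothetical, and header escaping and `content-length` handling from later STOMP versions are deliberately left out.

```python
from typing import Dict, Tuple


def stomp_frame(command: str, headers: Dict[str, str], body: str = "") -> bytes:
    """Serialize one STOMP frame: COMMAND, headers, blank line, body, NUL."""
    lines = [command] + [f"{k}:{v}" for k, v in headers.items()]
    return ("\n".join(lines) + "\n\n" + body).encode("utf-8") + b"\x00"


def parse_frame(data: bytes) -> Tuple[str, Dict[str, str], str]:
    """Split a single frame back into (command, headers, body)."""
    if data.endswith(b"\x00"):
        data = data[:-1]  # drop the frame terminator
    head, _, body = data.decode("utf-8").partition("\n\n")
    command, *header_lines = head.split("\n")
    headers = dict(line.split(":", 1) for line in header_lines)
    return command, headers, body


if __name__ == "__main__":
    frame = stomp_frame("SEND", {"destination": "/queue/state-updates"},
                        "node-7 up")
    print(frame)
    print(parse_frame(frame))
```

A client in almost any language can produce these bytes over a plain TCP socket, which is what makes a STOMP-speaking broker easy to reach from heterogeneous cloud components.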

At the middle and web tier there are, of course, the industry standards: Apache HTTP Server and Apache Tomcat. A lot of work in the Apache HTTP Server (httpd) codebase has gone into improving and enhancing its capability as a robust and fast dynamic load balancer. Tomcat has also been improved by the addition of faster HTTP listeners, such as those based on APR and the Java NIO library. The combination of these two stalwarts ensures high concurrency and availability, both crucial to the cloud.
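As an illustration of what "dynamic load balancer" means in practice, here is a sketch of weighted request counting, modeled on the request-counting method described in the httpd load-balancer documentation: every worker earns credit equal to its `lbfactor` on each request, the worker with the most credit is chosen, and the chosen worker pays back the total. The worker names and weights are hypothetical, and this is an illustration of the algorithm, not httpd's code.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Worker:
    name: str
    lbfactor: int = 1  # relative share of traffic this backend should get
    lbstatus: int = 0  # running credit balance


def pick(workers: List[Worker]) -> Worker:
    """Choose the backend for the next request by weighted request counting."""
    if not workers:
        raise ValueError("no workers available")
    total = 0
    best = None
    for w in workers:
        w.lbstatus += w.lbfactor  # everyone earns credit each round
        total += w.lbfactor
        if best is None or w.lbstatus > best.lbstatus:
            best = w
    best.lbstatus -= total  # the winner pays back the whole round
    return best


if __name__ == "__main__":
    # A Tomcat backend with twice the weight of the other two.
    pool = [Worker("tomcat-1", 2), Worker("tomcat-2"), Worker("tomcat-3")]
    print([pick(pool).name for _ in range(8)])
```

With weights 2:1:1 the picks interleave rather than bunching up, so the heavier backend gets half the requests without ever receiving long bursts.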

Although the whole idea of the cloud makes such concepts as “maximum concurrent sessions per component” somewhat moot (after all, horizontal scaling alleviates that “problem”), there is another tool from the ASF toolbox that is gaining traction within the cloud: Apache Traffic Server, which is designed explicitly to be a fast, event-based HTTP gateway and cache. A very viable design uses Traffic Server as a fast front-end proxy and cache, which forwards requests to httpd for dynamic load balancing and dynamic content (supplemented by Apache Tomcat).
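The front-end-cache design can be reduced to a simple rule: answer from the cache while an entry is fresh, otherwise forward to the origin tier (httpd, backed by Tomcat) and remember the response. The sketch below models that rule with a fixed time-to-live; the `origin` callable, the 60-second TTL and the injectable clock are assumptions for illustration, and a real HTTP cache such as Traffic Server also honors headers like `Cache-Control`, which this ignores.

```python
import time
from typing import Callable, Dict, Tuple


class CachingProxy:
    """Toy front-end cache that forwards misses to an origin tier."""

    def __init__(self, origin: Callable[[str], str], ttl: float = 60.0,
                 clock: Callable[[], float] = time.monotonic):
        self.origin = origin  # stands in for httpd/Tomcat behind the cache
        self.ttl = ttl
        self.clock = clock
        self._cache: Dict[str, Tuple[float, str]] = {}  # path -> (expiry, body)
        self.hits = 0
        self.misses = 0

    def get(self, path: str) -> str:
        now = self.clock()
        entry = self._cache.get(path)
        if entry is not None and entry[0] > now:
            self.hits += 1  # fresh copy: the origin never sees this request
            return entry[1]
        self.misses += 1
        body = self.origin(path)  # forward to the dynamic tier
        self._cache[path] = (now + self.ttl, body)
        return body


if __name__ == "__main__":
    proxy = CachingProxy(lambda p: f"<html>rendered {p}</html>")
    for _ in range(3):
        proxy.get("/index.html")
    print(proxy.hits, proxy.misses)
```

Every hit is a request the dynamic tier never has to render, which is why putting a cache in front of httpd and Tomcat pays off under heavy read traffic.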

Of course, there are also many external cloud providers out there, offering a wide suite of services from simple data stores to more extreme end-to-end solutions. The problem with them, however, is that they each provide their own interfaces. This means lock-in for developers and users, which runs counter to the whole idea of the cloud when you think about it. As you would expect, the ASF comes to the rescue with a number of very cool and interesting projects that provide a single interface to all those cloud providers. Currently in incubation are Apache libCloud, a client library written in Python, and Apache DeltaCloud, a web-services-oriented API.
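The anti-lock-in idea behind projects like these is the classic driver pattern: application code talks to one provider-neutral interface, and each provider gets its own driver behind it. The sketch below shows that pattern only; `ComputeDriver`, `InMemoryDriver`, `get_driver` and the provider names are hypothetical stand-ins, not the actual API of libCloud or DeltaCloud.

```python
from abc import ABC, abstractmethod
from typing import Callable, Dict, List, Tuple


class ComputeDriver(ABC):
    """Hypothetical provider-neutral interface (not any project's real API)."""

    @abstractmethod
    def create_node(self, name: str, size: str) -> str: ...

    @abstractmethod
    def list_nodes(self) -> List[str]: ...

    @abstractmethod
    def destroy_node(self, node_id: str) -> None: ...


class InMemoryDriver(ComputeDriver):
    """Stand-in for one provider; a real driver would call its web API."""

    def __init__(self, prefix: str):
        self.prefix = prefix
        self._nodes: Dict[str, Tuple[str, str]] = {}  # id -> (name, size)
        self._next = 0

    def create_node(self, name: str, size: str) -> str:
        self._next += 1
        node_id = f"{self.prefix}-{self._next}"
        self._nodes[node_id] = (name, size)
        return node_id

    def list_nodes(self) -> List[str]:
        return sorted(self._nodes)

    def destroy_node(self, node_id: str) -> None:
        self._nodes.pop(node_id, None)


# Hypothetical registry: switching providers means changing one string.
DRIVERS: Dict[str, Callable[[], ComputeDriver]] = {
    "provider-a": lambda: InMemoryDriver("a"),
    "provider-b": lambda: InMemoryDriver("b"),
}


def get_driver(provider: str) -> ComputeDriver:
    return DRIVERS[provider]()


def launch_web_tier(driver: ComputeDriver, count: int) -> List[str]:
    # Application code depends only on the interface, never on a provider.
    return [driver.create_node(f"web-{i}", "small") for i in range(count)]


if __name__ == "__main__":
    for provider in DRIVERS:
        print(provider, launch_web_tier(get_driver(provider), 2))
```

Moving the same deployment to another provider then means selecting a different driver, not rewriting the application.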

Rounding out the full package, there is also Apache Nuvem, a programming interface that is compatible with multiple cloud providers, and Apache Thrift, a cross-language code generation stack and serialization framework that is useful for RPC-based scenarios.
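In Thrift, messages are described in an IDL where every field carries a numeric ID, and the generated code in each language encodes those tagged fields in a compact binary form. The sketch below illustrates that tagged-field idea with a type byte, a 16-bit field ID and a big-endian payload, ended by a stop byte. It is loosely modeled on the concept and is not a wire-compatible Thrift implementation; the type codes and the two supported types are illustrative choices.

```python
import struct
from typing import Dict, Union

# Illustrative type codes for this sketch only.
T_STOP, T_I32, T_STRING = 0, 8, 11

Value = Union[int, str]


def encode(fields: Dict[int, Value]) -> bytes:
    """Encode {field_id: value} as tagged binary fields plus a stop byte."""
    out = bytearray()
    for field_id, value in sorted(fields.items()):
        if isinstance(value, int):
            # type (1 byte), field id (2 bytes), 32-bit value (4 bytes)
            out += struct.pack(">bhi", T_I32, field_id, value)
        else:
            data = value.encode("utf-8")
            # type, field id, 32-bit length, then the UTF-8 bytes
            out += struct.pack(">bhi", T_STRING, field_id, len(data)) + data
    out += struct.pack(">b", T_STOP)
    return bytes(out)


def decode(buf: bytes) -> Dict[int, Value]:
    """Read tagged fields until the stop byte."""
    fields: Dict[int, Value] = {}
    pos = 0
    while True:
        (ftype,) = struct.unpack_from(">b", buf, pos)
        pos += 1
        if ftype == T_STOP:
            return fields
        field_id, n = struct.unpack_from(">hi", buf, pos)
        pos += 6
        if ftype == T_I32:
            fields[field_id] = n
        elif ftype == T_STRING:
            fields[field_id] = buf[pos:pos + n].decode("utf-8")
            pos += n
        else:
            raise ValueError(f"unknown field type {ftype}")


if __name__ == "__main__":
    message = {1: "cloud", 2: 2010}
    wire = encode(message)
    print(len(wire), decode(wire))
```

Because each value is tagged with its field ID rather than identified by position, readers and writers generated from the same IDL in different languages agree on the meaning of every field.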

Finally, there is the OS on which the cloud runs. Again, open source is key here, and Linux is the OS of choice. Red Hat has a real strategic advantage, since it can place such crucial aspects of the cloud as virtualization deep down at the kernel level. Even so, other Linux variants are excellent choices, and even FreeBSD is making inroads.

So we see the continued importance of open source as technology marches forward. Open source was the basis on which the Internet was built, then the foundation of the web, and now it is at the core of the cloud.

Jim Jagielski has been active on the Net since the early 1980s, starting as editor of the A/UX FAQ. He worked on the NCSA server and joined the Apache Group (as it was called back then) at a very early stage. He actively contributes to numerous Apache projects such as Apache HTTPD, APR and Tomcat, and also hacks on other FOSS projects. Jim also serves as a Director and President for the ASF, which he co-founded. He is also on the board of the Outercurve Foundation and OSSI.

 Jim will be presenting at ApacheCon in Atlanta 1-5 November. For more information visit http://apachecon.com.