
As the saying goes, few things in life are certain except death and taxes. To that, we should add another certainty: that the amount of data that you need to store and manage will continue to grow at a rapid pace. One way to deal with this profusion of data is with clustered storage, like Gluster.
Gluster is an open source storage platform for working with large amounts of data (terabytes all the way up to petabytes) that ties together everything from the operating system layer to filesystem and management interface. To get a deeper view on Gluster, we asked Anand Babu Periasamy, CTO and co-founder of Gluster, to describe the technology and give a glimpse into the project's roadmap.
OStatic: How did Gluster get its start? What's the origin of the software and the technology?
Hitesh Chellani and I co-founded Gluster. Before that, we were part of the team at California Digital Corporation that built the 'Thunder' supercomputer for Lawrence Livermore National Lab. When 'Thunder' was put into production in 2004, it was the second-fastest supercomputer in the world, demonstrating the feasibility and power of building large-scale computing clusters with industry-standard hardware (IA64) and an open source software stack. Following the 'Thunder' project, Hitesh and I left California Digital to start a company with the goal of bringing the open source software / commodity hardware combination to the commercial enterprise.
In working with early customers, mostly in the energy exploration industry, it became clear that the pain on the storage side of the data center was more acute. The team looked at existing alternatives but recognized from past experience that it would be better and faster to build from scratch without legacy limitations. That is how the original Gluster file system was born.
OStatic: Tell us about Gluster, what it is, and what it's good for.
Gluster Storage Platform is clustered storage. In other words, multiple storage building blocks, or nodes, are connected, and our software aggregates those resources into a unified pool. Gluster automatically manages tasks like data distribution, I/O scheduling, and replication. The key advantage here is scalability; Gluster can manage hundreds of storage nodes and multiple petabytes of capacity. The 'scale-out' approach enables this, eliminating bottlenecks and allowing customers to add resources as they grow. We are also a software-only solution that runs on commodity hardware, a model that drastically lowers cost.
Gluster Storage Platform excels at managing large numbers of files. Files can range from small to very large and the product is flexible enough to support a wide range of application types. Managing large numbers of files is generally referred to as the 'unstructured data explosion' problem. The modular design of Gluster makes it possible to tailor the configuration to a wide range of needs.
OStatic: What type of environments is Gluster being used in, and what kind of workloads is Gluster aimed at?
Gluster offers great flexibility and is therefore well suited for a wide range of applications and use cases. Generic use cases include scalable Network Attached Storage (NAS), high-performance storage, archive, media delivery, and cloud. We span industries such as online music/video, managed hosting, health care, biotech, energy, and others.
OStatic: What companies are involved in Gluster development, aside from Gluster?
Gluster is the primary developer for the product; however, we do collaborate with other companies, both vendors and customers. Early on we worked closely with the team maintaining the Filesystem in Userspace (FUSE) project, and now one of the lead maintainers is a Gluster employee. We are collaborating with the cloud team at Red Hat to enhance the cloud capabilities of the product. Initially, that team was considering writing their own code from scratch, but we worked together, pointing out APIs and features on our roadmap, and now we collaborate. Another example is a customer who offers managed hosting services and will be writing a billing module for the product, also under the GPL. The HyperTable open source database (C++ implementation of Hadoop) works with us to support Gluster as the back-end scalable file system. We also have several customers who have deployed Gluster on Amazon Web Services (AWS) and are working with us to productize solutions for this environment.
OStatic: Can you describe the community model that's being used to develop Gluster?
We have a growing community of over 1,000 registered members who are very active. The community contributes bug fixes regularly and occasionally develops features/modules that get integrated into the product. We like to point out that as hard as it is to develop a file system, the hardest part is testing and quality assurance. Our community does an outstanding job of stressing the product in a wide range of use cases, frequently in ways we never envisioned. One interesting example is a community member who wrote a Python binding module; whether this is useful or not is still an open question, but it highlights the axiom that there are far more ideas from people outside the company than inside. Beyond identifying bugs, the community also heavily influences our roadmap.
Another interesting dynamic we are seeing is that our implementation and architecture are growing the pool of file system developers by lowering the barriers to entry. Open source is a big part of this, but the real benefit is that our product is written in user space with a modular architecture that simplifies feature development. One no longer needs to be deeply familiar with OS kernel development or file system internals. We have recent college graduates with basic C programming skills making contributions. We have multiple student-targeted projects, like compression and encryption, being offered through the Google Summer of Code program. And we have seen college courses teaching file system development using Gluster as part of the curriculum.
OStatic: What parts of the stack, if any, are not open source?
The entire software stack of Gluster Storage Platform is open source and licensed under GPLv3. Additionally, the product available for free download is identical to the commercial product supported by subscriptions.
OStatic: Tell us a bit about the roadmap -- what's coming, and why?
We will continue to focus on improving manageability and ease of use, both through core features and through services provided by the Gluster Subscription Network Portal, such as monitoring and analysis tools; it is impossible to scale cost-effectively without simplicity. The industry is rushing to virtualize every corner of the data center. Our ability to virtualize storage resources under a global namespace is a key advantage here, and we continue to invest in features that optimize our storage for virtual server environments. Cloud computing is another area where customers are trying to sort through the hype for practical solutions. We are taking our experience with cloud storage deployments from our existing customer base to create ready-to-use cloud solutions for both private and public deployments.