Unlocking the Cloud Means Open Data

by Ostatic Staff - Nov. 10, 2009

Opponents of cloud computing cite data loss and vendor lock-in as primary dangers of relying on such services. These are valid points, but since cloud computing isn't going away anytime soon, it's time to start finding solutions instead of simply sounding alarm bells.

Today's guest editor, Rafael Laguna, CEO of Open-Xchange, shares his thoughts about what it would take to make cloud services more reliable and trustworthy.

Open Data and the Cloud

By Rafael Laguna, CEO, Open-Xchange

You may have heard of the major snafu around the Microsoft/Danger Sidekick service. The cloud service lost massive amounts of user data, catering nicely to the angst around data loss in the cloud. This was Microsoft and T-Mobile -- and they are not especially stupid. In the meantime they have at least found a backup and restored most of the data.

What this story was really about is whether cloud services can be trusted. Period. The only way to make cloud -- and social media -- services a trustable home for data is to make that data open and "commandable" by its owner. As long as cloud service providers think they own user data, and as long as the services and data formats are proprietary -- depriving users of the ability to switch from one service to another and from the cloud back to on-premises -- cloud services are neither safe nor trustworthy. To get an idea of how some cloud services think of your data, check the terms of Google App Engine, e.g. paragraph 5.2.

Open source software helped unlock the full potential of computing by providing the fertile ground that created the Internet and many other technologies and services -- including Google. Now this needs to happen for the cloud as well. Proprietary, closed cloud services are the industry's worst nightmare, making the Microsoft Office and Windows monopoly look like kindergarten. Not only would we lose control over the data formats, but also over the data itself, creating a "cloud lock-in" that would commit users to specific services forever.

Not all cloud services provide, or will provide, programming interfaces to all the data they manage. Not all provide, or will provide, an on-premises version of their service. The secret to unlocking the cloud lies in opening the data and creating a data hub that enables interchange -- even between proprietary, closed services -- and migration from one (closed) service to another (open) one. For some data types there are established standards that are good enough, such as vCard for contact information and iCal/ICS for appointments, along with their Microformats cousins hCard and hCalendar. For activity streams and messaging there are no real standards yet, though some good old standards, like the RFCs around email, come close.
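To make the standards above concrete, here is a minimal sketch that serializes the same kind of data by hand: a contact as a vCard 3.0 (RFC 2426) and an appointment as an iCalendar event (RFC 2445). The contact and appointment details are invented for illustration; real exporters would handle escaping, line folding, and time zones.

```python
def to_vcard(name, email, phone):
    """Serialize a contact to a minimal vCard 3.0 string (RFC 2426)."""
    return "\r\n".join([
        "BEGIN:VCARD",
        "VERSION:3.0",
        f"FN:{name}",
        f"EMAIL:{email}",
        f"TEL:{phone}",
        "END:VCARD",
    ])

def to_ics(summary, start, end):
    """Serialize an appointment to a minimal iCalendar VEVENT (RFC 2445).
    Timestamps are expected as pre-formatted UTC strings."""
    return "\r\n".join([
        "BEGIN:VCALENDAR",
        "VERSION:2.0",
        "BEGIN:VEVENT",
        f"SUMMARY:{summary}",
        f"DTSTART:{start}",
        f"DTEND:{end}",
        "END:VEVENT",
        "END:VCALENDAR",
    ])

card = to_vcard("Jane Doe", "jane@example.org", "+1-555-0100")
event = to_ics("Planning call", "20091110T140000Z", "20091110T150000Z")
```

Any service that can emit or ingest these plain-text formats can exchange contacts and appointments with any other, which is exactly the interchange property the data hub relies on.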

With open data standards, integration and migration become possible, and the owner can always take ownership of his or her data. As long as a service is accessible by humans through a web browser, it is at least "crawlable" by a script that can convert the data contained in the service into standard formats, which can then be digested by more open-data-friendly services. It seems that many popular services have realized how ridiculous this crawling would be and provide good APIs and web services to access their systems, thus at least retaining some control over what is going on. Truly open data services also require a place to store, manage, and utilize the data.
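The "crawlable" fallback can be sketched in a few lines: scrape an hCard microformat out of a service's HTML page and re-emit it as a vCard. The HTML snippet and class names below follow the hCard convention but are invented for illustration; a real crawler would fetch live pages and cope with far messier markup.

```python
from html.parser import HTMLParser

class HCardParser(HTMLParser):
    """Collect the text of elements carrying the hCard classes 'fn' (formatted
    name) and 'email'."""
    def __init__(self):
        super().__init__()
        self.fields = {}
        self._current = None  # hCard field the next text node belongs to

    def handle_starttag(self, tag, attrs):
        classes = dict(attrs).get("class", "").split()
        for key in ("fn", "email"):
            if key in classes:
                self._current = key

    def handle_data(self, data):
        if self._current:
            self.fields[self._current] = data.strip()
            self._current = None

# A fabricated profile page fragment marked up as an hCard.
html = ('<div class="vcard"><span class="fn">Jane Doe</span>'
        '<span class="email">jane@example.org</span></div>')

parser = HCardParser()
parser.feed(html)

# Re-emit the scraped fields in the standard vCard format.
vcard = "\r\n".join([
    "BEGIN:VCARD",
    "VERSION:3.0",
    f"FN:{parser.fields['fn']}",
    f"EMAIL:{parser.fields['email']}",
    "END:VCARD",
])
```

Because the output is a standard format, any open-data-friendly service can import it directly -- the crawl is ugly, but the lock-in is broken.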

Only when the data obtained via crawlers and APIs can actually be used does data lock-in go away. And, of course, the ideal foundation for such software is open source -- allowing users to publish and subscribe to information with as many data standards as possible. Some such projects already exist.

So the desire and necessity to regain control over data -- or, as we Trekkies have dubbed it, "Command Your Data" -- is what will carry the power of the open source idea into the cloud.