OOXML: Why Is It Bad, and What Can We Do?

by Ostatic Staff - Apr. 02, 2008

Why is OOXML a bad standard? What does it mean for open source developers? And what, if anything, can members of the open source community do, now that OOXML has been adopted by the ISO?

We love to talk about "open standards" in the computer industry. But how do such standards get created? The story of OOXML, officially accepted as of today by the ISO, is a cautionary tale.

The story begins several years ago, when Microsoft switched to XML file formats for its Office products. Partly as a response to the standardization of OpenOffice's file format (ODF), Microsoft submitted its XML document formats, known as "Office Open XML," or "OOXML," to Ecma International, a European standards body for the computer industry, which approved them as a standard (Ecma-376) in December 2006. A revised version of Ecma-376 was officially adopted by ISO, the International Organization for Standardization, earlier today (April 2nd, 2008), making OOXML an official international standard.

There are a number of standards bodies in the world, and each has its own set of rules. In general, a standards body appoints a committee to review and approve a particular standard. That committee reviews the draft, gathers comments and evidence, and revises the draft in the wake of comments, often multiple times. This process can take an awfully long time, both because the committee members are volunteers, and because a standard is a complex thing whose effects will be felt for decades to come.

The members of a standardization committee obviously worry about whether a standard is technically correct. They try to remove inconsistencies -- both internal and with the outside world -- and to ensure that the standard is as well defined and accurate as possible.

Committees must also consider how easy it will be to implement the standard. A standard that can only be implemented by its author is no standard at all. FTP, a file-transfer protocol and standard, has been around for more than 20 years, and is only meant to transfer files from one computer to another. Despite this, there are still problems getting FTP clients and servers to communicate seamlessly. The simpler the standard, the better it is for implementers and end users alike.

Another consideration has to do with intellectual property: Even if it's technically possible for everyone to implement a particular standard, it should be legally possible, as well. That is, implementation of the standard should not require the licensing of technologies from anyone else, particularly another member of the committee. The W3C itself was embroiled in such a dispute several years ago, when it considered the appropriateness of standards that included patents, after a number of companies pushed for such a consideration.

Unfortunately, OOXML, adopted today as an ISO standard, fails on all three counts. To begin with, it seems that the OOXML standard was poorly defined, leaving a huge number of ambiguities and undefined terms. That's not surprising, given the fact that it is 6,000 -- yes, six thousand -- pages long, a size which makes it nearly impossible to ensure internal consistency. The large size also ensures that it will be difficult to create alternative implementations; would you like to be the programmer charged with checking that a particular program adheres to all 6,000 pages of the standard?

Moreover, parts of the standard require a programmer to deviate from other, correct standards. For example, 1900 was not a leap year, as is the case with three out of every four "00" years: a century year is a leap year only if it is divisible by 400. (Thus, 1900 was not a leap year, but 2000 was.) Microsoft got this point wrong when they first implemented Excel, which treats 1900 as a leap year, and as a result, the OOXML standard requires that implementers make this same error, for the sake of consistency.
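To make the leap-year point concrete, here is a minimal Python sketch. The function names (is_leap_gregorian, is_leap_spreadsheet_compat, days_in_year) are made up for this illustration and do not come from the OOXML text or from any Microsoft API; the "compat" rule is only my simplified reading of the behavior described above, not the wording of the standard.

```python
import calendar


def is_leap_gregorian(year: int) -> bool:
    """Correct Gregorian rule: every 4th year, except century years
    that are not divisible by 400."""
    return year % 4 == 0 and (year % 100 != 0 or year % 400 == 0)


def is_leap_spreadsheet_compat(year: int) -> bool:
    """Hypothetical compatibility rule: identical to the Gregorian rule,
    except that 1900 is (wrongly) treated as a leap year."""
    return year == 1900 or is_leap_gregorian(year)


def days_in_year(year: int, leap_rule) -> int:
    """Number of days in a year under the given leap-year rule."""
    return 366 if leap_rule(year) else 365


if __name__ == "__main__":
    for year in (1900, 2000, 2008, 2100):
        print(
            year,
            "gregorian:", is_leap_gregorian(year),
            "compat:", is_leap_spreadsheet_compat(year),
            "stdlib:", calendar.isleap(year),
        )
```

The two rules disagree only for 1900, but that single extra day is enough: an implementer who wants to read the same dates as Excel cannot simply hand the job to a correct calendar library, and has to carry the special case around instead.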

There are also serious questions regarding some Microsoft patents that any implementer will need to use. Microsoft has promised that it will not sue OOXML implementers for patent infringement, but the Groklaw site points out that this statement might be meaningless.

A Wiki document listing problems that people have found with OOXML is located on this site. There are many other objections to be found on the Web, such as here, here, and here.

If anyone in the open-source community were to propose a huge, unimplementable standard, they would be laughed out of town. So this raises at least two questions: How did this happen, and what do we do about it?

This happened, from everything I can tell, through old-fashioned politics: Microsoft, from numerous reports I've read, managed to get a number of countries to vote in favor of adopting OOXML. For example, Norway's ISO representative voted to approve OOXML, despite the vocal objection of the Norwegian standards committee. Politics are a normal part of the standardization landscape, but this seems to have been an extreme case, by everyone's measure.

The adoption of OOXML is troubling not only from the perspective of open standards, but from that of open source software as well. A 6,000-page standard is going to be difficult, if not impossible, to implement, which means that Microsoft has seemingly received international approval for a monopoly on its file formats. OpenOffice and other open-source programs can claim that ODF is "more open" than OOXML, but that's an argument which will be difficult to make to people without a technical background.

I must admit that it's not obvious what we, in the open source community, can do about this. Lobbying your national standards organization might have some effect, but probably not. Trying to use ODF and other open standards sounds fine in theory, but will quickly run up against the reality, which is that most people use MS Office. And encouraging third-party software developers to work with ODF before OOXML would probably be suicidal for a business, given the immense numbers of MS Office users, especially when compared with OpenOffice.

So, I ask you, our readers: What can and should we do about this, now that the decision has been made?