Page:Aaron Swartz s A Programmable Web An Unfinished Work.pdf/44

32 '''5. BUILDING A PLATFORM: PROVIDING APIS'''

This is easy for humans to write and read, but even more importantly, it’s automatic for computers to write and read. In most languages you don’t even need to think about the fact that you’re using JSON: you just ask your JSON library to serialize a list and it does it. Read in a JSON file and it’s just like your program is getting a normal data structure.
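As a minimal sketch of that round trip, here is how it might look with Python’s standard `json` module. The `inventory` data is a made-up example, not something from the text:

```python
import json

# Hypothetical in-memory data: a list holding a dict with mixed value types.
inventory = [{"name": "widget", "count": 3, "tags": ["small", "blue"], "in_stock": True}]

# Serializing is a single call; no format decisions are needed.
encoded = json.dumps(inventory)

# Reading it back yields ordinary lists, dicts, ints, strings, and booleans.
decoded = json.loads(encoded)
```

The decoded value compares equal to the original list, and `count` comes back as an integer rather than a string.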

XML, on the other hand, supports none of this. Instead, it thinks in terms of elements with character data and processing instructions and attributes, all of which are strings. Publishing data as XML requires figuring out how to shoehorn your internal data into a particular format, then making sure you do all of your quoting properly. Parsing XML is even worse.
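To make the shoehorning concrete, here is a rough sketch of publishing a flat record as XML by hand in Python. The `record_to_xml` helper and the `<record>` layout are assumptions invented for this illustration; a real format would need many more decisions (nesting, lists, attribute versus element):

```python
from xml.sax.saxutils import escape
import xml.etree.ElementTree as ET

def record_to_xml(record):
    """Hypothetical helper: one child element per key, every value stringified and escaped."""
    parts = ["<record>"]
    for key, value in record.items():
        # Forgetting escape() here would produce malformed XML for values with & or <.
        parts.append(f"<{key}>{escape(str(value))}</{key}>")
    parts.append("</record>")
    return "".join(parts)

xml_text = record_to_xml({"name": "Tom & Jerry <Inc.>", "age": 42})

# Parsing it back returns strings only; the integer has to be recovered by hand.
parsed = ET.fromstring(xml_text)
name = parsed.findtext("name")
age = parsed.findtext("age")
```

Even in this tiny case the writer has to pick a layout and remember the escaping, and the reader gets `age` back as text.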

The main reason XML is so bad at sharing data is that it was never designed to do that in the first place. It was a format for marking up textual documents: annotating writing with formatting instructions and metadata of various sorts, à la HTML. This is why it does things like distinguish between character data and attribute data. Attribute data is stuff that isn’t part of the actual text, à la:
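A hypothetical fragment of that kind, sketched here in Python with the standard library’s `xml.etree.ElementTree`, keeps the annotation in an attribute and the prose in character data. The markup itself is an assumption chosen only to illustrate the point:

```python
import xml.etree.ElementTree as ET

# Hypothetical document markup: the color is metadata about the text, not part of it.
fragment = '<p>This is <font color="green">annotated</font> text.</p>'
paragraph = ET.fromstring(fragment)

# The character data is what a reader actually sees.
visible_text = "".join(paragraph.itertext())

# The attribute carries the formatting instruction.
annotation = paragraph.find("font").get("color")
```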

The word “green” is an annotation, not part of the text, so it goes in an attribute. All of this goes out the window when you start talking about data:
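A hypothetical record of that sort might be parsed as follows. The element names and values are made up for illustration, and a second, equally plausible encoding is included for comparison:

```python
import xml.etree.ElementTree as ET

# One possible encoding: age as an attribute, name as a child element.
record = ET.fromstring('<person age="27"><name>Fred</name></person>')
age = record.get("age")
name = record.findtext("name")

# An equally plausible encoding of the same data, with the roles swapped.
alternative = ET.fromstring('<person name="Fred"><age>27</age></person>')
alt_name = alternative.get("name")
alt_age = alternative.findtext("age")
```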

Why is “age” an attribute while “name” is an element? It’s completely arbitrary, because the distinction makes no sense.

Alright, so XML has a few more features that nobody needs. What’s the harm in that? Well, it’s also missing a whole bunch of features that you do need: by default, XML has no support for even the most basic concepts like “integer”; it’s all strings. And adding that support requires XML Schema, a specification so mind-numbingly complex that it actually locks up my browser when I try to open it.
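The missing-types point can be seen in a small Python comparison using only the standard library. The `age` field and its value are assumptions for the sake of the example:

```python
import json
import xml.etree.ElementTree as ET

# JSON carries the type along with the value.
json_age = json.loads('{"age": 42}')["age"]

# Without a schema, XML hands back text; the caller must know to convert it.
xml_age = ET.fromstring("<person><age>42</age></person>").findtext("age")
age_as_int = int(xml_age)
```

Here `json_age` is already an integer, while `xml_age` is a string that only becomes a number because the calling code happens to know it should be one.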

But the costs of such complexity aren’t simply more work for developers—they really come in the form of bugs, especially security holes. As security expert Dan Bernstein observes, two of the biggest sources of security holes are complexity (“Security holes can’t show up in features that don’t exist”) and parsing (“The parser