Two years ago, I gave the keynote address on the opening day of the XML 2003 conference. The next day, Adam Bosworth delivered a weirdly complementary keynote in which he began to lay out an idea he’s been developing ever since — first at BEA and now at Google. The idea, in a nutshell, is that the truly scalable databases of the future will be more like the web than like Oracle, DB2 or SQL Server.
More recently, Bosworth elaborated on some of the lessons the web has taught us about simplicity, human accessibility, “sloppily extensible” formats, the social dimension of software and loose coupling. But he also introduced a key technical point about RSS and Atom, the feed formats powering the blog revolution. These formats represent sets of items. Typically, the items contain blog postings, but they can also contain XML fragments that represent anything under the sun. What’s more, items can link to other items or collections. Bosworth argues that this architecture lends itself to aggressive scale-out, decentralised caching and grassroots schema evolution, all of which tend to elude conventional databases.
There’s no free lunch, of course. When you query this RSS/Atom data web, you can expect more structural precision than full-text search affords, but don’t plan on fast execution of complex nested queries.
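To make the trade-off concrete, here is a minimal sketch in Python of a feed whose items carry arbitrary XML fragments and link to one another. The feed content, the `http://example.org/books` namespace and the function names are all invented for illustration. The sketch only contrasts a full-text match with a structural one, and is not a picture of how any real feed store works.

```python
import xml.etree.ElementTree as ET

# Namespace prefixes used in the queries below. The "b" namespace is hypothetical.
NS = {"a": "http://www.w3.org/2005/Atom", "b": "http://example.org/books"}

# A tiny Atom-style feed: each entry wraps an XML fragment, and one entry links to another.
FEED = """<feed xmlns="http://www.w3.org/2005/Atom" xmlns:b="http://example.org/books">
  <title>Reading list</title>
  <entry>
    <id>urn:example:1</id>
    <title>Notes on XQuery</title>
    <content type="application/xml">
      <b:book><b:author>Smith</b:author><b:topic>xquery</b:topic></b:book>
    </content>
  </entry>
  <entry>
    <id>urn:example:2</id>
    <title>Smith reviews a database</title>
    <link rel="related" href="urn:example:1"/>
    <content type="application/xml">
      <b:book><b:author>Jones</b:author><b:topic>sql</b:topic></b:book>
    </content>
  </entry>
</feed>"""


def entries(feed_xml):
    """Parse the feed and return its entry elements."""
    return ET.fromstring(feed_xml).findall("a:entry", NS)


def full_text(feed_xml, word):
    """Ids of entries whose text mentions the word anywhere at all."""
    return [e.findtext("a:id", namespaces=NS) for e in entries(feed_xml)
            if word.lower() in "".join(e.itertext()).lower()]


def by_author(feed_xml, author):
    """Ids of entries whose embedded book fragment names this author."""
    return [e.findtext("a:id", namespaces=NS) for e in entries(feed_xml)
            if e.findtext("a:content/b:book/b:author", namespaces=NS) == author]


def related(feed_xml):
    """Map each entry id to the ids it links to with rel="related"."""
    return {e.findtext("a:id", namespaces=NS):
            [l.get("href") for l in e.findall("a:link[@rel='related']", NS)]
            for e in entries(feed_xml)}


if __name__ == "__main__":
    print(full_text(FEED, "smith"))
    print(by_author(FEED, "Smith"))
    print(related(FEED))
```

The full-text search matches both items, because the second one merely mentions Smith in its title. The structural query returns only the item whose fragment actually names Smith as author.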
We’ve yet to colonise the middle ground between these extremes and I don’t think anyone really knows what the sweet spot will turn out to be. I’ve got plenty of mileage out of XPath and XQuery, and my dream is that these XML-oriented query disciplines can be federated at a large scale. But first things first: we need to create the data web. Recently, two leading figures have dropped major hints about how this is going to happen.
The first was Bill Gates, who, in a September interview, told me, “The RSS data web is a natural development coming out of the acceptance of XML ... and we’ve got some ideas internally ... about making RSS work two-way”.
Historically, RSS has been a read-mostly affair. There are APIs through which blogging tools can inject content into publishing systems, which then reflect it back out as XML feeds. But, while the blogosphere has at last realised the vision of a two-way web, RSS as a data transport remains largely asymmetric. Microsoft evidently wants to change that.
The second and much more explicit hint appeared a month later in an article by Adam Bosworth. Atom is both a feed format and a publishing protocol. The latter, Bosworth wrote, is “a simple HTTP-based way to INSERT, DELETE, and REPLACE” entries within a feed.
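For a rough idea of what that could look like on the wire, here is a sketch in Python that maps the three operations onto HTTP verbs. It assumes the mapping the Atom Publishing Protocol was later standardised with (POST to a collection, PUT and DELETE on an individual entry). The URLs and function names are hypothetical, and the code only builds the requests without sending them.

```python
from urllib.request import Request

# Media type for a single Atom entry document.
ATOM_ENTRY = "application/atom+xml;type=entry"


def insert(collection_url, entry_xml):
    """INSERT: POST a new entry to the feed's collection URL."""
    return Request(collection_url, data=entry_xml.encode("utf-8"),
                   headers={"Content-Type": ATOM_ENTRY}, method="POST")


def replace(entry_url, entry_xml):
    """REPLACE: PUT a new version of an existing entry at its own URL."""
    return Request(entry_url, data=entry_xml.encode("utf-8"),
                   headers={"Content-Type": ATOM_ENTRY}, method="PUT")


def delete(entry_url):
    """DELETE: remove an existing entry by its own URL."""
    return Request(entry_url, method="DELETE")


if __name__ == "__main__":
    # Hypothetical entry and endpoints, for illustration only.
    entry = ('<entry xmlns="http://www.w3.org/2005/Atom">'
             '<title>Hello</title></entry>')
    for req in (insert("http://example.org/feed", entry),
                replace("http://example.org/feed/1", entry),
                delete("http://example.org/feed/1")):
        print(req.get_method(), req.full_url)
```

Passing a request to `urllib.request.urlopen` would send it. That step is left out here because the endpoints are made up.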
Microsoft developer and blogger Dare Obasanjo responded with a question. “Perhaps,” he asked, “this Atom store, accessible via Atom feeds and the Atom API, is [the rumoured] Google Base?”
We’ll surely see more squabbling within the already fragmented world of lightweight XML syndication. But while the RSS feed format won the first round, and I suspect the Atom API will win the next one, don’t take your eye off the ball. This game isn’t about formats and APIs; it’s about the emergence of a data web made up of loosely coupled sets of XML fragments that people can easily read and write. Bring it on!
Udell is lead analyst at the InfoWorld Test Centre. Contact him at firstname.lastname@example.org