Computerworld

Structured change detection

  • Jon Udell
  • 07 March, 2004 22:00

Andy Hunt and Dave Thomas are apostles of common sense. Their bestselling book, The Pragmatic Programmer, is a thoughtful guide to the craft of programming.

Its tenets are closely aligned with those of the Agile Manifesto, which Hunt and Thomas co-wrote. Now they're self-publishing a three-volume "prequel" to The Pragmatic Programmer called The Pragmatic Starter Kit, which focuses on three core sets of skills: version control, unit testing and automation.

Two of the three volumes are available, and I've just read the first of them: Pragmatic Version Control Using CVS (Concurrent Versions System). It is a spectacularly lucid and useful book that brings CVS novices up to speed in a flash and offers CVS experts new tricks and broader perspectives.

Confession: I'm not (yet) the CVS expert that I should be. One of my excuses doesn't stand up to scrutiny: It's been a long while since I was part of a team programming effort. Working solo, I've rationalised that formal version control is overkill for the simple coding projects I undertake. But Hunt and Thomas aren't buying that excuse. They understand that friction is the enemy of version control, and they present recipes and scenarios that make the process nearly as frictionless as it can be.

Version control isn't only for code, of course. Any evolving set of documents can benefit from an infinite undo stack and a change narrative. In fact, the Hunt/Thomas book has prompted me to move my columns into a CVS repository — yes, I'm writing this column under version control.

Admittedly, CVS or any source-code control system is a dubious way to manage prose. Deeply wired into source code — and the tools that work with it — is the notion of the 80-character line. The ubiquitous change detector, diff, sees all content as a sequence of lines. Historically, that's worked remarkably well for code and not so well for other content types. A Word document, for example, is structured in terms of sections, subsections, and paragraphs, not lines. So when you're managing a Word document in CVS — as often happens because software projects typically include prose "artifacts" — the recommended strategy is to check it in as a binary file that's exempt from line-by-line change detection.
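To make the line-oriented view concrete, here is a minimal sketch using Python's standard difflib module. The helper name line_changes and the sample strings are made up for illustration, and this is not how CVS or diff is implemented. It only shows the general behaviour: a one-line code edit is reported as one changed line, while a one-word prose edit that re-wraps a paragraph is reported as every line changing.

```python
import difflib


def line_changes(old_text: str, new_text: str) -> tuple[list[str], list[str]]:
    """Return (removed_lines, added_lines) as a line-based comparison sees them."""
    old_lines = old_text.splitlines()
    new_lines = new_text.splitlines()
    matcher = difflib.SequenceMatcher(a=old_lines, b=new_lines, autojunk=False)
    removed, added = [], []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":  # "replace", "delete" or "insert"
            removed.extend(old_lines[i1:i2])
            added.extend(new_lines[j1:j2])
    return removed, added


# Code: one statement changes, and exactly one line is reported.
old_code = "x = 1\ny = 2\nprint(x + y)"
new_code = "x = 1\ny = 3\nprint(x + y)"
print(line_changes(old_code, new_code))  # (['y = 2'], ['y = 3'])

# Prose: adding one word re-wraps the paragraph, so every line differs.
old_prose = (
    "Version control is not only\n"
    "for code. Any evolving set of\n"
    "documents can benefit."
)
new_prose = (
    "Formal version control is not\n"
    "only for code. Any evolving set\n"
    "of documents can benefit."
)
removed, added = line_changes(old_prose, new_prose)
print(len(removed), len(added))  # 3 3
```

The prose change is a single inserted word, but a line-based comparison has no way to say so.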

XML, however, creates a middle ground. Consider two versions of a Word document saved as XML. There are "structured diff" tools that can map the changes at an intermediate level, in terms of XML elements. For example, IBM's alphaWorks site offers the XML Diff and Merge Tool for Java, while Microsoft's GotDotNet site offers XML Diff and Patch for .NET. Both of these free tools can track element-level change. To get a sense of what's possible, check out Monsell EDM's online demo of its DeltaXML technology. The demo compares two subtly different versions of a complex graphic, the standard SVG (Scalable Vector Graphics) "tiger" benchmark, and animates the differences between the two. It's stunningly cool.
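As a rough illustration of element-level change detection, the following sketch walks two XML documents in parallel with Python's standard xml.etree.ElementTree and reports which elements differ. The function element_changes and the sample documents are hypothetical, and the approach is a naive position-by-position comparison. It does not reflect the algorithms used by the IBM, Microsoft or Monsell tools, which also handle moved and reordered content.

```python
import xml.etree.ElementTree as ET


def element_changes(old_xml: str, new_xml: str) -> list[str]:
    """Naively compare two XML documents element by element, by child position."""
    changes: list[str] = []

    def walk(old: ET.Element, new: ET.Element, path: str) -> None:
        if old.tag != new.tag:
            changes.append(f"{path}: element <{old.tag}> became <{new.tag}>")
            return
        if old.attrib != new.attrib:
            changes.append(f"{path}: attributes changed")
        # Compare text with surrounding whitespace stripped, so that
        # re-indenting or re-wrapping the file is not reported as a change.
        # Text that follows a child element (its "tail") is ignored here.
        if (old.text or "").strip() != (new.text or "").strip():
            changes.append(f"{path}: text changed")
        old_children, new_children = list(old), list(new)
        pairs = zip(old_children, new_children)
        for position, (old_child, new_child) in enumerate(pairs, start=1):
            walk(old_child, new_child, f"{path}/{old_child.tag}[{position}]")
        for extra in old_children[len(new_children):]:
            changes.append(f"{path}: <{extra.tag}> removed")
        for extra in new_children[len(old_children):]:
            changes.append(f"{path}: <{extra.tag}> added")

    old_root = ET.fromstring(old_xml)
    new_root = ET.fromstring(new_xml)
    walk(old_root, new_root, f"/{old_root.tag}")
    return changes


old_doc = "<doc><section><para>First draft.</para><para>Unchanged.</para></section></doc>"
new_doc = """<doc>
  <section>
    <para>Second draft.</para>
    <para>Unchanged.</para>
  </section>
</doc>"""

print(element_changes(old_doc, new_doc))
# ['/doc/section[1]/para[1]: text changed']
```

Although the second document is laid out across different lines, only the paragraph whose content changed is reported.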

As XML becomes the standard way to represent prose, graphics, and other content, we should expect such change visualization to become routine. What about code? It has sections, subsections and paragraphs, too. XML isn't — and probably shouldn't be — the primary way we read and write code. But the underlying abstract syntax tree has structure that can — and arguably should — help us see and comprehend the code's evolution.
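The same idea can be sketched for code. Python exposes its own abstract syntax tree through the standard ast module, so the example below compares two versions of a source file function by function and ignores comments, line breaks and redundant parentheses. The helper changed_functions and the sample sources are invented for illustration. It is a toy under those assumptions, and a real structural diff would also need to describe what changed inside each function.

```python
import ast


def changed_functions(old_source: str, new_source: str) -> list[str]:
    """List top-level functions whose syntax tree differs between two versions."""

    def function_shapes(source: str) -> dict[str, str]:
        tree = ast.parse(source)
        # ast.dump omits line and column numbers by default, so two functions
        # that differ only in layout or comments produce identical dumps.
        return {
            node.name: ast.dump(node)
            for node in tree.body
            if isinstance(node, ast.FunctionDef)
        }

    old_shapes = function_shapes(old_source)
    new_shapes = function_shapes(new_source)
    names = old_shapes.keys() | new_shapes.keys()
    # A function that was added or removed also counts as changed.
    return sorted(n for n in names if old_shapes.get(n) != new_shapes.get(n))


old_version = """
def area(w, h):
    return w * h

def label(n):
    return str(n)
"""

new_version = """
def area(w, h):
    # width times height
    return (w *
            h)


def label(n):
    return repr(n)
"""

print(changed_functions(old_version, new_version))  # ['label']
```

A line-based diff would flag area as heavily edited. The tree comparison reports only label, the one function whose behaviour actually changed.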

Udell is lead analyst for the InfoWorld Test Center.