Computerworld

XML spells Office upgrade

  • Jon Udell (IDG News Service)
  • 09 October, 2003 01:22

SAN FRANCISCO (10/08/2003) - It's been a long time since office suites in general, and Microsoft Corp.'s in particular, generated much heat. The features that most users depend on most often were hammered out before these programs were even ported to Windows. Word's document-handling prowess and Excel's analytical power have matured over the years, and they are formidable assets, but the truth is the average information worker has little need of them. Résumés, memos, and e-mails are written in Word by habit, not by necessity. Excel is typically used just to format, convey, and visualize tabular data. The way to reinvigorate Office was not to pile on more elite functionality, but rather to expand the scope of routine tasks. Office 2003 does so in ways that make it, arguably, the most compelling upgrade ever.

The information flowing through Office applications and stored in Office documents represents much of the intellectual capital of the modern enterprise. After years of milking its proprietary file formats, Microsoft opted to embrace an open and universal standard: XML. As a result, Office 2003, at least in its Professional and Professional Enterprise editions, promises to help us redesign our information ecosystems so that people, desktop applications, and network services can interact in new and strategically valuable ways. It's a bold vision. Will it change your enterprise for the better? Let's look at what new benefits are now possible, and what it will take to achieve them.

For Jason De Lorme, CTO of Monster, the job posting and recruitment Web site, strategic data assets come in the form of résumés -- lots of them, 95 percent of which are produced in Microsoft Word. Although using Word may be an appropriate way for job seekers to create impressive 8-1/2-by-11-inch pages, it's a lousy way to feed a database. So, most Monster users rely on the cut-and-paste method to transfer résumé content from Word documents into its database. Soon, De Lorme says, Monster will try an alternative method. Job seekers who have Word 2003 will be able to download Word templates that solve two problems at once. First, they will allow users to create, edit, and print résumés in the normal way. Second, their data will be mapped to XML elements and validated against HR-XML, the dominant XML schema in the human resources realm, allowing the information to be parsed by machines. If the experiment succeeds, job seekers will save time and everyone will benefit from high-fidelity data that can be easily exchanged and effectively searched.

Adapting Word 2003 to this kind of use takes serious effort by XML developers. Word wasn't built for structured data entry. Its XML capability was bolted on, not built in. And even the best special-purpose XML editors present usability challenges. To smooth out the user experience, Monster's templates protect tags that might be damaged by editing and use SmartDocs extensions to deliver context-sensitive guidance and lists of choices in Word's task pane.

"We're not betting the bank on this technology," De Lorme freely admits. What is certain is that your résumé, however you provide it, eventually becomes valid HR-XML. Word's ability to meet this requirement, and users' comfort with the resulting experience, will need to evolve over time. But the goal is clearly in view, and the software is moving in the right direction.

All Roads Lead to XML

Three different Office 2003 applications -- Word, Excel, and InfoPath -- have the power to read and write XML data that is not merely well-formed, but also valid with respect to customer-defined schemas. This creates a wealth of new opportunities for enterprise information architects, but also a certain amount of confusion.

Consider the venerable expense report, a classic Excel application. It's now possible to bind an XML schema to a spreadsheet template and map elements of the schema to spreadsheet cells. Expense data gathered this way is guaranteed to be easily accessible by any application, service, or script running on any platform, just because it is XML. The fidelity of that data is likewise portable because any XML application can verify that it conforms to the schema.
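To make the portability point concrete, here is a minimal Python sketch of how a script outside Office might read and sanity-check expense data once it is XML. The element names (expenseReport, item, date, category, amount) and the sample figures are hypothetical, not taken from any real template. Python's standard library does not validate against an XML schema, so the hand-rolled checks below only stand in for true schema validation.

```python
import xml.etree.ElementTree as ET

# Hypothetical expense-report XML; element names are illustrative only.
EXPENSE_XML = """
<expenseReport employee="jdoe">
  <item>
    <date>2003-10-01</date>
    <category>Travel</category>
    <amount>412.50</amount>
  </item>
  <item>
    <date>2003-10-02</date>
    <category>Meals</category>
    <amount>38.20</amount>
  </item>
</expenseReport>
"""

# Fields every <item> is assumed to require (a stand-in for a real schema).
REQUIRED_FIELDS = ("date", "category", "amount")


def check_report(xml_text):
    """Return a list of problems found; an empty list means the report passed."""
    root = ET.fromstring(xml_text)
    problems = []
    if root.tag != "expenseReport":
        problems.append(f"unexpected root element: {root.tag}")
    for index, item in enumerate(root.findall("item"), start=1):
        # Every required child element must be present.
        for field in REQUIRED_FIELDS:
            if item.findtext(field) is None:
                problems.append(f"item {index}: missing <{field}>")
        # The amount, when present, must parse as a number.
        amount = item.findtext("amount")
        if amount is not None:
            try:
                float(amount)
            except ValueError:
                problems.append(f"item {index}: amount is not a number")
    return problems


def total_amount(xml_text):
    """Sum the <amount> of every <item>; assumes the report already passed checks."""
    root = ET.fromstring(xml_text)
    return sum(float(item.findtext("amount")) for item in root.findall("item"))


if __name__ == "__main__":
    print(check_report(EXPENSE_XML))
    print(total_amount(EXPENSE_XML))
```

In practice a schema-aware validator would replace check_report; the point is only that nothing in the consuming script depends on Excel.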

Given all this, it might seem like a no-brainer to upgrade your expense reports to Excel 2003. But there's a wild card in the deck: Office's new XML-based forms application, InfoPath. As Excel and Word do, InfoPath can gather XML data, validate it against a schema, and augment declarative validation with programmed logic. Because it was built from the ground up for gathering structured data, however, InfoPath's interactive XML features are more flexible than Excel's or Word's. An InfoPath document is a container of nested, expandable data structures, and its user interface is tuned accordingly. InfoPath also makes it easy for less technical developers to build forms that make data entry flow smoothly. What it can't do is mimic the form that you would have printed and sent to accounting.

If you want to capture XML data and inject it into a business process, you may not care to simulate the piece of paper that is used to represent that process. In that case, InfoPath is a logical first choice. To meet the need of the many business processes that remain paper-bound, other options exist.

One solution, which Adobe's forthcoming XML-oriented forms designer aims to deliver, makes PDFs interactively XML-aware. Another approach, being tested by the Association for Cooperative Operations Research and Development (ACORD), an insurance industry organization, uses InfoPath to create forms that pass XML data to a Web service, which in turn sends back an ACORD-certified PDF version of the form. According to Mark Munie, business development executive at Avanade Inc., a Microsoft/Accenture Ltd. joint venture offering integration services for insurance (and other) industries, this setup combines efficient data entry with faithful reproduction of legacy forms. As a bonus, it enables an integrator -- such as Avanade -- to deliver customized solutions by intercepting and transforming XML data flows.

The Right Tool for the Job

Although Avanade likes InfoPath for the highly structured task of gathering insurance data, it likes Word 2003 for other purposes. For example, Avanade is working with the State of Missouri to capture its published rules and regulations in free-form Word documents, which are then overlaid with XML metadata. In Office 2003, Word and Excel documents, unlike InfoPath documents, offer this overlay capability. An entire document can always be saved as XML, but you can bind just a subset of a document to a schema and manage it accordingly. It's always been possible to attach metadata to an Office document using global properties. With the XML overlay, by contrast, the metadata can appear anywhere in the document. A paragraph or section, for example, might be assigned to a category and thus exposed to a category-aware search engine.

Word isn't always the best choice for a text-heavy application, though. Hewlett-Packard is using InfoPath to overhaul the content management system that handles its sales guides. According to Jim Fulkerson, HP's manager of marketing field communications, these are highly modular documents, assembled on demand, that tell a salesperson "what there is to sell, who's the customer, who's the competition." Using InfoPath to manage these content chunks has spared Fulkerson a lot of the cutting and pasting he used to have to do to create new views of the material. And he plans to reuse the managed inventory in a variety of ways.

Given all these choices, how can you achieve the best outcomes? One aspect of Office hasn't changed: Serious development of layered applications is hard work. The lazy approach still makes sense. You'll want to leverage to the hilt what the tools do naturally, right out of the box. Here are some points for developers to consider.

Excel 2003

The sweet spot for Excel was always transfer and visualization of tabular data, and it still is. The new, low-hanging fruit is the XML data being made available by, for example, databases that publish queries as XML to WebDAV repositories. No special skills are required to attach to such resources in Excel 2003. You then turn them into visuals using charts, pivot tables, or just plain old sortable columns. If you're producing XML data, it's simple to pull it into Excel where you can see it and work with it.
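The same tabular treatment that Excel applies to an XML feed can be sketched in a few lines of Python, which may help producers of XML data see what shape of document flattens cleanly into rows and columns. The feed below, with its rows, row, region, product, and sales elements, is a made-up example of a published query result, and pivot_sum is a hypothetical helper that mimics a one-dimensional pivot table.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Hypothetical query result published as XML; names and figures are invented.
QUERY_RESULT = """
<rows>
  <row><region>East</region><product>Widgets</product><sales>120</sales></row>
  <row><region>West</region><product>Widgets</product><sales>80</sales></row>
  <row><region>East</region><product>Gadgets</product><sales>45</sales></row>
</rows>
"""


def to_table(xml_text):
    """Flatten each <row> into a dict keyed by child element name."""
    root = ET.fromstring(xml_text)
    return [{child.tag: child.text for child in row} for row in root.findall("row")]


def pivot_sum(table, key, value):
    """Total the integer `value` column for each distinct `key`, pivot-table style."""
    totals = defaultdict(int)
    for record in table:
        totals[record[key]] += int(record[value])
    return dict(totals)


if __name__ == "__main__":
    table = to_table(QUERY_RESULT)
    # A plain sortable column: order the rows by sales, highest first.
    for record in sorted(table, key=lambda r: int(r["sales"]), reverse=True):
        print(record)
    print(pivot_sum(table, "region", "sales"))
```

Flat, repeating row elements like these are the easy case; deeply nested XML takes more thought to lay out as a grid.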

InfoPath 2003

XML notwithstanding, InfoPath provides something the Office suite has always needed: a way to enable end-users to create applications that gather structured data. It's true that InfoPath can consume and feed Web services and external databases, and these are indeed strategic capabilities. But don't overlook the fact that an InfoPath form can also function as a mobile, self-contained XML database that's usable offline and transportable as an e-mail attachment.

Word 2003

The new Save as XML feature produces WordML, an annoyingly verbose but nevertheless pure XML representation of your document. If you accumulate content in WordML, rather than in the DOC format, you'll be able to search that repository using any XPath-capable tool. What you are able to find, of course, depends entirely on what's been tagged. The deluxe solution is to map a subset of the document to an XML schema, but that entails complexity for developers and users alike. Here's a cheap alternative: Offer templates that promote consistent use of Word styles. This was always a good idea anyway; now those styled elements can facilitate structured search.
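As a rough illustration of the style-based search, the Python sketch below pulls out every paragraph tagged with a given Word style. The sample document is hand-written and heavily simplified. Real WordML carries far more markup, and the namespace URI and the w:pStyle/w:val layout shown here are assumptions about the format that should be checked against Microsoft's published schema. Python's ElementTree supports only a subset of XPath, so the style test is done in code rather than in a single XPath expression.

```python
import xml.etree.ElementTree as ET

# Assumed WordML (Word 2003) namespace; verify against the published schema.
W_NS = "http://schemas.microsoft.com/office/word/2003/wordml"
NS = {"w": W_NS}

# Hand-written, heavily simplified stand-in for a saved WordML document.
WORDML = f"""
<w:wordDocument xmlns:w="{W_NS}">
  <w:body>
    <w:p>
      <w:pPr><w:pStyle w:val="Heading1"/></w:pPr>
      <w:r><w:t>Expense Policy</w:t></w:r>
    </w:p>
    <w:p>
      <w:r><w:t>Submit reports within 30 days.</w:t></w:r>
    </w:p>
    <w:p>
      <w:pPr><w:pStyle w:val="Heading1"/></w:pPr>
      <w:r><w:t>Travel </w:t></w:r>
      <w:r><w:t>Guidelines</w:t></w:r>
    </w:p>
  </w:body>
</w:wordDocument>
"""


def paragraphs_with_style(xml_text, style_name):
    """Return the text of every paragraph whose style matches style_name."""
    root = ET.fromstring(xml_text)
    matches = []
    for para in root.iterfind(".//w:p", NS):
        style = para.find("w:pPr/w:pStyle", NS)
        # Namespaced attributes are keyed as "{namespace}name" in ElementTree.
        if style is not None and style.get(f"{{{W_NS}}}val") == style_name:
            # A paragraph's text may be split across several runs; join them.
            text = "".join(t.text or "" for t in para.iterfind(".//w:t", NS))
            matches.append(text)
    return matches


if __name__ == "__main__":
    print(paragraphs_with_style(WORDML, "Heading1"))
```

A search like this is only as good as the discipline behind the styles, which is why consistent templates matter more than the query itself.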

Office 2003 doesn't deliver everything on our wish list. We wish InfoPath's rich-text editor were more robust, and generated cleaner and simpler XHTML. We'd like simpler ways to streamline WordML, and to convert it to and from HTML. Most of all, we wish that the most frequently used Office application -- Outlook -- had shared some of the XML goodness. But this version of the suite takes major steps in the right direction, and creates something we frankly hadn't expected two years ago: credible reasons to upgrade.