Microsoft gets tough with independent testers

Randy Kennedy thinks he knows something about the performance of Windows 2000 vs. NT that might be of interest to IT executives.

But he won't tell you.

He won't tell anyone, because Microsoft Corp. won't let him.

Kennedy, research director for Competitive Systems Analysis, made the mistake of using benchmark testing to compare the operating systems running SQL Server without first getting written permission from Microsoft to discuss the results. Microsoft has now forced him to suppress them.

The one-paragraph benchmark restriction is right there in the 8,600-word SQL Server licensing agreement, the one that every network administrator accepts by clicking the "I Agree" button before installing the database.

Microsoft and countless other vendors with the same restriction are enforcing it while debate rages as to whether the restriction protects consumers from bad data or protects vendors from bad test results.

The benchmark restriction is not used in every Microsoft license. The company's firewall and cache product - Internet Security and Acceleration Server - has the restriction, but Microsoft dropped it for the latest version of Exchange, which shipped last year.

But the real eye-opener for IT executives, who already regard vendor-funded benchmark tests lightly, is that so-called independent tests, such as Kennedy's, have their methods and system settings massaged and fine-tuned by vendors. These companies hold control over whether the results will ever see the light of day, and vendors use the restriction to influence what is tested and how, according to software testers.

Microsoft officials say the benchmark restriction protects users from misleading or false information. Oracle is famous for defining similar constraints, as is Network Associates Inc. with its McAfee virus protection software.

But what's interesting in Kennedy's case is that Microsoft spent five days working with him to ensure accuracy by refining testing methodology and hardware and software tuning.

When that didn't change the results, Kennedy was gently reminded of the licensing agreement. When he didn't back down, he was threatened with legal action.

"The way they handled it was very unprofessional," says Kennedy, who has done testing work for Microsoft, IBM and Intel. "They went from the cooperation approach to, 'Let's slam on the brakes' with licenses and veiled threats."

Microsoft officials say the process would have been different if Kennedy had come to them before testing and given them time to review the test methods.

They say the licensing issue came up late because the Microsoft engineers Kennedy dealt with were unaware of the restriction.

Kennedy's results weren't pretty, and the timing made them worse: They surfaced the week before Microsoft released its own benchmark tests showing how Win 2000 DataCenter and SQL Server running on a 16-way Unisys server can energize enterprise resource planning applications.

Truth in testing

There is no doubt that the so-called shrinkwrap software contracts that restrict benchmark tests give vendors a firm grip on the testing process, especially for databases. Software makers say there are good reasons.

"There are a lot of variables in database testing, and if you don't control variables, it is easy to get results that are skewed," says Jeff Ressler, lead product manager for SQL Server.

But the restriction exists in part because of sophisticated testing tools that are drill sergeants for software.

To wit, the restriction was dropped for Exchange 2000 because "there was little risk of anyone running a benchmark test and publicizing it, particularly since there's no good, standard tool for doing so out there," says Stan Sorenson, product manager for Exchange.

"A lot of times testers rush, and that concerns us," Ressler says.

A typical software test documents what is being tested, how it is tested, on what hardware and with which testing tools. And tests always include reproducing the result multiple times.

Ressler says Microsoft has never denied a customer request to share benchmarking results with another customer, "but the media is different."

Microsoft took issue with Kennedy's tests for a number of reasons, including the hardware and drivers used, and because he used the database to test operating system performance. The benchmark restriction applies not only to direct tests on SQL Server but also to any test environment that includes the software.

The wrangling with Kennedy points to the fact that testing, especially of databases, is a touchy subject.

"If someone comes out with better [transaction] numbers than yours, you live and die by that," says Tom Henderson, principal researcher for Extreme Labs.

He says there is spin control exerted by vendors regarding benchmark tests - not so much over the results, but rather over what gets tested.

"It makes the vendors' lives easier, they don't have to be on the defensive all the time," Henderson says.

John Bass, technical director for Centennial Networking Labs at North Carolina State University in Raleigh, says testing is a real game.

"I always let Microsoft know what I am doing and work with them, but I never divulge my results until they are published," Bass says. "If you say too much, the game goes in their favor. Microsoft is a master of muddying the waters so it doesn't look like they are playing the licensing agreement game."

The game has found its way into legal arenas, where the debate centers on the merits of the benchmark restriction.

"The benchmarking ban is very controversial," says Cem Kaner, a professor of computer science at the Florida Institute of Technology and a lawyer.

"Commercial customers have the same right to information and comparative data as any consumer," Kaner says. "It's an attack on the free-market economy to block the press from revealing that second-rate products are second-rate."

Ray Nimmer, author of the controversial Uniform Computer Information Transactions Act (UCITA), says contract law on the benchmark issue is not changed by the proposed UCITA law.

Instead, he says, UCITA will offer protection against contractual abuses.

However, UCITA critics argue that consumers or software testers such as Kennedy would face daunting costs to mount a legal challenge, and that those costs would chill any desire to fight the benchmark restriction. There has never been a single court case on the benchmark issue, according to Kaner.

And Kennedy has no plans to bring the first such case to court.

The upshot for enterprise network executives is that they will never be able to judge whether his tests reveal something about Win 2000 performance or merely something about Kennedy's method for testing products.