Computerworld

Google’s supercomputer

  • Jon Udell
  • 16 May, 2004 22:00

In exchange for your free gigabyte of searchable email, Google’s newly announced web mail service, Gmail, will scan your messages and match them to relevant ads. Some people are worried about invasion of privacy. Others, like writer Phil Windley, think that issue is a red herring.

“If you truly respect my privacy, keep your nose out of my business with Google — it’s private,” he writes on his blog.

A trenchant analysis of Google’s larger ambitions appeared on another blog. Rich Skrenta, CEO of Topix.net, posted an essay titled “The Secret Source of Google’s Power” in which he argues that Google’s server farm, with its customised cluster operating system and fault-tolerant petabyte file system, has become a supercomputer.

“While competitors are targeting the individual applications Google has deployed,” he writes, “Google is building a massive, general-purpose computing platform for web-scale programming.”

In an era when decentralisation is in vogue and loose coupling is regarded as the only way to achieve planetary scale, Skrenta’s essay touched a nerve, suggesting that the right kind of centralised and tightly coupled architecture has no practical limits. For Tim O’Reilly, that was a wake-up call.

“Once internet apps truly get to scale,” he writes on his own blog, “they’ll make the network itself disappear into the universal virtual computer”.

Centralisation and decentralisation are the yin and yang of computing. Witness Microsoft, a company whose dedication to the personal computer seems radically at odds with the idea of the Google supercomputer. Microsoft’s IT operation takes justifiable pride in running only Windows software on x86 PCs. But I was fascinated to learn, on a recent visit, that its entire worldwide business operation is serviced by a single instance of SAP R/3.

So should we say that the computer is the network, or that the network is the computer? Both statements are true. A supercomputer, operating at global or merely enterprise scale, creates its own internal network of services. But supercomputers also federate with their peers and converse with their myriad clients to enact computation on a grander scale. There’s no single right architecture or topology. Within and across enterprises, we’ll deploy systems that embody all of these patterns. A crucial question, as O’Reilly points out, is not where or how computation is performed, but rather: “Who will own the data?”

We might also ask: “What, exactly, is the data?” In the case of Gmail, for example, it’s much more than the contents of a bunch of individual mailboxes. In aggregate, those mailboxes contain relationship data that can be mined for many kinds of applications. Similarly, a web services intermediary can see the relationships among SOAP end points and infer the context of transactions. From such privileged vantage points, much power can and will be wielded.

Network theorists believe that all networks inevitably form hubs. The “services fabric” that enterprise architects are now weaving may sound egalitarian, but it’s not immune to this law. Google’s supercomputer — or supernode — gives it a leg up on the competition. Yours, however you define it, will too.

Udell is lead analyst for the InfoWorld Test Center.