Computerworld

Getting voice into the network

The computer/telephone integration vision was, and still is, that the immediacy and emotional bandwidth of speech can be woven into data network applications. Clearly that hasn't happened yet.
  • Jon Udell
  • 02 March, 2003 22:00

Speech is the defining human trait. Shortly after a speech-enabling gene called FOXP2 became fixed in the human genome 120,000 or so years ago, we became anatomically and culturally modern.

But we postmoderns are still figuring out how to collaborate effectively using email, instant messaging and calendars. The CTI (computer/telephone integration) vision was, and still is, that the immediacy and emotional bandwidth of speech can be woven into data network applications. Clearly that hasn't happened yet. Just consider the humble conference call, which invariably began with the caveat "in case I lose you". Users abandoned that PBX feature and signed up for conference bridges.

Software control of basic telephony does not require that voice and data travel across the same wires. For years it's been possible to bridge the PBX to the LAN and leverage the strengths of both. That IT managers mostly didn't bother to do so says plainly that the benefits weren't, by themselves, compelling. Call control and screen pops may be critical infrastructure features for the call centre, but for the enterprise these nice-to-have features won't drive PBX/LAN integration, never mind wholesale adoption of VoIP. When InfoWorld (US) asked 400 US IT leaders to rate telephony technologies, only 17% reported that they consider first-party call control (a desktop application controlling its user's own phone) important, and even fewer -- 7% -- mentioned third-party call control (a server controlling calls on behalf of many phones).

What will drive VoIP into the enterprise, both at the edges and in the core, is feature parity with the PSTN (public switched telephone network) plus a clear cost advantage. Neither is a slam dunk, but the rules are changing. The vaunted high quality and low latency of the PSTN, for example, do not extend to the vast number of business calls conducted on cellphones. Even as the quality of the average PSTN call heads south, VoIP calls can challenge the best the PSTN can offer. For this article, we evaluated TeleSym's SymPhone, a software phone for wireless PDAs, against a conventional voice call. We listened to the all-IP conversation through an 802.11-equipped iPAQ in our left ear and to its PSTN counterpart in our right ear. The VoIP call kept pace; in fact, the iPAQ plus SymPhone sounded better.

For David Isenberg, an AT&T alumnus and independent telephony analyst, this result is not surprising. For years he has argued that the phone system's architecture -- a smart network with stupid devices -- will inevitably yield to the internet's inverse model of a stupid network with smart devices. As further evidence of the power of intelligence at the edge, he points to Global IP Sound, a Swedish developer of enhanced VoIP codecs.

"Their algorithm is tuned for the packet-switched environment," Isenberg says, "and it compensates for packet loss and jitter."

Innovation at the edge can deliver cost-reduction, too. VoIP may be a cost-effective way to bridge far-flung central offices, but that strategy doesn't address the SOHO (small office/home office) scenarios typical of the virtual enterprise. Emerging SOHO-grade VoIP solutions can not only cut down drastically on long-distance charges but conceivably eliminate POTS (plain old telephone service) altogether.

We have been testing Vonage's DigitalVoice on a DSL circuit in one of our remote offices. This $US40-per-month service uses a conventional phone handset, a Cisco ATA 186 adapter and a Netgear router. It can make regular phone calls and, unlike Net2Phone, also receive them. Quality varies with the internet weather.

Frankly, we won't ditch our POTS line until we see more evidence for Isenberg's thesis, but the trend is encouraging. Almost a quarter of the survey respondents plan to use VoIP at the edge: 14% cite voice over DSL and 10% cite voice over cable.

We are certain that the internet model -- abundant, general-purpose bandwidth managed by intelligent endpoints -- will prevail. It's unlikely that 2003 will be the Year of CTI that some expected in 1995, but we're seeing notable innovations. Using the Vonage product, we can visit a website (which should, and easily could, offer a set of SOAP-callable web services) to forward calls to another phone, check voice mail and review call logs. And during a PDA-to-PDA call with SymPhone's product manager, we yanked out the wireless card, stuck it back in and were able to continue with the call. With either gadget, your phone number and services travel with you to any internet location. These immediate benefits only scratch the surface of what voice/data integration could mean.
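A SOAP interface of the kind suggested above might look something like the following sketch, which builds a SOAP 1.1 request envelope with Python's standard library. The `ForwardCalls` operation, its `Account` and `TargetNumber` parameters and the `urn:example:callforwarding` namespace are all hypothetical; they do not describe any real Vonage service.

```python
import xml.etree.ElementTree as ET

SOAP_ENV = "http://schemas.xmlsoap.org/soap/envelope/"   # SOAP 1.1 envelope namespace
SVC_NS = "urn:example:callforwarding"                     # hypothetical service namespace

ET.register_namespace("soap", SOAP_ENV)
ET.register_namespace("cf", SVC_NS)


def forward_calls_request(account: str, target_number: str) -> bytes:
    """Build a SOAP 1.1 envelope for a hypothetical ForwardCalls operation."""
    envelope = ET.Element(f"{{{SOAP_ENV}}}Envelope")
    body = ET.SubElement(envelope, f"{{{SOAP_ENV}}}Body")
    op = ET.SubElement(body, f"{{{SVC_NS}}}ForwardCalls")
    ET.SubElement(op, f"{{{SVC_NS}}}Account").text = account
    ET.SubElement(op, f"{{{SVC_NS}}}TargetNumber").text = target_number
    return ET.tostring(envelope, encoding="utf-8", xml_declaration=True)


request = forward_calls_request("5551234", "+1-555-0100")
print(request.decode("utf-8"))
```

Posting these bytes over HTTP, with a `SOAPAction` header, to the provider's endpoint would be the remaining step. No endpoint is shown because none is known to exist.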

Two general kinds of deeper integration are possible. In the realm of signalling and call control, SIP (session initiation protocol) can be used to weave IM-style presence into voice conversations, or conversely to inject telephone presence into data connections.
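In SIP, a watcher typically learns another user's presence by subscribing to the `presence` event package defined in RFC 3856, with status updates delivered in NOTIFY messages carrying PIDF documents. The minimal sketch below only builds such a SUBSCRIBE request as text. The addresses are hypothetical, and a real user agent would also handle transport, authentication, retransmission and the resulting NOTIFY messages.

```python
import uuid


def build_presence_subscribe(watcher: str, presentity: str, host: str,
                             port: int = 5060, expires: int = 3600) -> str:
    """Build a SIP SUBSCRIBE request for the 'presence' event package.

    watcher and presentity are SIP URIs such as 'sip:alice@example.com';
    every value here is illustrative, not taken from a real deployment.
    """
    branch = "z9hG4bK" + uuid.uuid4().hex[:16]   # RFC 3261 magic-cookie branch prefix
    tag = uuid.uuid4().hex[:8]                   # From tag identifying this dialog
    call_id = f"{uuid.uuid4().hex}@{host}"
    lines = [
        f"SUBSCRIBE {presentity} SIP/2.0",
        f"Via: SIP/2.0/UDP {host}:{port};branch={branch}",
        "Max-Forwards: 70",
        f"From: <{watcher}>;tag={tag}",
        f"To: <{presentity}>",
        f"Call-ID: {call_id}",
        "CSeq: 1 SUBSCRIBE",
        f"Contact: <{watcher}>",                 # simplified: a real UA advertises a reachable address
        "Event: presence",
        "Accept: application/pidf+xml",
        f"Expires: {expires}",
        "Content-Length: 0",
    ]
    # SIP, like HTTP, uses CRLF line endings and a blank line before the (empty) body.
    return "\r\n".join(lines) + "\r\n\r\n"


msg = build_presence_subscribe("sip:alice@example.com", "sip:bob@example.com",
                               host="192.0.2.10")
print(msg)
```

The same signalling channel that sets up a call can thus carry "is Bob available?" information, which is what makes SIP a natural bridge between IM-style presence and voice.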

Digitised voice presents a different, and potentially vast, opportunity. Email dominates business communication because it's cheap and the text data can be randomly accessed, indexed, and searched. Voice data, linear and opaque, is far less useful. Speech-to-text translation systems are improving and can produce results that are searchable even when not usefully readable -- but not in real time.

Moore's Law will eventually get us there, but the brute-force approach won't yield orders-of-magnitude improvement overnight. For that, you need a different algorithm altogether, and Fast-Talk Communications has one. Its technology for indexing and searching voice data works directly with phonemes. In a speaker-independent but language-dependent manner, Fast-Talk's engine rips through conversations in real time, indexing not the byte offsets of words and phrases but the time codes associated with sounds. A text search query is translated into a string of phonemes; the engine finds occurrences of that string and returns their time codes. If a transcript exists, it can be fed into the engine a line at a time, so that users of conventional search engines can randomly access the voice data.
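Fast-Talk's engine is proprietary. The following is only a toy illustration of the general idea of phonetic search. It assumes the audio has already been recognised into a sequence of phonemes, each stamped with a start time in seconds. A text query is converted to phonemes with a tiny, hypothetical pronunciation dictionary using ARPAbet-style symbols, and the index returns the time codes where that phoneme string occurs. The stream, dictionary and class names are invented for illustration.

```python
from collections import defaultdict

# Hypothetical pronunciation dictionary (ARPAbet-style phoneme symbols).
PRONUNCIATIONS = {
    "voice": ["V", "OY", "S"],
    "data": ["D", "EY", "T", "AH"],
}


class PhonemeIndex:
    """Maps each phoneme to its positions in a time-coded phoneme stream."""

    def __init__(self, stream):
        # stream: list of (phoneme, start_time_seconds) pairs from a recogniser
        self.stream = stream
        self.positions = defaultdict(list)
        for i, (phoneme, _) in enumerate(stream):
            self.positions[phoneme].append(i)

    def search(self, query_phonemes):
        """Return the start times at which the phoneme sequence occurs."""
        if not query_phonemes:
            return []
        n = len(query_phonemes)
        hits = []
        # Only positions holding the query's first phoneme are candidates.
        for i in self.positions.get(query_phonemes[0], []):
            window = [p for p, _ in self.stream[i:i + n]]
            if window == query_phonemes:
                hits.append(self.stream[i][1])
        return hits


def text_to_phonemes(text):
    """Convert a text query to phonemes via the toy dictionary."""
    return [p for word in text.lower().split() for p in PRONUNCIATIONS[word]]


stream = [("V", 0.00), ("OY", 0.08), ("S", 0.21), ("AH", 0.35),
          ("N", 0.40), ("D", 0.52), ("EY", 0.60), ("T", 0.71),
          ("AH", 0.78), ("V", 1.10), ("OY", 1.18), ("S", 1.30)]
index = PhonemeIndex(stream)
print(index.search(text_to_phonemes("voice")))   # start times of each "voice"
```

Exact matching keeps the sketch short; a practical phonetic search would score approximate matches to tolerate recognition errors.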

According to the telephony survey, the top business drivers of telephony applications are CRM, cited by 36% of respondents, and KM (knowledge management), cited by 33%. Although bottom-line savings are the obvious rationale for VoIP, IT decision makers clearly think voice/data convergence can help push top-line growth, too.

Proving ground for telephony: small business, consumers

-- Tom Yager

First-generation products that debut in the large corporate market get better and cheaper after they're adapted for consumers and small businesses. Nowhere is that truer than in telephony. What we consider basic business services -- think of Caller ID, digital voice mail and memory dialling -- started out as fragile and costly PBX options. They became standard enterprise features only after they were built into small business key systems and $US50 residential phones.

Applications for wireless, IP telephony, voice/data integration, web call centres, speech synthesis and voice recognition are taking shape in residential and small business markets. Unless you are in dire need, it's wise to let these technologies take their lumps from demanding, price-sensitive customers before incorporating them into your infrastructure.

But respondents to the 2002 InfoWorld Telephony Survey indicate that they're not watching small-market developments very closely. At 14% and 10%, respectively, interest in VoDSL (voice over DSL) and VoCable (voice over cable) is minuscule, which is understandable if you view these technologies in isolation. A CTO isn't going to migrate inbound circuits from a T1 to a cable modem.

But VoDSL and VoCable are massive testbeds for capabilities that interest IT managers. Together they represent the largest installed base of IP telephony users, and VoCable has turned cable operators into local phone carriers. Subscribers receive the service as ordinary analog phone lines; one broadband provider, AT&T Broadband, gives users as many as seven lines that accept incoming calls. Survey respondents cite integration, reliability, quality, and bandwidth concerns as barriers to adopting IP telephony, but these problems can clearly be overcome even in networks with thousands of unsophisticated users.

Interactive voice response systems are now affordable if caller input can be taken as touch tones. But businesses need affordable speaker-independent speech recognition and realistic speech synthesis.
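Touch-tone (DTMF) input is cheap to handle in software because each key sends a fixed pair of tones: one from a low "row" group (697, 770, 852, 941 Hz) and one from a high "column" group (1209, 1336, 1477, 1633 Hz). The sketch below detects a key in a block of samples using the Goertzel algorithm, a standard way to measure signal energy at a handful of specific frequencies. The 8 kHz sample rate, the 205-sample block and the synthetic test tone are assumptions for illustration. A production detector would also check signal level, the power ratio between the two tones and tone duration.

```python
import math

ROWS = [697, 770, 852, 941]          # low-group DTMF frequencies (Hz)
COLS = [1209, 1336, 1477, 1633]      # high-group DTMF frequencies (Hz)
KEYS = ["123A", "456B", "789C", "*0#D"]


def goertzel_power(samples, freq, rate):
    """Return the signal power at `freq` Hz using the Goertzel recurrence."""
    coeff = 2 * math.cos(2 * math.pi * freq / rate)
    s_prev, s_prev2 = 0.0, 0.0
    for x in samples:
        s = x + coeff * s_prev - s_prev2
        s_prev2, s_prev = s_prev, s
    return s_prev ** 2 + s_prev2 ** 2 - coeff * s_prev * s_prev2


def detect_key(samples, rate=8000):
    """Pick the strongest row tone and column tone and map them to a key."""
    row = max(range(4), key=lambda i: goertzel_power(samples, ROWS[i], rate))
    col = max(range(4), key=lambda j: goertzel_power(samples, COLS[j], rate))
    return KEYS[row][col]


def synth_key(key, rate=8000, n=205):
    """Generate n samples of the DTMF tone pair for `key` (test signal)."""
    for r, keys in enumerate(KEYS):
        if key in keys:
            c = keys.index(key)
            break
    else:
        raise ValueError(f"not a DTMF key: {key!r}")
    return [math.sin(2 * math.pi * ROWS[r] * t / rate) +
            math.sin(2 * math.pi * COLS[c] * t / rate) for t in range(n)]


print(detect_key(synth_key("5")))
```

Goertzel suits this job better than a full FFT because the detector needs only eight frequencies, each costing one multiply-add per sample.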

Apple, Microsoft and AT&T Labs have tweaked the familiar phoneme-based speech synthesis engine to improve clarity, reduce noise and make intonation and cadence sound more human. These mass-market applications will knock the kinks out of speech technology in a hurry.

Sidebar

SIP is sneaking into the enterprise