Computerworld

Down coring: an important innovation by AMD

The chipmaker has made a vital stride, says Tom Yager
  • Tom Yager
  • 14 December, 2008 22:00

The 2008 MacBook Pro sitting in my lap is my favourite currently available commercial notebook, but if you asked me what my all-time favourite is, I'd have to say it's Apple's PowerBook. I could run it for a solid six hours on one charge, something that no modern notebook with a single battery can do, even though it had an old-fashioned backlight and a GPU that, unlike the MacBook Pro's, could not downshift to a low-power mode. What made the PowerBook such a marvel was that Apple brought embedded-system principles to its laptop design. From the single-core, 32-bit 1.6GHz PowerPC to Apple's power-pinching custom silicon, the PowerBook was made to run forever on a charge. By PC standards, a 1.6GHz clock was dog slow, but Apple believed that CPU speed was irrelevant as long as the user experience was satisfying. Apple was right. The Mac and OS X were designed to handle events so rapidly that users perceived little lag, if any, between action and reaction in the GUI wherever disk access was not involved.

Apple dumped the PowerPC, but slow 32-bit CPUs are more popular and pervasive than ever. They're in cars, aircraft, medical equipment, DVD players, Fibre Channel routers and satellite receivers. A 32-bit CPU running Linux powers my HDTV, and I needn't mention that slow, cool-running microcontrollers (microprocessors with integrated peripherals) are in every cell phone and smartphone in use. The prize characteristic of these little CPUs is their minuscule latency. They're relatively pathetic at general-purpose computing; they are not number crunchers. Their specialty is reacting so rapidly to a great many stimuli that the whole appears to be a zero-latency system. Microcontrollers watch sensors, paint the screen, sift through the binary chatter of multiple radios, encode and decode voice data in real time, stream high-bit-rate media sources, manage storage... It seems like a lot to ask of a 400MHz CPU powered by a featherweight battery, but a microcontroller is designed to excel at two things: sprinting and sleeping. That characteristic distinguishes embedded and mobile systems from general-purpose computers. Your smartphone reacts. Your computer computes.

If that is so, then what does a server do? For certain workloads, I submit that a server works better on a low-latency microcontroller model than on a high-performance supercomputer model. For example, an edge server needs to filter and direct network packets at wire speed. This is not compute-intensive work, as evidenced by the fact that a black-box firewall/router can run on a 32-bit microcontroller with a DC power supply. However, because servers and server OSes are poorly designed for this work, the CPU load seems to indicate that more systems, or more powerful systems, must be thrown at such duties. One IT guy I chatted up at a conference ran several racks of systems that served dynamic web pages to external users. The operation had gone fully virtual and reaped enormous successes. I asked him what criterion triggered a scale-up. "Latency," he said. Interestingly, aiming more resources at one latency-sensitive task lowers overall headroom, and I'll wager that once the relationship between latency and equipment was established, the equipment budget started climbing precipitously.

We are finally heading in the right direction. AMD has introduced a concept I've long awaited: down coring, the powering down of individual cores in a multicore system. At present this requires a BIOS settings change and a reboot, and I've only seen it implemented in client machines so far. In a big 16-way Shanghai box, how much down-coring support will be supplied? If you fall back to one core, what becomes of the others? I submit that they should be fully halted, but at present each socket must keep at least one core running to keep its memory accessible. With a sufficiently smart OS, every core but the one to which the South Bridge I/O controller is connected could go offline. The OS would have to unmap memory assigned to powered-down cores, but that process differs little from the one required to prepare a migration to a dissimilarly configured server.
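Mainstream OSes already hint at what that smart OS would do. Linux, for one, can take a core out of service at runtime through its CPU hotplug interface. Here's a minimal sketch in Python, assuming a Linux kernel built with hotplug support; the core number is illustrative, and note that this parks a core under kernel control rather than invoking AMD's down coring itself:

    # down_core.py: a rough software analogue of down coring, using
    # Linux's CPU hotplug files under /sys. Must be run as root.
    HOTPLUG = "/sys/devices/system/cpu/cpu{n}/online"

    def set_core_online(n: int, online: bool) -> None:
        # Writing 0 asks the kernel to migrate work off the core and
        # park it; how deeply the silicon then powers down is up to
        # the hardware.
        with open(HOTPLUG.format(n=n), "w") as f:
            f.write("1" if online else "0")

    set_core_online(1, False)  # take cpu1 out of the scheduler
    set_core_online(1, True)   # bring it back when load returns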
This can be done on an even grander, or rather smaller, scale. As an exercise, I turned a machine with a 16MHz 8-bit CPU into a simple internet whitelist mail manager. After about 15 minutes with no whitelisted mail, the mail servers proper were powered down. Not until a whitelisted request arrived was the rough equivalent of one server core awakened, at which point the whitelisted mail was routed as normal. I resorted to Windows' core affinity to make this happen, so the power savings weren't significant, but the drop in server CPU utilisation and heat was dramatic. It was only a proof of concept, because the server took so long to power up that subsequent messages would be missed, and delivering only whitelisted mail is not workable in production. Still, it's an extreme yet workable example of the power of downscaling. If a battery-powered 8-bit single-board computer can do this, it's easy to imagine something with the brawn of a smartphone doing far more, even running blacklist and DNS checks on e-mail and serving static "Please wait..." web pages (see my Green Delay post) while it waits for the server. (One way such a wake-up could be triggered is sketched at the end of this piece.)

Several server tasks are event driven rather than compute driven. When a current PC server spends most of its time waiting for work to do, it has fallen into event-driven mode, and yet all of its CPU sockets are powered up and all memory, even unused memory, draws power. PCI Express and USB peripherals can be ordered into a power-saving state, but as a rule this is not done with servers. An AMD server could effectively be placed in a low-power event mode by suspending all non-essential processes; a sketch of that also follows below. Eventually all of the suspended processes and their RAM would be paged out to disk, providing an opportunity to consolidate memory down to that attached to the minimal number of cores. You'd find that there are more of those useful quiet times than you expect. I do know that it takes less than a second to go from this event-only mode to one firing on all cylinders.

Virtualisation's potential role in this is unclear. I know that it can do a real-time migration to a larger server, but can it squeeze into a smaller one? This concept runs contrary to the current enterprise mentality that keeps unused servers ever ready to leap into action and counts an idle virtual machine as an overall energy win. It isn't; to get that, you need to strip away real resources, not virtual ones. The embedded world offers many lessons for green server designers.
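Two postscript sketches, both in Python. First, the wake-up trigger for the mail experiment: the piece doesn't say how the sleeping servers were roused, but Wake-on-LAN is one plausible mechanism a tiny front end could use, assuming the server's NIC has it enabled (the MAC address below is hypothetical):

    import socket

    def wake(mac: str, broadcast: str = "255.255.255.255", port: int = 9) -> None:
        # A Wake-on-LAN "magic packet": six 0xFF bytes followed by the
        # target MAC repeated 16 times, sent as a UDP broadcast.
        payload = bytes.fromhex(mac.replace(":", ""))
        packet = b"\xff" * 6 + payload * 16
        with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as s:
            s.setsockopt(socket.SOL_SOCKET, socket.SO_BROADCAST, 1)
            s.sendto(packet, (broadcast, port))

    wake("00:11:22:33:44:55")  # hypothetical MAC of the sleeping mail server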
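Second, the event-only mode: on a POSIX server, freezing and thawing non-essential services can be as simple as a pair of signals, which is one way to approximate the suspend-and-resume described above (the process IDs are hypothetical):

    import os
    import signal

    NON_ESSENTIAL = [2412, 2433, 2589]  # hypothetical PIDs of non-essential services

    def enter_event_mode(pids):
        for pid in pids:
            os.kill(pid, signal.SIGSTOP)  # freeze; idle pages can then be paged out

    def leave_event_mode(pids):
        for pid in pids:
            os.kill(pid, signal.SIGCONT)  # thaw; resumes in well under a second

    enter_event_mode(NON_ESSENTIAL)
    # ... quiet period: cores could be taken offline as sketched earlier ...
    leave_event_mode(NON_ESSENTIAL)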