How the Queensland Brain Institute deals with its data deluge

The QBI’s IT manager is trying to avoid storage headaches

Storage can be a headache for any organisation that relies heavily on IT. But Jake Carroll, the senior IT manager (research) at the Queensland Brain Institute (QBI), doesn’t just have to contend with the typical storage needs for a 500-person organisation.

The QBI IT manager needs to worry about whether his flash storage is going to burn out because of the volume of IOPS being thrown at it, or if his storage network is about to be hit by a tidal wave of data from a scientific instrument that may not have existed six months ago.

But the problems Carroll has to grapple with aren’t too much of a surprise given his employer: The QBI is a world-class neuroscience research institute, housed at the University of Queensland.

The institute was established in 2003 and houses more than 450 researchers as well as some 33 support staff.

“We are all about aging and dementia,” Carroll said. “We are trying to look at and investigate ways of stemming the tide of what is effectively being considered a tsunami of disease – aging, dementia, Alzheimer’s.”

“The core goals of the organisation are to find ways to help people into a better quality of life when these debilitative disorders impact them,” he said. “It is a neuroscience institute like no other – it’s one of the largest neuroscience institutes in the world, and it has a population of people drawn from all over the world.”

Underpinning the work of the QBI’s three dozen research groups, Carroll said, are extremely high-throughput computing and storage technologies — the responsibility of his compact team of five IT staff. Carroll noted that the role of the technology team is, in many respects, a million miles from that of a classic IT department.

“If a printer breaks I’ll hear about it and I’ll give someone a hand to fix it,” he said. “But the next day I might be finding ways to stop holes being burned in pieces of flash storage because we bombarded it too hard. Or finding ways to eke that little bit more performance out of a tape drive technology that only just got to market — or might be in beta even.”

“We play with a lot of not-yet-to-market technology; we play with a lot of beta and alpha hardware from companies,” Carroll said.

The drive for performance and the demands of the data-heavy research efforts at the QBI mean that Carroll manages what he described as “a storage and compute zoo”.

“Just locally in the building there’s about 1500 CPU cores of Intel Xeon E5,” he said. “We have some very unusual GPU configurations using NVIDIA’s latest GPU technologies in the form of the Pascal P100.”

And as you would expect from an institute of the type and scale of the QBI, the data involved really does warrant being described as “big”: There’s around 8.5 petabytes of raw, unstructured data under management at the institute.

To store it, Carroll has “one of every animal of arrays.”

Oracle HSM, Hitachi Accelerated Flash, Hitachi’s VSP G series arrays, and Fibre Channel, InfiniBand and Ethernet fabrics all help deliver the storage needs of the researchers.

Research techniques such as confocal microscopy can quickly eat up gigabytes and gigabytes of space. In general, imaging is the biggest driver of data growth.

“We’re talking resolutions of images in whole brain imaging workloads of 140,000 pixels by 70,000 pixels per time,” Carroll explained. “It gives you some kind of idea of the gigabytes and gigabytes and gigabytes per slice or per stack.”
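To put that resolution in perspective, the raw size of a single image can be estimated with simple arithmetic. The sketch below is illustrative only: the article does not state the bit depth or channel count, so 16 bits per pixel and a single channel are assumptions, and the function name `raw_image_bytes` is hypothetical.

```python
def raw_image_bytes(width_px: int, height_px: int,
                    bytes_per_pixel: int = 2, channels: int = 1) -> int:
    """Uncompressed size in bytes of one image plane.

    bytes_per_pixel=2 (16-bit samples) and channels=1 are assumptions;
    the article gives only the pixel dimensions.
    """
    return width_px * height_px * bytes_per_pixel * channels


# Dimensions quoted by Carroll for whole brain imaging workloads.
WIDTH_PX = 140_000
HEIGHT_PX = 70_000

pixels = WIDTH_PX * HEIGHT_PX                      # total pixel count
size_bytes = raw_image_bytes(WIDTH_PX, HEIGHT_PX)  # assumed 16-bit, 1 channel
size_gb = size_bytes / 1e9                         # decimal gigabytes

print(f"{pixels / 1e9:.1f} gigapixels, about {size_gb:.1f} GB uncompressed")
```

Under those assumptions a single image is 9.8 gigapixels and roughly 19.6 GB before compression; more channels, a greater bit depth, or a deep stack of slices would scale that figure up proportionally.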

The environment is one of continuous deployment of new hardware to keep abreast of the needs of the researchers.

In contrast to a typical enterprise environment where an approximate forecast of storage needs over the next few months or year can at least be attempted, the QBI is a lot less predictable.

“With research and supercompute, at the coalface of research intensive organisations, it’s a little bit difficult — sometimes you don’t know what’s coming down the pipeline in terms of the scientific instrumentation that is going to be built,” Carroll explained.

“You might hear that in a week somebody is going to add another 50 megapixel sCMOS camera or something like that. Or a microscope that will double its outputs or triple its outputs — and three months before that, the CCD technology or the CMOS technology to do it didn’t exist because they couldn’t get the noise ratio right.

“All of a sudden it exists and you have this wall of data coming at you. It’s driven by the research.”

Carroll has a key goal: Keep the institute’s networks “friction free” no matter what.

“That’s the game — you always stay in front so that you never get to a point where people start to say ‘Look, my research is slowing down.’ That’s not the business we’re in. We’re in the business of making sure that it stays fast.”

In late 2016 those efforts saw him roll out Brocade’s Gen 6 Fibre Channel storage networking technology, installing G620 switches, which were released last year by the networking vendor. The new switches are operating alongside Brocade 6510 Gen 5 Fibre Channel switches for less demanding workloads.

“I’m a firm believer in fibre channel technology,” Carroll said. “It’s treated me very well for a very long time. I use all kinds of technology: I use InfiniBand, I use Ethernet — I use all kinds of transport technologies to get to storage. For my most critical workloads where latency and assurance of data integrity are absolutely instrumental, fibre channel is my go-to.”

The G620 switches were rolled out over a day or two towards the end of last year. The switches are now delivering a fully redundant, low latency storage network fabric with 32Gbps links for the QBI.
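A rough sense of what a 32Gbps link means at this scale comes from dividing a data volume by the nominal line rate. The sketch below is a best-case estimate, not a measurement: it assumes a single link running at its full nominal rate and ignores encoding and protocol overhead, so real transfers would take longer. The function name `transfer_seconds` is hypothetical.

```python
def transfer_seconds(payload_bytes: int, link_bits_per_second: int) -> float:
    """Idealised transfer time: payload size divided by nominal link rate.

    Ignores encoding and protocol overhead, contention and storage speed,
    so the result is a lower bound rather than a prediction.
    """
    return payload_bytes * 8 / link_bits_per_second


LINK_BPS = 32_000_000_000             # one nominal 32Gbps link
DATA_BYTES = 8_500_000_000_000_000    # 8.5 petabytes (decimal), from the article

seconds = transfer_seconds(DATA_BYTES, LINK_BPS)
days = seconds / 86_400

print(f"{seconds:,.0f} seconds, about {days:.1f} days over a single link")
```

On those idealised numbers, streaming the institute's 8.5 petabytes through a single link would take a little over 24 days, which helps explain why fabric bandwidth gets so much attention when data volumes are this large.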

“I could see a very clear path of NVMe flash storage latency coming down the pipeline at me,” Carroll said. “I could see that there was going to be an aggressive play in the very high throughput, very low latency flash space.

“And I could already see that connecting those networks over anything less than a 16-gig or a 32-gig fibre channel fabric or an EDR InfiniBand fabric was probably going to deliver pretty lacklustre results given the tolerances that exist in technologies like NVMe or even Intel’s [Optane].

“I kind of thought: ‘You know what, let’s make sure we rig this thing up for the next five to seven years and let the flash storage stretch its legs.’

“Don’t put capital investment or operational expenditure into devices which you are only going to hamstring with your fabric. Because we know what they can do – we’ve seen it in the lab. It’s very important you don’t effectively waste your investment.”