Computerworld

Warning: you may be an e-hoarder

Hoarding shows are popular these days. Hoarders, Hoarding: Buried Alive, Confessions: Animal Hoarding and on and on. The images are consistent: Boxes stacked to the ceilings. Piles of newspapers dating back to the Nixon era. Feral cats skittering behind furniture. Empty cans of cat food, beans and soup scattered everywhere.

Most people know a hoarder. Maybe it's an aunt. Maybe it's the neighbor with a sofa on the front porch and motorcycle parts strewn across the lawn. Or, maybe it's you. Have you taken a look at your email inbox lately? Last time I cleaned out mine, it had sprawled to more than 1,500 messages - and I hadn't neglected my inbox for all that long.

According to the Radicati Group, the typical knowledge worker sends and receives 105 emails each day. Cribbing from Shakespeare, some people are born e-hoarders, some are made, and others have e-hoarding thrust upon them.

Plenty of us have e-hoarding thrust upon us. In regulated industries, e-hoarding is more or less mandated. Delete the wrong email, and you could get your firm in serious trouble - although that doesn't mean you have to store the thing in your inbox indefinitely.

Cheap storage is a key enabler

With computers sold with ever bigger hard drives, e-hoarding doesn't stress storage the way it would have in the past. And why delete, when it may well be cheaper to store? The cost of storage has dropped from about $9 per GB in 2000 to about $0.08 per GB today.
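
To put rough numbers on that trade-off, here is a minimal back-of-the-envelope sketch in Python. The per-GB prices are the ones quoted above; the mailbox size, cleanup time and hourly rate are hypothetical figures chosen only for illustration.

```python
# Per-GB storage prices quoted in the article (US dollars).
COST_PER_GB_2000 = 9.00
COST_PER_GB_TODAY = 0.08


def storage_cost(gigabytes: float, cost_per_gb: float) -> float:
    """Raw cost of keeping the given amount of data at a given per-GB price."""
    return gigabytes * cost_per_gb


def purge_cost(hours: float, hourly_rate: float) -> float:
    """Cost of the working time spent deleting old files instead."""
    return hours * hourly_rate


if __name__ == "__main__":
    # Hypothetical inputs: assumptions for illustration, not figures from the article.
    mailbox_gb = 10        # size of one worker's email hoard
    hours_to_purge = 2     # time needed to clean it out
    hourly_rate = 50.0     # loaded cost of the worker's time, in dollars

    then = storage_cost(mailbox_gb, COST_PER_GB_2000)
    now = storage_cost(mailbox_gb, COST_PER_GB_TODAY)
    purge = purge_cost(hours_to_purge, hourly_rate)

    print(f"Storing {mailbox_gb} GB at the 2000 price: ${then:.2f}")
    print(f"Storing {mailbox_gb} GB at today's price:  ${now:.2f}")
    print(f"Purging it by hand instead:        ${purge:.2f}")
    print(f"Per-GB price drop: about {COST_PER_GB_2000 / COST_PER_GB_TODAY:.0f}x")
```

With these assumed inputs, keeping the mailbox costs less than a dollar at today's price, while cleaning it out costs $100 of someone's time. The sketch counts only raw storage; it ignores backup, management and retrieval costs.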

If you're a well-paid knowledge worker, the productivity lost while purging old files may well cost your organization more than the bloated storage costs. That is, until it comes time to find something. Powerful search engines like Google create the illusion that information is always at our fingertips.

Enterprise search, though, falls way short of what we're used to with Google. Desktop search is nearly as bad, and email search is like banging flint together to make a fire instead of using a lighter. It's downright primitive.

Elliot Soloway, a professor in the College of Engineering at the University of Michigan, is a self-confessed e-hoarder. Soloway is constantly writing articles for publications, ideas for his classes and entries for his blog. He saves everything. "You never know when you might want to reuse a paragraph or rescue a nice turn of phrase you never ended up using," he says.

Anything filed in the last week or 10 days, Soloway could find. Beyond that, finding poorly filed information would often take longer than recreating it from scratch. Soloway eventually tamed his e-hoard with X1 Technology's desktop search software. X1 quickly finds information buried in emails, documents and presentations (and its most recent version will search social media sites and webmail, and can even search a remote PC from a smartphone).

Now, Soloway says he doesn't need to worry about e-hoarding. Anything he saves is accessible. "In fact, I save more mini-files. I create many small documents with bits and pieces intended for larger projects. I write differently because I don't worry about information getting lost as soon as I close the document," he says.

Universities thrive on unfettered access to reams of information, but most enterprises can't play as fast and loose with data sprawl.

The real cost of data sprawl

While the cost of storing data has dropped significantly, ancillary costs haven't. These include data management costs and even the costs of adding data center space and paying escalating HVAC bills.

Retrieval is another problem, since even the best search tool won't necessarily find data buried in an arcane application. Take SharePoint, for instance. As more people within an organization collaborate through it, the number of documents within SharePoint can spiral out of control.

"When that happens, when SharePoint becomes a de facto Enterprise Content Management system, the performance degrades. Potentially, people will stop using it," says Kelley Lynn Kassa, director of marketing communications for Datawatch, a provider of data mining solutions. "To paraphrase Yogi Berra, 'No one will go there anymore; it will be too crowded.'"

Gartner predicts that enterprise data in all forms will grow 650% in the next five years. In a survey conducted for Oracle, Unisphere Research found that in many organizations stored data is reaching or has already crossed the petabyte threshold.

According to IDC, the world's information now doubles about every year and a half. By the end of 2011, IDC estimates that we will create and replicate 1.8 zettabytes (or 1.8 trillion gigabytes) of information, enough data to fill 57.5 billion 32GB Apple iPads. You could use those 57.5 billion iPads to build a Great iPad Wall of China at twice the height of the original.

Buried alive by documents . . . and legal fees

According to Jeff Fehrman, vice president of forensics and consulting at Integreon, a provider of legal and research solutions, e-hoarding becomes an even more serious problem when your organization faces a lawsuit. "During the discovery phase, if you don't have your data properly classified and legal teams are handling a bunch of information that is not relevant to the case, you can spend millions on e-discovery," he says.

Fehrman advocates having not just data retention policies, which many organizations already have, but also data disposal policies, which most do not.

Besides legal troubles, e-hoarding is also creating huge problems for IT and even executives, problems that go well beyond the costs associated with storing and later finding all of that information. According to IBM, the result of exponential data growth is that most organizations operate with serious blind spots.

IBM found that one in three business leaders frequently make decisions based on information they either don't have or don't really trust. Shockingly, one in two business leaders admit that they don't have working access to the information they need to do their jobs.

Business leaders and knowledge workers usually know they have the data they need somewhere, but they can't put their finger on it. They don't know how to find it, and if they do find it, they're not sure how current or accurate it is.

Digging out from under the data avalanche: how Trustmark National Bank tamed the data beast

"The problem as I see it is the explosion of unstructured data, or data that is not stored in a relational database," says Chris Davidson, vice president and manager of Open Systems Administration for Trustmark National Bank.

As data grows, the chore of backing up critical data becomes more costly and complex. Before Davidson modernized it, Trustmark's backup and recovery strategy was a decentralized, inefficient and largely manual process. The bank's backup solution - IBM Tivoli Storage Manager (TSM) - didn't have an intuitive reporting mechanism, so the bank's backup administrators would take the raw data produced by TSM and manually keep track of the organization's hundreds of systems and their backup status on spreadsheets.

With only a handful of servers, this approach was manageable, but as Trustmark continued to grow, IT administrators started spending as much as 40 hours per month on reporting.

Davidson eventually deployed an automated backup manager from APTARE to help tame this problem. Davidson estimates that by automating the backup and reporting process, Trustmark is now saving $18,00 per year in recovered productivity, $60,000 per year in hardware costs (through a more efficient backup architecture) and $1,500 per year in streamlined auditing.

Of course, automated backup isn't the only solution most organizations will need to tame their data problem. A range of technologies can help, including the obvious ones, such as data mining, e-discovery and data governance tools, and less obvious ones, such as data loss prevention tools.

In fact, DLP tools may be a great place to start. As DLP tools classify important data that the enterprise must protect from leakage and IP theft, anything that falls outside of that "protected" classification is a good candidate for deletion.
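
As a rough illustration of that idea, the Python sketch below filters a list of documents down to those that fall outside the "protected" classification. The Document record, the label names and the sample paths are all hypothetical; real DLP products expose their classifications through their own interfaces, which this sketch does not attempt to model.

```python
from dataclasses import dataclass
from typing import Iterable, List

# Hypothetical set of labels a DLP tool might assign to data that must be kept.
PROTECTED_LABELS = {"protected"}


@dataclass
class Document:
    path: str
    classification: str  # label assumed to come from a DLP tool's scan


def deletion_candidates(documents: Iterable[Document]) -> List[Document]:
    """Return the documents whose label falls outside the protected set."""
    return [doc for doc in documents if doc.classification not in PROTECTED_LABELS]


if __name__ == "__main__":
    # Made-up sample data for illustration only.
    scanned = [
        Document("finance/q3-forecast.xlsx", "protected"),
        Document("misc/lunch-menu.pdf", "unclassified"),
        Document("legal/patent-draft.docx", "protected"),
        Document("misc/old-newsletter.eml", "unclassified"),
    ]
    for doc in deletion_candidates(scanned):
        print("Review for deletion:", doc.path)
```

The output is a review list, not a delete command. Anything covered by a retention requirement, as in the regulated industries mentioned earlier, would still have to be kept regardless of its DLP label.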

Based in Santa Monica, Calif., Jeff Vance is the founder of Sandstorm Media, a copywriting and content marketing firm. He regularly contributes stories about emerging technologies to this publication and many others. If you have ideas for future articles, contact him at jeff@sandstormmedia.net or http://twitter.com/JWVance.
