Shining a Light on Dark Data
You might be surprised to learn how many of your colleagues are afraid of the dark.
Dark data, that is. Like most things with “data” in the title these days, “dark data” is an offshoot of “big data.” The term basically refers to what’s left when you have your big data, and then you take all the useful data out. Or, according to Gartner, which came up with the term, dark data is the “information assets organizations collect, process and store during regular business activities, but generally fail to use for other purposes (for example, analytics, business relationships and direct monetizing).” The firm adds: “Similar to dark matter in physics, dark data often comprises most organizations’ universe of information assets. Thus, organizations often retain dark data for compliance purposes only. Storing and securing data typically incurs more expense (and sometimes greater risk) than value.”
And like the stuff in Fibber’s closet, it just keeps piling up. According to IDC, up to 90 percent of big data could be dark data.
Just like the neatest of houses probably still has a junk drawer, even organizations with pristine governance, risk management, and compliance (GRC) policies probably have dark data, writes Fred A. Pulzello, president of records management organization ARMA International, in TechTarget. “Dark data is a hot topic in information governance circles because of the relentless, immeasurable increases in electronically stored information (ESI) and the places to store it, such as on desktops, shared servers, flash drives, smartphones, tablets and the cloud,” he writes. “The quantity of ESI and its scattered nature contribute to the near inevitability of dark data accumulating in an organization.”
So what’s the problem? There are several:
Storage and maintenance costs. Unfortunately, you can’t just tell the computer, “Delete everything that isn’t useful.” For that matter, dark data may actually be useful in the future; it just isn’t useful right now. So all that data keeps getting backed up, archived, shuffled through storage tiers, and so on. You probably have terabytes of the stuff that have been carefully saved over the decades.
Security. Just because you don’t find dark data useful doesn’t mean somebody else can’t. If the data gets stolen, you’re still liable for it, and you have to go through the same sorts of security procedures as for any personally identifiable data—alerting the victims, giving them identity theft protection, and so on.
Legal discovery. Should your company become involved in a lawsuit in some way, dark data still needs to be gone through to see whether it contains any relevant information. This process is time-consuming and expensive—and could open your company up to liability if something’s in there you forgot about.
“Dark data is data stored in a repository or data store that you know very little about,” Derek Gascon, executive director of the Compliance, Governance and Oversight Council, tells TechTarget. “If litigation or an audit comes from a regulatory body, you don’t necessarily know what’s in your data store, and it’s not being effectively protected.”
That doesn’t mean dark data is all bad. In the same way that a flower is a weed if it’s growing someplace it doesn’t belong, dark data is just big data without a purpose yet. In fact, as many as a third of organizations that start implementing data analytics are doing so to attempt to capture the value in dark data, according to the AIIM study, Content Analytics: automating processes and extracting knowledge.
“Yesterday’s dark data may become a shining source of insight, thanks to new tools or analytic techniques,” agrees Ed Tittel in CIO. “Somebody needs to keep an eye on such things and be ready to put them to work when the benefits of their use outweighs their costs.”
Pulzello offers a six-step process for dealing with dark data:
- Identify it. For example, was it generated by people or a machine? AIIM recommends adding metadata to the dark data to help add value to it, while Tittel recommends encrypting it so it can’t be hacked into as easily.
- Do a cost-benefit analysis. As anyone with a Depression-era grandparent knows, “I might need it someday” isn’t a good enough reason. Any data you keep should be kept for a defined purpose and for a defined time of six to nine months, Pulzello writes.
- Make the business case for what to keep. And then plan to get rid of the rest.
- Make sure dark data is on your regular disposition schedule. “The organization might have a policy on deleting drafts of documents, but that policy statement or retention schedule might not address the issue of systems’ audit logs,” Pulzello writes.
- Actually delete the data. Don’t let it start another pile somewhere. And be sure to delete it properly, so it can’t be retrieved.
- Lather, rinse, repeat. At the end of the defined time period, go back to the saved data and perform the same process again—along with any other data that’s accumulated or been discovered since then.
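For the technically inclined, the bookkeeping behind those steps can be sketched in a few lines of Python. This is only an illustration, not anything Pulzello prescribes: the `DataSet` fields, the `triage` function, and the sample inventory are all hypothetical, and the rule it applies (no stated purpose means dispose, an expired review date means look again) is one simple reading of "a defined purpose and a defined time."

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class DataSet:
    """One entry in a hypothetical dark-data inventory."""
    name: str
    source: str                 # "human" or "machine" (step 1: identify it)
    purpose: Optional[str]      # the business case, or None if nobody has made one
    review_by: Optional[date]   # end of the defined retention period


def triage(datasets, today):
    """Sort inventory entries into keep, review, and dispose buckets."""
    result = {"keep": [], "review": [], "dispose": []}
    for d in datasets:
        if d.purpose is None:
            # "I might need it someday" is not a business case.
            result["dispose"].append(d.name)
        elif d.review_by is None or d.review_by <= today:
            # The defined period has run out (or was never set): repeat the process.
            result["review"].append(d.name)
        else:
            result["keep"].append(d.name)
    return result


# Made-up sample inventory, for illustration only.
inventory = [
    DataSet("web_server_audit_logs", "machine", None, None),
    DataSet("customer_survey_2023", "human", "churn analysis", date(2024, 3, 1)),
    DataSet("sensor_readings", "machine", "maintenance pilot", date(2024, 9, 1)),
]

buckets = triage(inventory, today=date(2024, 6, 1))
for bucket, names in buckets.items():
    print(bucket, names)
```

Note that the sketch only decides what belongs in each pile. Actually deleting the "dispose" pile properly, so it can't be retrieved, is a separate job for your storage and security teams.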
But what if you’re missing something? Not to worry. “Unless you, the business user, have an idea of what you want to ask of this dark data, there is no point worrying about it,” reassures Andrew White, research vice president and agenda manager for MDM and Analytics at Gartner. “Good ideas don’t just come out of the woodwork, or spring forth from a data mart. Business people have to have an idea, a question, an argument to test, a theory to explore, a posit to push against. Without this, all that dark data reminds of that extra furniture we all have in that dark cupboard at the top of the stairs.”
We hope you feel more enlightened now. After all, we wouldn’t want you kept in the dark.