Denial, It Ain’t Just a River in Egypt…Especially, When It Comes to Dark Data

Click to learn more about author Bill Tolson.

I remember being a kid, with the start of the new school year looming. Even though I actually liked school, I was never really ready to relinquish the lazy hazy days of summer. Instead, my theory was that if I just didn’t think about it, sandlot baseball, swims at the local lake, and virtually no bedtime would continue forever.

I have seen a similar strategy applied in countless IT organizations when it comes to the management of dark (or grey) data, with, of course, the same level of effectiveness, much to the dismay of my boyhood self and today’s IT managers.

80%+ Unstructured Data

In today’s typical business, non-profit, or government organization, more than 80% of electronic data is unstructured: PSTs (Outlook personal storage files), employee work files, employee desktop backups, system-generated files, multiple versions of documents, and, to a lesser extent, the corporate legal department’s eDiscovery result sets. Depending on the organization, a decent percentage of this data also belongs to departed employees whose accounts, file shares, and even email accounts still exist in the system. In most cases, all of this data is completely unmanaged, or left to employees to manage, which again means almost completely unmanaged (pack-ratting almost everything you have worked on, every version, since you started with the company does not constitute “management”).

This unmanaged data is referred to as dark or grey data, and not surprisingly, every organization is affected by it, whether they know it or not.

If We Don’t Know It’s There, It’s Not…Right?

Funny story: several years ago I visited a medium-sized multinational to talk about data archiving. They told me they didn’t think they needed to archive because they had a handle on their grey data. I was expecting them to politely ask me to leave, but then I asked the CIO a simple question: how much of your file share capacity is taken up by PSTs? (Old PSTs are a symptom of grey data.) The CIO looked at the Director of Storage and raised his eyebrows, signaling him to answer. The Director of Storage shrugged and said, “I don’t know, but I’m sure it’s not much.” He didn’t know what he didn’t know.

I turned to the CIO and suggested they check during lunch (assuming I would still be there to eat lunch). The Director of Storage quickly left the room while lunch was brought in. I saw him glance at the food and he looked back at me with a frustrated expression (p.s., I may have burned a bridge there).

While lunch was concluding, the Director of Storage returned (looking hungry) and sat down. The CIO asked him what he had found… The Director looked around the room and settled on me last. With a vein throbbing on his forehead, he said that 63% of their file share capacity was consumed by PSTs (tens of terabytes). Many of the PSTs had been there for years, a prime example of grey data. No one knew the files existed, they weren’t being managed, and they were consuming expensive enterprise storage.

As the conversation continued, I was wondering how (really, if) their regulatory compliance or legal department handled these unknown grey data files, such as when responding to a discovery order, but that’s another story…

Shining a Light on Dark Data

This was, of course, not unique to that organization. Even today, it’s more the norm than the exception. But what to do? First, admit you may have a problem…

Once the dark or grey data is uncovered, companies typically address it in one of two ways: delete everything, or call in consultants to determine what should be disposed of, which in reality means paying roughly $200 to $300 per hour for someone to sift through terabytes of files.

However, there is a better solution: use a system that lets you quickly and inexpensively determine what is disposable and what should be kept and managed for business, legal, or regulatory reasons. That means an information management system that consolidates all unstructured data and fully indexes it, so you can search for files by the criteria that matter to your organization, such as last date accessed, keyword, or custodian. This provides the opportunity to cull obvious dark data quickly. For example, searching for files with an owner not in Active Directory would help you cull departed employee data. This solution sure beats hiring consultants or the risks associated with mass deletion.
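To make the idea concrete, here is a minimal Python sketch of that kind of triage on a single file share. It is not how any particular archiving product works, only an illustration under some assumptions: the names `FileRecord`, `scan`, `pst_share`, and `cull_candidates` are invented for this example, the `owner_of` callback is a hypothetical stand-in for however you resolve a file’s owner, and `active_users` stands in for a list of current accounts exported from your directory service. Keep in mind that last-access timestamps are only as good as the file system’s settings, since many systems update them lazily or not at all.

```python
import os
import time
from dataclasses import dataclass
from pathlib import Path


@dataclass
class FileRecord:
    path: Path
    size: int           # bytes
    last_access: float  # seconds since the epoch, taken from st_atime
    owner: str          # assumption: supplied by the caller's owner_of callback


def scan(root, owner_of=lambda path: "unknown"):
    """Walk root and build one FileRecord per file that can be read."""
    records = []
    for dirpath, _dirs, filenames in os.walk(root):
        for name in filenames:
            path = Path(dirpath) / name
            try:
                st = path.stat()
            except OSError:
                continue  # file vanished or is unreadable; skip it
            records.append(
                FileRecord(path, st.st_size, st.st_atime, owner_of(path))
            )
    return records


def pst_share(records):
    """Fraction of scanned bytes held in .pst files (0.0 if nothing scanned)."""
    total = sum(r.size for r in records)
    pst = sum(r.size for r in records if r.path.suffix.lower() == ".pst")
    return pst / total if total else 0.0


def cull_candidates(records, active_users, max_idle_days=3 * 365, now=None):
    """Files whose owner is not an active user, or that have sat idle too long."""
    now = time.time() if now is None else now
    cutoff = now - max_idle_days * 86400
    return [
        r for r in records
        if r.owner not in active_users or r.last_access < cutoff
    ]
```

Running `pst_share(scan(some_share))` answers the question I put to that CIO, and `cull_candidates` produces a review list, not a delete list: someone still has to decide what must be kept for business, legal, or regulatory reasons.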

The true key here is to consolidate and then index the potentially dark/grey data into an intelligent information management/archiving platform that allows you to search for and work with all the data using the same search engine.

But, not all data is created equal. Once the unnecessary data is eliminated, there still remains the task of deciding what to do with less critical data that needs to be kept, protected and available – but certainly not on expensive onsite assets that need to be managed by your valuable IT professional team.

The Cloud – Shining a Light on Dark Data

Public clouds can offer an ideal solution, and the Microsoft Azure Cloud has proven itself to be among the best. One key benefit of Azure is the flexibility and freedom to enhance it with complementary solutions that further ease management and increase capabilities, while lowering costs even more. In fact, you can even use your own Azure tenancy, which is critical because it ensures the lowest possible public cloud pricing and increased security (you hold your own encryption keys), all on a future-proof platform with which you can grow and evolve.
