Big Data Attack Plan: Distributed Data Lakes

by Angela Guess

Mary Shacklett recently wrote in TechRepublic, “A democratization of the big data and analytics process can’t come soon enough for many organizations. This point was made clear during a talk last week with Michele Goetz, principal analyst for Forrester, and Ben Szekely, vice president and founding engineer for solutions and pre-sales at Cambridge Semantics, a provider of big data analytics tools for end users… Szekely discussed an attack plan for big data that was organized around distributed data lakes throughout the enterprise, with the various data lakes being worked by different end user departments. There is merit to the idea; when IT and/or data scientists clean and prepare data, they do the job clinically, abiding by classic data normalization and cleansing rules.”

Shacklett continues, “However, when business users with specialized expertise in sales, marketing, manufacturing, purchasing, finance, customer service, and HR get involved, they can not only check the data, but they can enrich the data further with business value that is based on their experience. ‘What we want to do is to drive transparency into the process,’ said Szekely. ‘We want to turn tribal data knowledge into an entire data asset…. Companies can help to facilitate this by adopting a big data architecture where the big data sandboxes throughout the organization are turned into product zones.’ Companies like Cambridge use graph-based data discovery and analytics to create big data preparation tools that end users without an IT background can put to use. ‘The goal is to create a self-service analytics approach for end users that enables these users to visualize and discover data and to contextualize it in the business on their own,’ said Szekely.”


