The Importance of Data Lake Governance

by Angela Guess

Nicole Laskowski recently wrote in SearchCIO, “When Steve Cretney took a hard look at storage numbers, he noticed something that helped upend the IT strategy at Colony Brands Inc. ‘We observed, almost naively, that we have a couple of hundred terabytes of storage in our [storage area network (SAN)],’ said Cretney, CIO at the mail-order and electronic retailer. The bulk of that was from operational systems, some cherry-picked for analysis, but the majority packed away in cold storage where it sat idle. By comparison, Colony Brands’ data warehouse contained just 10 to 15 terabytes of data, which was used for specific business analytics and reporting. The discrepancy between the two got Cretney and his team thinking: What might the data science team uncover if it had access to the data stuck in the SAN?”

She goes on, “To make cold storage data available and to push the company in a cloud-first direction, Cretney, a big believer in cloud computing before he came to Colony Brands three years ago, turned to Amazon S3, a data storage service, and Amazon Redshift, a cluster database that will replace the company’s data warehouse. His plan, set in stages with the first to be completed in April, is to build a data lake, making more data more accessible for more analytics. Data lakes or data hubs — storage repositories and processing systems that can ingest data without compromising the data structure — have become synonymous with modern data architecture and big data management. The upside to the data lake is that it doesn’t require a rigid schema or manipulation of the data to ingest it, making it easy for businesses to collect data of all shapes and sizes.”
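
To make the "no rigid schema" point concrete, here is a minimal sketch of the schema-on-read idea in plain Python. It is not Colony Brands' implementation and does not touch Amazon S3 or Redshift; the record contents and the `ingest` and `read_orders` helpers are hypothetical, chosen only to show raw data being stored untouched and given structure at query time.

```python
import json

# Hypothetical raw records from different operational systems.
# None of them share a schema, and all are accepted as-is.
RAW_RECORDS = [
    '{"order_id": 1, "total": "19.99", "channel": "catalog"}',
    '{"order_id": 2, "total": 5, "customer": {"zip": "00000"}}',
    '{"sku": "A-100", "on_hand": 42}',
]


def ingest(lines):
    """Store every record exactly as received; no schema is enforced on write."""
    return list(lines)


def read_orders(lake):
    """Apply a schema only at read time: keep records that look like orders."""
    orders = []
    for line in lake:
        record = json.loads(line)
        if "order_id" in record and "total" in record:
            orders.append(
                {
                    "order_id": int(record["order_id"]),
                    "total": float(record["total"]),
                }
            )
    return orders


lake = ingest(RAW_RECORDS)
orders = read_orders(lake)
print(len(lake), "records stored;", len(orders), "read as orders")
```

The inventory record is still kept in the lake even though the order query ignores it, which is the trade-off the quote describes: ingestion is easy, and the work of imposing structure moves to whoever reads the data.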

Read more here.

Photo credit: Flickr
