A data lake is an environment where a vast amount of data, of various types and structures, can be ingested, stored, assessed, and analyzed. Data lakes serve many purposes, including:
- An environment for data scientists to mine and analyze data.
- A central storage area for raw data, with minimal, if any, transformation.
- Alternate storage for detailed historical data warehouse data.
- An online archive for records.
- An environment to ingest streaming data with automated pattern identification.
Other Definitions of a Data Lake Include:
- “A collection of storage instances of various data assets additional to the originating data sources.” (Kelle O’Neal)
- “A tool that works upon different data nodes.” (Michelle Knight)
- “A pool of unstructured and structured data, stored as-is, without a specific purpose in mind.” (Amber Lee Dennis)
- A repository “of unfiltered raw data that has not been modified at all.” (TechRepublic)
- A place where “unstructured/prestructured data resides.” (Harvard Business Review)
- An affordable way to “store big data in near limitless amount.” (Forbes)
Businesses Use Data Lakes to:
- Find and act on business opportunities.
- Stimulate innovation.
- Deal with complex and diversified data.
- Meet business demands for more insight, agility, and flexibility.
- Store different types of data in their original formats until they need to be structured and analyzed.
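The last point, keeping data in its original format until it needs to be structured, is often described as "schema on read." The following is a minimal sketch of that idea, assuming a local folder stands in for the lake. The function names `ingest` and `read_structured`, the file names, and the sample records are all hypothetical and used only for illustration; real data lakes typically sit on distributed or cloud object storage.

```python
import csv
import json
import tempfile
from pathlib import Path


def ingest(lake_dir: Path, name: str, raw: bytes) -> Path:
    """Store the payload byte-for-byte; nothing is parsed or transformed at ingest."""
    lake_dir.mkdir(parents=True, exist_ok=True)
    target = lake_dir / name
    target.write_bytes(raw)
    return target


def read_structured(path: Path) -> list[dict]:
    """Apply structure only at read time, chosen by the file's extension."""
    text = path.read_text(encoding="utf-8")
    if path.suffix == ".json":
        # Hypothetical convention: one JSON object per line.
        return [json.loads(line) for line in text.splitlines() if line.strip()]
    if path.suffix == ".csv":
        return list(csv.DictReader(text.splitlines()))
    raise ValueError(f"no reader registered for {path.suffix!r}")


def demo() -> list[dict]:
    """Ingest two differently shaped files as-is, then structure them on read."""
    with tempfile.TemporaryDirectory() as tmp:
        lake = Path(tmp) / "lake"
        orders = ingest(lake, "orders.csv", b"order_id,amount\n1,9.50\n2,20.00\n")
        events = ingest(
            lake,
            "events.json",
            b'{"user": "a", "action": "view"}\n{"user": "b", "action": "buy"}\n',
        )
        return read_structured(orders) + read_structured(events)


if __name__ == "__main__":
    for row in demo():
        print(row)
```

Because nothing is parsed at ingest, a file type with no reader yet can still be stored; it only raises an error when someone tries to read it as structured data.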
