What Is a Data Lake?

A data lake is an environment where vast amounts of data, of various types and structures, can be ingested, stored, assessed, and analyzed. Data lake technologies can scale to massive volumes of data, and because the data is kept in a relatively raw form, datasets from different sources are easy to combine.
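The sketch below illustrates storing data raw and combining it later, often called schema-on-read. It is written in Python and uses a toy in-memory "lake" (a dictionary mapping paths to raw bytes) in place of real distributed or object storage. The paths, the `ingest`, `read_csv`, and `read_json_lines` helpers, and the sample records are hypothetical illustrations, not the API of any data lake product.

```python
import csv
import io
import json

# Toy "lake" (hypothetical): object paths mapped to raw bytes, stored exactly as ingested.
lake: dict[str, bytes] = {}


def ingest(path: str, raw: bytes) -> None:
    """Store data in its native format; nothing is parsed or validated at write time."""
    lake[path] = raw


def read_csv(path: str) -> list[dict[str, str]]:
    """Apply structure only at read time: parse raw CSV bytes into rows."""
    return list(csv.DictReader(io.StringIO(lake[path].decode("utf-8"))))


def read_json_lines(path: str) -> list[dict]:
    """Apply structure only at read time: parse newline-delimited JSON records."""
    text = lake[path].decode("utf-8")
    return [json.loads(line) for line in text.splitlines() if line.strip()]


# Two hypothetical sources land in the lake in different raw formats.
ingest("raw/crm/customers.csv", b"customer_id,name\n1,Ada\n2,Grace\n")
ingest(
    "raw/web/events.jsonl",
    b'{"customer_id": 1, "event": "login"}\n'
    b'{"customer_id": 2, "event": "purchase", "amount": 42.5}\n'
    b'{"customer_id": 1, "event": "logout"}\n',
)

# Combine the datasets only when an analysis needs them.
names = {int(row["customer_id"]): row["name"] for row in read_csv("raw/crm/customers.csv")}
joined = [
    {**event, "name": names.get(event["customer_id"])}
    for event in read_json_lines("raw/web/events.jsonl")
]

for record in joined:
    print(record)
```

Because no schema is enforced at ingest, the event records can carry different fields (only the purchase has an `amount`), and the CSV and JSON sources are joined only at query time.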

A data lake architecture can centralize data over distributed storage, providing a scalable, fast, secure, and economical solution.

Data lakes serve many purposes, including:

An environment for data scientists to mine and analyze vast amounts of raw, structured, and unstructured data
A central storage area for raw data, with minimal (if any) transformation
Alternative storage for a detailed historical data warehouse
An online archive for records
An environment to ingest streaming data with automated pattern identification


Other Definitions of a Data Lake Include:

“A collection of storage instances of various data assets additional to the originating data sources.” (Kelle O’Neal)
A technology that “allows raw, structured, and unstructured data to reside in one repository and enables comprehensive analysis of big and small data from a single location” (Paramita (Guha) Ghosh)
“A pool of unstructured and structured data, stored as-is, without a specific purpose in mind.” (Amber Lee Dennis)
“A storage repository that holds a vast amount of raw data in its native format until it is needed for analytics applications.” (TechTarget)
A place where “unstructured/prestructured data resides.” (Harvard Business Review)
An affordable way to “store big data in near limitless amount.” (Forbes)

Data Lake Use Cases Include:

Providing a data system to “support innovation and insights in health care service delivery”
Sharing data across discrete corporate divisions to “increase research and operational efficiency, escalate output, and accelerate drug research.”

Businesses Use Data Lakes to:

Find and act on business opportunities
Stimulate innovation
Lower infrastructure and maintenance costs
Store data in the cloud
Pipe different data from one storage area to another
Provide a central Data Management system for big data held in distributed storage
Deal with complex and diversified data
Meet business demands for more insights, agility, and flexibility
Store different types of data in their original formats until they need to be structured and analyzed

Image used under license from Shutterstock.com
