The Allure of the Cloud for Data Lakes

Click to learn more about author Sanjay Vyas.

Many organizations are considering whether it’s the right time to shift from an on-premises data lake to a cloud data lake. From scalability issues to software incompatibility and security, on-premises data lakes haven’t delivered on the promise to give organizations fast and unfettered access to all their data from any source. Are cloud data lakes the answer?

For many organizations, the answer is yes. They’re frustrated by complex on-premises environments, which require specialized skills, expensive consulting services, and time-consuming workarounds just to access the data they need quickly and securely. They’re impatient with high-latency data integration and slow response times for analytics. And many are eager to deploy artificial intelligence applications in an environment that can manage complicated deep learning algorithms.

Few organizations that have invested in on-premises data lakes are interested in “lifting and shifting” their entire environment to the cloud at this point. But many are building cloud data lakes to manage data from emerging sources and move data strategically from on-premises systems. There are several benefits driving interest in cloud data lakes:

  • Easier to manage: Cloud data lakes are easier to manage for a variety of reasons. The hardware infrastructure is managed by a public cloud vendor, removing the need to purchase and maintain additional hardware in the data center. Meanwhile, they come with a cloud-native solution stack built to integrate seamlessly with the data lake. From data integration to data visualization, tools are easier and faster to deploy and operate, requiring fewer specialized skills and much less custom coding.

  • Latest technology: Cloud-based infrastructure and apps stay current because the technology provider handles maintenance and updates, with minimal, if any, downtime for customers’ businesses.

  • Lower cost: The cost of managing data centers and adding hardware to bring in new data sources or expand to new geographies no longer makes sense. With the on-demand infrastructure of a cloud data lake, organizations pay only for the resources they use, often paying monthly and by the number of users, queries logged, or terabytes consumed. Costs become more predictable and easier to control.

  • More scalable: Though on-premises data lakes are appreciated for their ability to handle extremely large volumes of data, they require manual effort to add and configure servers as data volumes grow. Cloud data lake solutions allow organizations to increase and decrease capacity as business needs fluctuate, without purchasing, operating, and maintaining hardware internally. Scalability is further simplified with auto-scaling features that automatically adjust resources to fit pre-determined parameters to keep applications running within prescribed budgets. 

  • Faster access to data: Much of the technology stack that supports cloud data lakes is cloud-native, meaning it was designed to work within a cloud infrastructure and to support the velocity, variety, and volume of modern data. As a result, these tools move and query data much faster and more accurately than traditional tools within on-premises data lakes.

  • Built-in security: Public cloud providers take data privacy and security very seriously, enforcing strict security credentials and complying with mandatory regulations such as those governing financial and health care data.

  • Innovation: Moving the data lake to the cloud frees up the IT organization and business analysts to focus on adding value to the business. Rather than spend most of their time on upkeep and maintenance or data ingestion and preparation, people can spend time on innovation and analysis that drive business performance. 
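
To make the auto-scaling point under "More scalable" concrete, here is a minimal sketch of how resources might be adjusted to fit pre-determined parameters while staying within a prescribed budget. It is an illustration only, not any cloud provider's actual algorithm: the function name, the 60 percent utilization target, the node limits, and the prices are all assumptions chosen for the example.

```python
def desired_nodes(current_nodes, utilization_pct, target_pct=60,
                  min_nodes=1, max_nodes=20,
                  node_cents_per_hour=50, budget_cents_per_hour=600):
    """Return how many nodes to run next (all parameters are hypothetical).

    Integer percentages and cents are used to avoid floating-point rounding.
    """
    # Size the cluster so that average utilization returns to the target.
    # -(-a // b) is integer ceiling division.
    wanted = -(-(current_nodes * utilization_pct) // target_pct)

    # The prescribed budget caps how many nodes can run per hour.
    affordable = budget_cents_per_hour // node_cents_per_hour
    ceiling = min(max_nodes, affordable)

    # Never drop below the minimum, even if the budget cap is lower.
    return max(min_nodes, min(wanted, ceiling))


if __name__ == "__main__":
    print(desired_nodes(4, 90))   # busy cluster: scale out
    print(desired_nodes(10, 90))  # demand exceeds what the budget allows
    print(desired_nodes(4, 10))   # idle cluster: scale in
```

With these assumed defaults, a four-node cluster running at 90 percent utilization grows to six nodes, while a ten-node cluster at the same load is held at twelve nodes because the hourly budget covers no more than that. Capacity follows demand in both directions without anyone adding or configuring servers by hand.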

Stepping Up to the Cloud

Moving the data lake to the cloud is not a decision to take lightly. There are many other issues to consider, including the company culture surrounding data, the need for self-service data access, and your unique needs for data protection.

The good news is that it isn’t an all-or-nothing approach. Many organizations are moving their data lakes to the cloud in phases, building a modern architecture that includes a blend of hybrid, full-cloud, and multi-cloud capabilities.

Whatever the approach, the cloud has become an undeniable influence over the ways we manage data today. The benefits offered by a cloud-based data lake are too many and too powerful to deny.