Much of the recent buzz in the big data world centers on the complexity of on-premises Hadoop implementations, which is making enterprises restless and pushing them toward the cloud. In an increasingly cloud-friendly business landscape, serverless computing may hold more promise than retrofitting Hadoop, and a serverless architecture is a strong, viable option for distributed computing.
However, equally strong concerns about data security and data privacy in a serverless environment are making businesses pause and rethink their IT infrastructure strategies. Can serverless computing uphold the stringent Data Governance policies that business infrastructures require?
Hadoop and Complexity in the Cloud
Hadoop, according to some experts, has a “cloudy future” as businesses increasingly move to the cloud. Because Hadoop was not originally designed for the cloud, Hadoop vendors are struggling with object stores and abstract services to make their solutions work there. Meanwhile, on-premise Hadoop implementations are causing headaches with their well-documented operational complexities. For Hadoop vendors, the move to public cloud means not only a revenue shift from on-premises to cloud, but also a struggle to make Hadoop “relevant” for newer AI workloads such as deep learning (DL) matrix operations.
A KDnuggets article notes that, now that data breaches have become commonplace and the European Union has passed the General Data Protection Regulation (GDPR), Data Governance will take center stage in global businesses. At the same time, Data Management is becoming more complicated with the emergence of newer AI technologies (machine learning and deep learning), all of which play significant roles in today’s business data processing.
So where do business owners and operators, who are already overstretched by the growing demands of a complex IT infrastructure, go for quick and painless solutions?
The Forrester post The Cloud Is Disrupting Hadoop gives a clear picture of where Hadoop vendors stand in the global market in terms of shifting revenues and incompatible AI technologies. Hadoop vendors must promise unique benefits to their shifting user base to stop it from moving to alternatives such as serverless computing.
Data Governance on the Cloud: A Nagging Issue
The problem with cloud computing is that it has traditionally raised many more data access control, data auditing, and security issues than on-premise data storage. So where does that leave the business owner or operator who relies heavily on outsourced IT Infrastructure Management services? Back to square one!
The business operator subscribing to cloud services will still have to implement their own Data Governance and data security practices, just as they would on-premise. The Forbes post Data Protection and The Cloud: A Hybrid World Deserves Hybrid Security explains this well and establishes the importance of a unified security platform that spans on-premise and cloud data stores.
In the article 5 Cloud Trends to Watch in 2020, the author indicates that as cloud storage services come under new Data Governance regulations, especially in the wake of GDPR, cloud storage vendors will probably shift to a business model of supporting larger numbers of small data centers instead of supporting fewer, large data centers. One of the biggest advantages of cloud storage has been its pay-per-use consumption model.
Partial Solution: Automating Data Governance on the Cloud Platform
Take the case of Amazon Web Services (AWS), where configuration rules are used to automatically check that all data volumes are encrypted. The author of the Amazon post Automating Governance on AWS suggests that built-in config rules may be combined with custom rules to manage data volumes. The post stresses that, with AWS, system administrators do not need to memorize every control, because security administration features and near real-time dashboards reveal security risks and status.
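The kind of check such a rule performs can be sketched in a few lines of plain Python. This is only an illustration, not the actual AWS implementation: the function names, the Evaluation class, and the sample inventory are hypothetical, and the only things borrowed from AWS are the VolumeId and Encrypted field names and the COMPLIANT / NON_COMPLIANT status labels. A volume with no encryption flag is treated as non-compliant, which is an assumption made here for safety.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Evaluation:
    """Result of checking one resource against the rule (hypothetical type)."""
    resource_id: str
    compliance: str  # "COMPLIANT" or "NON_COMPLIANT"


def evaluate_volume_encryption(volumes):
    """Return one Evaluation per volume, based on its 'Encrypted' flag."""
    results = []
    for vol in volumes:
        # Assumption: a missing flag counts as unencrypted.
        encrypted = bool(vol.get("Encrypted", False))
        status = "COMPLIANT" if encrypted else "NON_COMPLIANT"
        results.append(Evaluation(vol["VolumeId"], status))
    return results


def non_compliant_ids(evaluations):
    """List the resources an administrator would need to look at."""
    return [e.resource_id for e in evaluations if e.compliance == "NON_COMPLIANT"]


# Hypothetical inventory; a real one would come from the cloud provider's API.
SAMPLE_VOLUMES = [
    {"VolumeId": "vol-a", "Encrypted": True},
    {"VolumeId": "vol-b", "Encrypted": False},
    {"VolumeId": "vol-c"},
]

if __name__ == "__main__":
    for volume_id in non_compliant_ids(evaluate_volume_encryption(SAMPLE_VOLUMES)):
        print("needs attention:", volume_id)
```

A custom rule would follow the same pattern with a different predicate, for example requiring a particular tag on every volume.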
So, Is Serverless a Logical Progression from the Cloud?
When moving big data to the cloud, storage architecture options have always been a critical concern for businesses worried about Data Governance, data security, and consumer privacy. IT solution providers have worked out a path for using big data on public cloud via Insight PaaS, as big data on public cloud has sharply increased global spending on hosted infrastructure relative to on-premise. The Forrester blog post explains this well. Among the eight solution vendors featured in its survey, the Forrester expert recommends Insight PaaS over other options.
Now, this is where serverless computing comes into the picture.
In a serverless architecture, enough servers and storage are reserved on third-party platforms to support all applications, so business operators do not have to maintain complicated and costly on-premise IT infrastructures for their business needs. Sumo Logic recently surveyed 1,500 cloud service users on platforms such as Amazon Web Services (AWS), Microsoft Azure, and Google Cloud Platform (GCP).
These infrastructure platforms treat their hosted resources as a virtual data center. Provisioning virtual IT infrastructure saves businesses operating budget and time while enhancing efficiency. However, Data Governance remains a continuing priority alongside other issues such as databases and data security.
The article How Serverless Changes Cloud Computing indicates that Platform-as-a-Service (PaaS) cloud, as opposed to Infrastructure-as-a-Service (IaaS) cloud, was the original inspiration for serverless computing, which capitalizes on a function-oriented service platform popularized by AWS Lambda and Microsoft Azure Functions. The provisioning of a “collection of paid services” takes the focus away from IaaS, where dedicated servers and storage must be allocated for outsourced business operations.
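To make the function-oriented model concrete, here is a minimal sketch of what a single deployed function can look like in Python. The (event, context) signature follows the convention AWS Lambda uses for Python handlers; the event shape, the "name" field, and the response layout are assumptions chosen for illustration, not something the articles above prescribe.

```python
import json


def handler(event, context):
    """Entry point the platform would invoke once per request.

    'event' carries the request payload and 'context' carries runtime
    metadata; the function never touches a server, disk, or OS setting.
    """
    # Hypothetical input field; fall back to a default when it is absent.
    name = event.get("name", "world")
    return {
        "statusCode": 200,
        "body": json.dumps({"message": f"Hello, {name}"}),
    }


if __name__ == "__main__":
    # Local invocation for illustration; in production the provider calls handler().
    print(handler({"name": "Ada"}, None))
```

Everything outside this function, including scaling, patching, and capacity, is the provider's responsibility, which is exactly why governance controls have to be expressed at the level of functions and the data they touch.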
The DATAVERSITY® article Serverless Computing and Serverless Architecture: An Overview of BaaS, FaaS, and PaaS reveals the two striking features of serverless technology – the move away from on-premise data storage to hosted cloud storage and a higher reliance on third-party services, which is the hallmark of both Backend-as-a-Service (BaaS) and Function-as-a-Service (FaaS). Additionally, Platform-as-a-Service (PaaS) enables third-party hosts to provide everything from server space to the deployment of applications.
The Rise of Serverless Computing: The Thorns in the Bed of Roses
Put simply, serverless computing moves business operations off premises and into the cloud through operating models such as BaaS, PaaS, and FaaS. Modern businesses are opting for advanced computational services that do not depend on dedicated storage or server facilities, and this style of computing is also gaining traction in traditional IT management operations such as taking data backups or delivering microservices.
The ground reality is uncovered in the article The Rise of Serverless Computing: Operational, Security & Financial Considerations, which graphically depicts the “esoteric” nature of serverless deployment by different cloud providers. It is true that in such a scenario, business users do not have to deal with disk space management, network issues, the underlying OS, application code, or patches.
Many security companies now use serverless architectures, which let them deploy traditional security applications without having to worry about perimeter security and access controls. Does that sound too good to be true? Moreover, DevOps teams will increasingly adopt serverless computing because it gives them full control of the application pipeline and incident management. While the apparent benefits of serverless are huge, some fundamental changes must take place within operations, in areas such as security monitoring, financial management, and code deployment.
The article Serverless Computing has Landed: How IT Ops can Adapt covers managing serverless functions, function performance, and native DG monitoring tools, which limit the scope of external governance mechanisms. In a recent CIO article, the author reports that New York Times CTO Nick Rockwell is convinced the function-oriented service model of serverless represents the next big phase of the cloud journey, leaving developers free to focus on code development.
What About Data Governance in the Serverless World?
The guide 12 Step Guide for Data Governance in a Cloud-First World offers valuable advice on data risk management via solid Data Governance policies. Unless enterprises first take control of their Data Governance framework, they may fail to reap the full benefits of data-driven decision making. As data continues to move back and forth between on-premise and cloud, across applications, and over networks, the data “security context” must be kept in sight. This awareness translates to predetermined data lifecycles, strong data integration policies, monitoring of stale data, and strict data access control. Also, preservation of all original data is critical to businesses, in case there is a future need to revisit that data in the original form.
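Two of the controls named above, monitoring of stale data and strict data access control, can be illustrated with a short sketch. Everything in it is hypothetical: the record layout, the role names, the dataset names, and the 90-day threshold are assumptions chosen for the example rather than recommendations from the guide.

```python
from datetime import datetime, timedelta, timezone


def find_stale(records, now, max_age_days):
    """Return ids of records not accessed within the last max_age_days."""
    cutoff = now - timedelta(days=max_age_days)
    return [r["id"] for r in records if r["last_accessed"] < cutoff]


def can_read(role, dataset, policy):
    """Deny by default: a role may read only datasets listed for it."""
    return dataset in policy.get(role, set())


# Hypothetical catalog entries and access policy.
NOW = datetime(2020, 6, 1, tzinfo=timezone.utc)
RECORDS = [
    {"id": "orders-2019", "last_accessed": datetime(2019, 11, 1, tzinfo=timezone.utc)},
    {"id": "orders-2020", "last_accessed": datetime(2020, 5, 20, tzinfo=timezone.utc)},
]
POLICY = {"analyst": {"orders-2020"}, "auditor": {"orders-2019", "orders-2020"}}

if __name__ == "__main__":
    print("stale:", find_stale(RECORDS, NOW, max_age_days=90))
    print("analyst may read orders-2019:", can_read("analyst", "orders-2019", POLICY))
```

Flagging stale data rather than deleting it fits the point about preserving original data: the record stays available for a later review, and the policy decides who may see it.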