The benefits of Big Data are apparent: improved analytics based on real-time input of a variety of geospatial, weather, customer, competitor, and industry-specific information. Big Data can strongly influence business and operational processes in close to real time, especially when paired with predictive analytics and operational intelligence.
Yet current adoption rates suggest that the inhibitors of Big Data initiatives may be even more daunting. They include infrastructure complexity, high costs, lengthy time to deployment, a specialized (and hard-to-find) skill set, and an inflexible infrastructure that is simply beyond the reach of most organizations, particularly those that do not specialize in technology.
Cloud alternatives, especially those that involve third-party vendors, address virtually all of these issues, yet they still leave lingering security concerns and require moving potentially sensitive data beyond an organization’s firewall.
On Premise, Big Data Private Clouds
The solution? On-premise Big Data Private Clouds, much like those used by IT heavyweights such as Google or Facebook. BlueData’s EPIC platform makes such Clouds easy to provision, and it can spur Big Data adoption rates with advantages in the following realms:
- Data Science: EPIC reduces time to deployment and the complexity of Data Scientists’ work by enabling them to create multi-node clusters in a few mouse clicks as opposed to thousands.
- Privacy: In addition to decreasing reliance on physical infrastructure and leveraging the other elastic, scalable benefits of the Cloud that are well suited to Big Data, organizations can continue to rely on their existing enterprise security by keeping their data on premise, without needing to move it, and accessing it on a need-to-know basis through their own Clouds.
- Costs: Private Clouds enable organizations to use the same cluster for different departments, which reduces the number of clusters and the amount of infrastructure required to provision them. EPIC also forgoes costly data duplication, which can have significant ramifications for storage and physical infrastructure. The product is also preconfigured to support some of the most ubiquitous Big Data platforms (Cloudera, Spark, HBase and others), which helps expedite numerous application processes.
The principal boon of private Big Data Clouds located on premises is that they simplify the architecture for Big Data initiatives, enabling organizations to dedicate more resources to deriving benefits from this technology and fewer to facilitating it. In this way private Big Data Clouds improve the experience of accessing and creating insight from Big Data, which can certainly help spur adoption rates.
“You can actually provision a 100 node cluster on Amazon with a few mouse clicks that takes you three months and 50,000 mouse clicks to set up on premise,” BlueData CEO and co-founder Kumar Sreekanti said. “If you really want to harness Big Data in your enterprise, you have to figure out a way to scale fast.”
Perhaps the best way of demonstrating the value of an on-premise Big Data Private Cloud, and of EPIC in particular, is to imagine a Big Data initiative that uses neither this product nor third-party Cloud vendors. With a bare metal infrastructure approach, such an initiative is largely restricted to individual departments (since different departments require different types of, and uses for, Big Data). The most common method for setting up such an initiative would involve purchasing one physical server for each node required in the cluster, using the servers’ individual storage space, installing operating systems and Hadoop on them, and loading the enterprise’s data from its own stores to run jobs on them.
Each department would have to repeat this process for its own particular use of Big Data, and scalability is limited by what the physical storage or disk space can handle. Using a third-party Cloud vendor minimizes the physical hardware infrastructure but still requires moving enterprise data, relying on the computing power of that vendor, and limiting one’s Big Data access to the vendor’s technology requirements.
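The infrastructure difference between per-department bare metal clusters and a single shared cluster can be made concrete with a little arithmetic. The Python sketch below is purely illustrative: the department names and node counts are made-up assumptions, not figures from BlueData, and the function names are hypothetical. It compares the servers needed when every department buys hardware for its own peak demand against the servers needed when all departments draw on one shared pool sized for the busiest moment overall.

```python
# Illustrative only: node counts below are invented for the example.
# Each list gives the nodes a department needs in successive time slots;
# all lists are assumed to have the same length.
demand = {
    "marketing": [10, 2, 2],
    "finance": [2, 8, 2],
    "research": [4, 4, 12],
}


def dedicated_servers(demand):
    """Bare metal per department: each one buys enough servers for its own peak."""
    return sum(max(slots) for slots in demand.values())


def shared_servers(demand):
    """One shared cluster: size the pool for the busiest time slot overall."""
    return max(sum(slot) for slot in zip(*demand.values()))


print("dedicated:", dedicated_servers(demand))  # 10 + 8 + 12 = 30
print("shared:   ", shared_servers(demand))     # max(16, 14, 16) = 16
```

The saving depends entirely on departments not peaking at the same time. If every department were busiest in the same slot, the two totals would be identical.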
The true value of EPIC lies in the fact that its virtualization platform lets users keep their data wherever it resides, whether in an object store, in the Hadoop Distributed File System (HDFS), or anywhere else, and sends a virtualized version of that data to whatever Big Data systems are in use, such as Hadoop or Spark. It is compatible with any app, server, or storage system. A good deal of its speed is attributed to the fact that once EPIC is installed in a bare metal environment, the resulting Big Data Cloud can be provisioned for any type of cluster thanks to the preconfigured capabilities mentioned above. Best of all, these clusters are extremely scalable and let users choose how many nodes each contains, regardless of physical infrastructure. The benefits of this type of instant scalability extend from data scientists to end users in both research and development and production environments.
EPIC principally consists of three technologies that enable it to provide on-premise, private Big Data Clouds. DataTap is the virtualization technology. It contributes to the platform’s speed and allows enterprises to forgo copying data into Big Data platforms, a process which can exacerbate governance issues and consume valuable time. IOBoost helps to power the underlying EPIC engine and ensures that tiering and caching are optimized for the particular Big Data application an enterprise is accessing, so that the application achieves maximum performance. ElasticPlane is the technology responsible for the scalability of EPIC. It can increase computing power and the number of nodes per cluster.
The security of an on-premise, private Cloud is more substantial than that of a public Cloud, since the former can use all of an enterprise’s existing security measures. This is a key distinction between Big Data accessed through third-party vendors and Big Data accessed on premises. Additional security measures facilitated by EPIC include the ability to grant user access by cluster, with access defined in any relevant way: by department, job position, security clearance, or anything else.
“Within a private cloud there is additional security,” said BlueData chief architect and co-founder Tom Phelan. “There’s the concept of need to know in which some users need to get access to some information, but they may not be allowed to get other information. Because BlueData has a tenancy model, even within the firewall all the data’s protected within the enterprise. But we can also partition it through our tenant and user model so that only those users within your organization who have the right to access certain data are permitted to do so.”
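The need-to-know idea Phelan describes can be sketched in a few lines of Python. This is not BlueData’s implementation or API; the `Tenant` class, the `can_access` function, and the user and cluster names are all hypothetical, chosen only to show how a tenant-and-user model restricts access inside the firewall.

```python
from dataclasses import dataclass, field
from typing import List, Set


@dataclass
class Tenant:
    """Hypothetical tenant: a group of users and the clusters they may reach."""
    name: str
    members: Set[str] = field(default_factory=set)
    clusters: Set[str] = field(default_factory=set)


def can_access(user: str, cluster: str, tenants: List[Tenant]) -> bool:
    """A user may reach a cluster only through a tenant that contains both."""
    return any(user in t.members and cluster in t.clusters for t in tenants)


# Made-up example data: two departments modeled as tenants.
tenants = [
    Tenant("finance", members={"alice"}, clusters={"fin-hadoop"}),
    Tenant("marketing", members={"alice", "bob"}, clusters={"mkt-spark"}),
]

print(can_access("alice", "fin-hadoop", tenants))  # True: alice is in finance
print(can_access("bob", "fin-hadoop", tenants))    # False: bob is not in finance
```

Access is denied by default in this sketch: a user who belongs to no tenant, or a cluster assigned to no tenant, never matches.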
Beyond Silicon Valley
Although the architectural simplification, cost advantages and speed associated with on-premise Big Data private Clouds are the same for organizations regardless of vertical industry, these benefits tend to be magnified for organizations that do not specialize in technology. Big Data adoption rates will increase only through a mainstream embrace of this technology outside of Silicon Valley and the Bay Area in general. The whole rationale behind many of EPIC’s features is to enable even the most modest small and mid-sized businesses to leverage the advantages of an on-premise private Cloud for Big Data in much the same way that some of the most prominent Silicon Valley companies do.
“These type of technologies today require an extensive skill set and deep knowledge and resources that exist with these large companies,” explained BlueData vice-president of products Anant Chintamaneni. “We want to simplify the skill set needed to provision these type of clusters so they can give value faster. That’s why we believe a lot of the companies today whose primary business is not technology are not participating in the Big Data cycle. We want to bring this technology to them.”
EPIC will be available in a free community edition that works on a single node and in a full enterprise edition.