
No Relief with Hadoop – Managing The Big Data Reality Gap

By Jon Bock  /  May 9, 2016

There has been much anticipation that businesses would find relief for their analytics headaches in Hadoop, the open source software for distributed processing and distributed storage of large data sets across clusters of commodity or cloud hardware. There is no doubt Hadoop systems can handle large volumes of unstructured and structured data that would overload traditional data warehouses, but according to a recent Gatepoint survey, Hadoop is barely making a dent.

Just 11% of those surveyed are currently engaged with Hadoop. While the open source software itself is “free,” widely reported complexities and costs of implementation make it difficult to achieve business value. What was arguably one of the most hyped technologies in recent years has given way to the reality that Hadoop is hard to use—resulting in a costly skills shortage—and not well-suited to the needs of many businesses.

Given the low rate of use, many respondents may still be learning about, or simply unaware of, Hadoop's benefits and limitations; the majority indicated indifference to both. A substantial minority (40%) of IT executives do perceive Hadoop as having a strong performance story, but 28% view the scarcity of skilled professionals as a negative.

Scaling Hadoop beyond a core of sophisticated data science programmers requires retraining and retooling, and its complexity demands dedicated operations teams to keep it up and running. Users must weigh that cost and effort to decide whether it is a good investment.

When companies try to transition from experimentation to deployment, they often discover that Hadoop is ill-suited for many of the applications they would like to apply it to. Many find themselves trapped, investing ever more time and resources trying to force-fit the technology into roles for which it was not designed.

What IT is seeking

They may not like what they have, but IT leaders know what they want from data analytics tools. Getting new data to analysts faster is the most critical requirement (cited by 58%), followed by a simplified data pipeline (53%) and reduced overhead costs (51%). Half also want the ability to consolidate data silos and data marts.

In the drive to enable greater business agility, organizations increasingly are seeking to provide data professionals with analytics self-service capabilities, but traditional/legacy data platforms are not meeting today’s data requirements. Data comes from everywhere—not just enterprise applications, but also websites, log files, social media, sensors, web services and more. Organizations want to make that data available to all of their analysts as quickly as possible, not limit access to only a few highly skilled data scientists.

With businesses intently focused on agility and flexibility, it is probably not surprising that 70% of the Gatepoint survey respondents say scalability is a top consideration in selecting data analytics solutions, and one-third indicate that the ability to scale up or down within minutes would be appealing. Furthermore, 68% indicate that availability and resiliency are very important criteria for their analytics investments.

As more and more enterprises have investigated "big data" innovation, only a select 'elite 1%' have been able to build complex systems and employ highly specialized teams to work with them. The vast majority, though, face near-insurmountable obstacles in finding a viable option. This "data divide" is providing major competitive advantages to a handful of the most sophisticated companies with access to the resources and talent needed to handle this complexity.

Cloud is fast becoming a staple in the IT arsenal, and with many aspects of the enterprise computing environment migrating to on-demand services, it is time to take a look at the data warehouse as a service. As with many other aspects of IT infrastructure, cloud-based services can ease data warehouse management burdens, providing the expertise and resources required to scale enterprise data warehouses.

About the author

Jon Bock is VP of Products at Snowflake Computing. Jon has spent over 15 years creating and applying disruptive enterprise technology at both startups and established enterprises. Most recently, he has focused on data analytics in product and marketing roles at Aster Data, Teradata, and Cloud9 Analytics. Prior to that he was a core member of the VMware marketing team and an engineer at Hewlett-Packard. Jon holds a BSEE from Stanford University and an MBA from Duke University.
