Hadoop Embraces the Internet of Things

by Jelani Harper

The Internet of Things (IoT) is all around us, inside and outside our homes, offices, cities, and vehicles. In addition to familiar smartphones, it includes:

  • Smart Cities/Smart Streets: Some cities have streets with smart LED lighting that reduces costs and improves safety by detecting traffic, increasing luminosity when pedestrians or motorists are present and reducing it when they are not.
  • Smart Appliances: The number of smart appliances increases daily and includes refrigerators that adjust temperatures according to tags on food items, alarm systems that can be activated and deactivated remotely, and interactive thermostats.
  • Smart Vehicle Management: Cars and trucks are equipped with sensors that provide a host of telemetry data on driving habits and patterns. That data can be used to route vehicles more efficiently, reduce idling time, and increase overall efficiency while reducing expenses.
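
As a rough illustration of the smart street lighting behavior described in the list above, the rule can be sketched in a few lines of Python. This is a minimal sketch, not any vendor's implementation: the `SensorReading` type, the `target_luminosity` function, and the two brightness levels are all hypothetical names and values chosen for the example.

```python
from dataclasses import dataclass

# Hypothetical brightness levels (percent of full output), chosen for illustration.
DIM_LEVEL = 20
FULL_LEVEL = 100


@dataclass
class SensorReading:
    """One reading from a hypothetical street-level presence sensor."""
    pedestrians: int
    vehicles: int


def target_luminosity(reading: SensorReading) -> int:
    """Return full brightness when anyone is present, otherwise the dim level."""
    if reading.pedestrians > 0 or reading.vehicles > 0:
        return FULL_LEVEL
    return DIM_LEVEL


if __name__ == "__main__":
    # An empty street is dimmed; a street with pedestrians is fully lit.
    print(target_luminosity(SensorReading(pedestrians=0, vehicles=0)))
    print(target_luminosity(SensorReading(pedestrians=2, vehicles=0)))
```

A real deployment would likely smooth the transitions and account for time of day, but the core decision is this simple presence check applied to a continuous stream of sensor readings.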

Together with traditional enterprise uses of the IoT, which generally fall into Operational Intelligence (for industrial equipment), recommender engines, risk calculation, and marketing, these personal applications have pushed Big Data well beyond its social media and sentiment analysis phase and toward the core of its true potential: connecting machines and people.

The result is a strain on the conventional Big Data infrastructure required to leverage this technology, and a need to update one of the most frequently used Big Data platforms: Apache Hadoop.

“The Internet of Things is going to dwarf the amount of data sources,” said MapR’s chief marketing officer Jack Norris. “Generally what we’re looking at with Big Data with Hadoop is not processing this so we can house more data for analyst end users to use more query tools. Really, one of the uses of it is to do machine generated responses to machine generated content.”

Revitalizing Hadoop

MapR is responsible for a number of updates to Hadoop intended to keep pace with the speed and scale demanded by the IoT’s largely sensor-derived machine data, the vast majority of which is generated continuously. A growing number of vendors offer both real-time and predictive analytics as a service (in some instances modifying Hadoop to do so), and Hadoop has undergone numerous recent modifications so that enterprises can apply such Operational Intelligence on their own. These adjustments include:

  • Vastly Improving the Hadoop Distributed File System (HDFS): Previously, Hadoop was limited in the number of files it could store, topping out at around 50 to 100 million files. By creating a distributed data platform that greatly reduces dependency on the NameNode and Java, while supporting an HDFS API, Hadoop now has an enterprise-class storage processing layer that can accommodate a trillion files and support rewrites.
  • Expediting Hadoop’s Processing Speed: Improvements to Hadoop’s underlying engine deliver up to 10 times the processing speed for certain applications, which can also mean that customers deploy less hardware.
  • Snapshots: Snapshots provide backup capabilities without copying data by taking “pictures” of the data at various points in time, which users can go back to and access as needed. The architectural and financial strains of constantly copying petabytes of data (per day) on existing storage infrastructure are considerable. Snapshots avoid them by tracking minute data changes and letting users roll back to the state of their data prior to updates. A snapshot stores only the data that differs between those changes or updates.
  • Mirroring: Mirroring builds on the capability of snapshots by allowing users to replicate the differences in the data states over multiple locations, such as between data centers. By properly sequencing the different versions of the data between locations, organizations can have what ultimately amounts to a backup copy of data in the case of failure.
  • Data Recovery: Snapshots and mirroring also enable data recovery options that were not previously available.
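
The snapshot and mirroring ideas in the list above can be made concrete with a toy example. The following is a minimal sketch under stated assumptions, not MapR's implementation: `SnapshotStore`, `Mirror`, and `apply_delta` are hypothetical names, and an in-memory dictionary stands in for a distributed file system. Each snapshot records only the keys that changed since the previous one, rollback rebuilds an earlier state by replaying those differences, and a mirror stays current by applying the same differences in sequence.

```python
from typing import Any, Dict, List

# Sentinel marking a key that was removed since the previous snapshot.
_DELETED = object()


def apply_delta(state: Dict[str, Any], delta: Dict[str, Any]) -> None:
    """Apply one recorded delta to a state dictionary in place."""
    for key, value in delta.items():
        if value is _DELETED:
            state.pop(key, None)
        else:
            state[key] = value


class SnapshotStore:
    """Toy key-value store whose snapshots record only changed keys."""

    def __init__(self) -> None:
        self._live: Dict[str, Any] = {}
        self._base: Dict[str, Any] = {}          # state as of the last snapshot
        self._deltas: List[Dict[str, Any]] = []  # one delta per snapshot

    def put(self, key: str, value: Any) -> None:
        self._live[key] = value

    def delete(self, key: str) -> None:
        self._live.pop(key, None)

    def current(self) -> Dict[str, Any]:
        return dict(self._live)

    def snapshot(self) -> int:
        """Record the difference from the previous snapshot and return its id."""
        delta: Dict[str, Any] = {}
        for key, value in self._live.items():
            if key not in self._base or self._base[key] != value:
                delta[key] = value
        for key in self._base:
            if key not in self._live:
                delta[key] = _DELETED
        self._deltas.append(delta)
        self._base = dict(self._live)
        return len(self._deltas) - 1

    def delta_size(self, snapshot_id: int) -> int:
        return len(self._deltas[snapshot_id])

    def deltas_from(self, start: int) -> List[Dict[str, Any]]:
        return self._deltas[start:]

    def view(self, snapshot_id: int) -> Dict[str, Any]:
        """Rebuild the state at a snapshot by replaying deltas in order."""
        state: Dict[str, Any] = {}
        for delta in self._deltas[: snapshot_id + 1]:
            apply_delta(state, delta)
        return state

    def rollback(self, snapshot_id: int) -> None:
        """Restore the live state to what it was at the given snapshot."""
        self._live = self.view(snapshot_id)


class Mirror:
    """Remote copy that stays current by applying deltas in sequence."""

    def __init__(self) -> None:
        self.state: Dict[str, Any] = {}
        self.applied = 0  # number of deltas already applied

    def sync(self, source: SnapshotStore) -> None:
        for delta in source.deltas_from(self.applied):
            apply_delta(self.state, delta)
            self.applied += 1


if __name__ == "__main__":
    store = SnapshotStore()
    store.put("sensor-1", "21.5")
    store.put("sensor-2", "19.0")
    first = store.snapshot()
    store.put("sensor-1", "22.0")  # only this key changes
    second = store.snapshot()
    print(store.delta_size(second))  # size of the second delta, not the full data set
    store.rollback(first)
    print(store.current())
```

The design choice worth noting is that neither a snapshot nor a mirror sync ever copies the full data set. Both work from the recorded differences, which is why ordering matters: applying the deltas out of sequence would rebuild the wrong state.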

“This Internet of Things is really in our wheelhouse because we’ve completely rewritten our underlying file system so you can continuously stream data,” Norris said. “You don’t have a batch construct. We’ve integrated a Hadoop database so you can combine deep predictive analytics with real-time responses. We have customers today who have the best of both worlds and who can handle large scale distributed data sets, do some really interesting things across that data, but also have the real-time database operations so you can respond to make those adjustments.”

Use Cases

The use cases for the IoT are almost as varied as those for Hadoop and Big Data themselves. Whereas these use cases previously would more than likely have required a third-party vendor, the improvements to Hadoop described above have made it possible for enterprises to handle them on their own. The many industries that benefit from this technology include:

  • Computing: HP, a featured Hadoop/MapR customer, has used the platform to monitor product quality through telemetry data. The company is able to ingest greater amounts of data and access it faster, so that it can identify trends in product behavior and proactively provision maintenance and troubleshooting, which reduces downtime and increases product quality.
  • Agriculture: Contemporary farming equipment is packaged with sensors that provide conventional Operational Intelligence, which enables organizations to schedule maintenance and diagnose the health of these assets. These sensors can also detect external factors that may affect machine performance, such as weather and soil conditions, and help farmers determine how best to use the equipment in the conditions in which they are working.
  • Finance: Credit card companies can use sensors in smartphones to localize and personalize offerings (based on recommender engines) according to consumers’ location, previous behavior, and credit standing.
  • Data Analysis: Although data analysis is not a vertical industry, the snapshot function is of particular value to data analysts who need to query static data and have difficulty doing so because Big Data is constantly streaming in. Using snapshots, they can go back to a specific point in time that is still relevant to the present and issue queries against data that no longer changes.
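
The data analysis case in the list above, querying a fixed point in time while new data keeps arriving, can be sketched as follows. This is an illustrative sketch only: the `StreamLog` class and its methods are hypothetical, and it assumes an append-only log in which a snapshot is simply the log's length at the moment the snapshot is taken.

```python
from typing import List, Optional, Tuple

# (device id, value): a hypothetical record shape for this example.
Reading = Tuple[str, float]


class StreamLog:
    """Append-only log of readings with point-in-time queries."""

    def __init__(self) -> None:
        self._records: List[Reading] = []

    def append(self, device: str, value: float) -> None:
        self._records.append((device, value))

    def snapshot(self) -> int:
        """A snapshot of an append-only log is just its current length."""
        return len(self._records)

    def average(self, device: str, as_of: Optional[int] = None) -> float:
        """Average a device's values, optionally as of an earlier snapshot."""
        records = self._records if as_of is None else self._records[:as_of]
        values = [value for dev, value in records if dev == device]
        if not values:
            raise ValueError(f"no readings for {device!r}")
        return sum(values) / len(values)


if __name__ == "__main__":
    log = StreamLog()
    log.append("tractor-7", 10.0)
    log.append("tractor-7", 20.0)
    snap = log.snapshot()
    log.append("tractor-7", 60.0)  # arrives after the snapshot
    # The snapshot query ignores the later reading; the live query includes it.
    print(log.average("tractor-7", as_of=snap))
    print(log.average("tractor-7"))
```

Because the query against the snapshot never sees records appended afterward, an analyst can rerun it and get the same answer each time, even as the live stream grows.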

“In seven of the top 10 verticals, we’ve got at least one million dollar customer in each those verticals,” Norris said. “We’ve got 500 paying customers across industries; the average customer—about 80 percent of our customers—expands their cluster within the first year. So there’s a lot of customers getting quick benefits and expanding their use of Hadoop.”

And there are still more customers who are accessing Operational Intelligence benefits through a third-party service provider, which in turn is utilizing Hadoop to capitalize on the IoT.

Ready for the IoT?

Big Data is only expanding with the IoT. In the future, the IoT will move beyond the industrial applications of Operational Intelligence and encompass routine aspects of business and personal life. Hadoop’s recent facelift should help it remain one of the most sought-after platforms for managing Big Data, since the modifications have readied it for production Big Data applications.

“From a logic sense, the Internet of Things is becoming mainstream as we speak,” MapR chief data engineer Michael Hausenblas said. “There are certain areas that are visible; we’ve talked about some examples in agriculture and retail. There are other areas like smart buildings and smart cities, whether in commercial settings or at home, in which the capabilities are there but critically deploying all of these sensors takes time. I think there’s another year or two before we see mainstream adoption. But in general, if someone were to ask me is the IoT real, is there something there, I would say absolutely.”

