The explosion of data projected to emerge from the burgeoning Internet of Things (IoT), already a daily occurrence in the Industrial Internet, creates analytics requirements unique to this facet of Data Management.
To effectively glean insight from continuous streams of real-time event data, the enterprise must account for:
- Speed: The constant influx of data at sub-second intervals requires an analytics platform able to process it with equally low, or no, latency.
- Resource Allocation: The massive quantities of data generated by the IoT can quickly overwhelm even the largest server farms and place considerable strain on conventional architectures.
- Loading: Ultimately, an IoT analytics platform must load data into a repository as rapidly as it arrives, enabling users to aggregate incoming, tactical data with historic, strategic data.
Additional concerns include solutions for data streaming, visualizations, alerts, and an assortment of analytics options to derive actionable meaning from data. Organizations may therefore have considerable difficulty stitching these tools together with their existing ones to account for the way the IoT will change the data sphere.
Alternatively, they can take advantage of what is touted as the only analytics database, and surrounding platform, designed specifically to handle the concerns the IoT has wrought, one that can monetize this phenomenon to a degree the piecemeal approach cannot.
Cupertino-based ParStream recently announced a database that processes data in milliseconds and directly addresses the IoT’s unique concerns related to:
- Speed: Leveraging standard Intel-based hardware with virtually unlimited scalability, the database uses massively parallel processing (MPP) techniques to split queries among a multitude of cores. Thus, regardless of table size (one of ParStream’s customers has two billion rows and thousands of columns), the database still achieves sub-second response times.
- Resource Allocation: Two facets of ParStream’s platform allow the enterprise to drastically reduce the resources it devotes to IoT analytics. The first is its scalability, which enabled one customer to shrink its deployment from 150 servers to just four. The second is the geo-location capability of the vendor’s Geographical Distributed Server, which alleviates the bandwidth and architectural problems caused by funneling data from various locations into a single data center.
- Loading: Best of all, ParStream’s database lets a multitude of users query data as it enters the database, all but eliminating the downtime typically associated with analytics on real-time event data.
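The fan-out/merge pattern behind MPP can be sketched briefly. Everything below is illustrative: the toy table, the `partial_agg` helper, and the use of Python threads are stand-ins, since an engine like ParStream’s runs compiled query fragments on separate cores over columnar partitions rather than anything resembling this.

```python
from concurrent.futures import ThreadPoolExecutor

# Toy stand-in for a large fact table of (timestamp, value) rows;
# a production columnar store would hold billions of rows.
rows = [(i, i % 97) for i in range(100_000)]

def partial_agg(chunk):
    # Each worker answers the query over its own partition only.
    return len(chunk), sum(v for _, v in chunk)

def mpp_avg(table, workers=4):
    # Split the query across partitions, run the fragments in parallel,
    # then merge the small partial results: the essence of MPP execution.
    size = (len(table) + workers - 1) // workers
    parts = [table[i:i + size] for i in range(0, len(table), size)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(partial_agg, parts))
    count = sum(c for c, _ in partials)
    total = sum(s for _, s in partials)
    return total / count
```

Because only the per-partition counts and sums travel back to the merge step, the cost of the final aggregation stays flat no matter how large the table grows.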
“The space that we operate in is where you have a lot of data and you want to be able to have queries responded in less than a second and at the same time you want to load data in as you query,” ParStream Chief Marketing Officer Syed Hoda remarked. “If you have 100 users querying the database you need to load the data as it comes in. That’s where we’re strong. In that space we’re quite unique. There aren’t very many that do that.”
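Hoda’s point, loading data while many users query it, comes down to an ingest path that never takes the data offline for readers. A minimal sketch, with an invented `StreamingStore` class standing in for the database (this is the general technique, not ParStream’s internals):

```python
import threading

class StreamingStore:
    # Append-only store: a writer loads rows while readers query them.
    def __init__(self):
        self._rows = []
        self._lock = threading.Lock()

    def load(self, row):
        with self._lock:
            self._rows.append(row)

    def count_where(self, predicate):
        # Queries see every row loaded so far: no load/query downtime.
        with self._lock:
            return sum(1 for r in self._rows if predicate(r))

store = StreamingStore()
loader = threading.Thread(target=lambda: [store.load(i) for i in range(10_000)])
loader.start()
while loader.is_alive():
    store.count_where(lambda r: r % 2 == 0)  # querying mid-load
loader.join()
print(store.count_where(lambda r: r % 2 == 0))  # -> 5000
```

The queries issued inside the loop run against a partially loaded store and still return consistent answers; only the final count reflects the complete load.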
Although the core of ParStream’s IoT analytics platform is unequivocally its proprietary database, the company has established partnerships with Informatica and Datawatch to address two crucial elements of the analytics process: streaming data/ETL and visualization. The database has an API that will integrate with tools from additional vendors in these areas, but it was specifically designed to integrate seamlessly with the offerings of these two partners.
There is a pivotal relationship between data visualization and streaming data. By streaming data directly into a visualization tool, an end user can see its effects in real time and act on them accordingly. Alternatively, the data can be stored in a database, aggregated with historic data, and visualized to show the comprehensive effect of streaming data and traditional strategic data together. With ParStream’s technologies, users can take either option with minimal latency; the sub-second response time for streaming data may stretch to a second or two when the data is first stored in the database. Hoda commented that:
“What we’ve done is we started developing this database from scratch in C. We didn’t base it on someone else’s framework or open source or what have you. We said we want to create the most efficient, fastest database specifically for analytics. So we don’t do transactions; we had to make some tradeoffs. But because we don’t do all these other things, we have a lot less architecture that allows us to run this blazingly fast. So, if analytics and Big Data is what you want to do, we do that well.”
The real-time querying capabilities of ParStream’s database are augmented by a variety of analytics options. The database supports both R and KNIME for machine learning and for predictive and prescriptive analytics. Additionally, querying and analytics are assisted by dedicated measures for time series, which enable the user to distinguish data within the database by specific temporal factors (such as the data that arrived within a given second, minute, or day). By adding indices built specifically for time series analysis, ParStream was also able to accelerate its querying and analytics. The database can also generate user-defined alerts that prompt action based on certain events or insights in the data.
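The time-series indexing idea, bucketing rows by arrival time so temporal queries touch only the relevant partitions, pairs naturally with user-defined alerts fired at insert time. The class and method names below are invented for illustration and are not ParStream’s API:

```python
from collections import defaultdict
from datetime import datetime

class TimeSeriesIndex:
    def __init__(self):
        self.buckets = defaultdict(list)  # minute -> values in that minute
        self.alerts = []                  # (predicate, callback) pairs

    def on_event(self, predicate, callback):
        # Register a user-defined alert, fired as matching rows arrive.
        self.alerts.append((predicate, callback))

    def insert(self, ts, value):
        self.buckets[ts.replace(second=0, microsecond=0)].append(value)
        for predicate, callback in self.alerts:
            if predicate(value):
                callback(ts, value)

    def query_minute(self, ts):
        # Fetch one minute of data without scanning any other bucket.
        return self.buckets.get(ts.replace(second=0, microsecond=0), [])

idx = TimeSeriesIndex()
fired = []
idx.on_event(lambda v: v > 90, lambda ts, v: fired.append(v))
idx.insert(datetime(2014, 10, 1, 12, 0, 5), 42)
idx.insert(datetime(2014, 10, 1, 12, 0, 30), 95)  # triggers the alert
idx.insert(datetime(2014, 10, 1, 12, 1, 0), 10)
print(idx.query_minute(datetime(2014, 10, 1, 12, 0)))  # [42, 95]
print(fired)                                           # [95]
```

A query scoped to one minute reads a single bucket regardless of how much data the other minutes hold, which is the acceleration a temporal index buys.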
One aspect of ParStream’s platform that makes it well suited for the IoT is its Geographic Distributed Server, which is designed to analyze data at its source regardless of physical location. Whereas conventional options for analyzing distributed data involve funneling it from disparate sources into a central database, ParStream takes a different approach. With the ParStream Geo Distributed Server acting as a central database, queries are split and routed directly to the most relevant servers, which are installed at the physical location of the data’s source. Consequently, analytics can be performed onsite, and the results are routed back to the database and on to the germane application. This geographically distributed approach also contributes to the speed at which the platform can issue queries and perform analytics.
More importantly, this process reduces the amount of data that must be moved into the database (which can be a substantial load). Without ParStream’s geographically distributed method, a global telecom company with cell phone towers throughout the U.S. had to move four billion records into its database daily. With that methodology, however, it simply moves the results of those queries, approximately 40 records, into its database and applications, greatly reducing the burden on its network and architecture.
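The billions-in-versus-dozens-out arithmetic can be made concrete. In the sketch below, the site names, record shapes, and query are all invented; the point is only that each site runs the aggregation locally and ships back a tiny partial result for the center to merge:

```python
# Each "site" holds its own raw records; only small per-site results
# travel back to the central server, never the raw rows.
sites = {
    "tower_east": [("call", 3), ("sms", 1)] * 1000,  # 2,000 raw records
    "tower_west": [("call", 2), ("sms", 4)] * 1000,
}

def local_query(records):
    # Runs where the data lives: aggregate total duration on site.
    return {"rows_scanned": len(records),
            "total": sum(duration for _, duration in records)}

def distributed_query(sites):
    # Fan the query out to every site, then merge the small partials
    # centrally: two result rows move instead of 4,000 raw records.
    partials = {name: local_query(recs) for name, recs in sites.items()}
    return sum(p["total"] for p in partials.values()), partials

grand_total, partials = distributed_query(sites)
print(grand_total)  # -> 10000
```

Here 4,000 raw records stay put and two partial results cross the network, mirroring the telecom customer’s drop from four billion moved records to roughly 40.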
“The main reason people choose us is our speed compared to others,” Hoda said. “We are super fast and we don’t require many resources. That’s what people like. They buy us because they can put a lot of data in it, get a lot of data out very fast and import data. That’s the main points.”
IoT vs. Industrial Internet
At present, the majority of IoT analytics deployments involve the Industrial Internet, which largely revolves around the real-time monitoring of machines that generate sensor or event data, such as oil and natural gas equipment or aerospace equipment. The Industrial Internet is a healthy component of the IoT, which encompasses not only large equipment assets but also home appliances, mobile devices, transportation conveniences, Smart Cities, and more. ParStream was expected to participate in the Internet of Things World Forum in Chicago in mid-October, another testament to the timeliness and relevance of its platform in the wake of this emerging Data Management phenomenon.