Streaming Analytics 101: The What, Why, and How


Stream processing analyzes and acts on real-time data through the use of continuous queries. Streaming Analytics connects to external data sources, enabling applications to integrate selected data into the application flow or to update an external database with processed information. Bloor Research analyst Philip Howard says stream processing is really an evolution of Complex Event Processing (CEP). Both CEP and Streaming Analytics technologies enable action based on an analysis of a series of events that have just happened.
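To make the idea of a continuous query concrete, here is a minimal sketch in Python. It is an illustration only, not how any particular product works: the function name `sliding_average`, the `(timestamp, value)` event shape, and the three-second window are all assumptions made for the example. The query never "finishes"; it emits an updated answer each time a new event arrives.

```python
from collections import deque
from typing import Iterable, Iterator, Tuple


def sliding_average(
    events: Iterable[Tuple[float, float]], window_seconds: float
) -> Iterator[Tuple[float, float]]:
    """Emit the average of the values seen in the last `window_seconds`.

    `events` yields (timestamp, value) pairs in timestamp order
    (a hypothetical event shape chosen for this sketch).
    """
    window = deque()  # (timestamp, value) pairs currently inside the window
    total = 0.0
    for timestamp, value in events:
        window.append((timestamp, value))
        total += value
        # Evict readings that have fallen out of the time window.
        while window[0][0] <= timestamp - window_seconds:
            _, old_value = window.popleft()
            total -= old_value
        # One result per incoming event: the query is always "running".
        yield timestamp, total / len(window)


# A short, made-up stream of sensor readings.
readings = [(0, 10.0), (1, 20.0), (2, 30.0), (5, 40.0)]
results = list(sliding_average(readings, window_seconds=3))
for timestamp, average in results:
    print(timestamp, average)
```

In a real deployment the `events` iterable would be backed by a message queue or socket rather than a list, but the shape of the query stays the same.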

Streaming Analytics is essential to stream processing. It is the ability to continuously calculate statistics on data while that data is still moving through the stream. Streaming Analytics allows management, monitoring, and real-time analytics of live streaming data.
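One common way to calculate statistics without storing the stream is Welford's online algorithm, which updates the mean and variance one value at a time. The sketch below is an assumption-laden illustration (the class name `RunningStats` and the sample values are made up), not the method used by any product discussed in this article.

```python
class RunningStats:
    """Running mean and sample variance using Welford's online algorithm."""

    def __init__(self) -> None:
        self.count = 0
        self.mean = 0.0
        self._m2 = 0.0  # running sum of squared deviations from the mean

    def update(self, value: float) -> None:
        """Fold one new value into the statistics in constant time."""
        self.count += 1
        delta = value - self.mean
        self.mean += delta / self.count
        self._m2 += delta * (value - self.mean)

    @property
    def variance(self) -> float:
        """Sample variance of the values seen so far (0.0 for fewer than two)."""
        return self._m2 / (self.count - 1) if self.count > 1 else 0.0


stats = RunningStats()
for value in [2, 4, 4, 4, 5, 5, 7, 9]:  # made-up readings
    stats.update(value)
    print(stats.count, stats.mean, stats.variance)
```

Because each update uses only the previous state and the new value, memory use stays constant no matter how long the stream runs.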

Streaming Analytics involves knowing about and acting upon events happening in your business at any given moment. Because the analysis happens immediately, companies must act on the results within a small window of opportunity, before the data loses its value. The data can originate from the Internet of Things (IoT), mobile phones and other mobile devices such as iPads, market data feeds, sensors, Web clickstreams, and transactions. Data that loses its value leads to additional operational and administrative costs, business risk, reputation damage, potential legal action, reduced productivity, and an inability to make informed decisions, all of which erode a company’s competitive edge.

For example, a taxi service can use Streaming Analytics to tap into streams of GPS data from its cars, continuously aggregate that data, and merge it in real time with the location information of customers. With each move a customer or a car makes, Streaming Analytics recalculates which cars are closest to a specific customer, based on any selection criteria chosen.
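The GPS scenario can be sketched in a few lines of Python. Everything here is hypothetical and chosen for illustration: the `FleetTracker` class, the car identifiers, the coordinates, and the use of the haversine formula with a mean Earth radius of 6371 km. A production system would likely use a spatial index rather than scanning every car.

```python
import heapq
import math


def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance between two points in kilometres."""
    earth_radius_km = 6371.0  # mean Earth radius (approximation)
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    d_phi = math.radians(lat2 - lat1)
    d_lambda = math.radians(lon2 - lon1)
    a = (math.sin(d_phi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(d_lambda / 2) ** 2)
    return 2 * earth_radius_km * math.asin(math.sqrt(a))


class FleetTracker:
    """Keeps the latest known position of every car in the stream."""

    def __init__(self) -> None:
        self.positions = {}  # car_id -> (lat, lon)

    def on_gps_update(self, car_id: str, lat: float, lon: float) -> None:
        """Handle one GPS event; newer positions overwrite older ones."""
        self.positions[car_id] = (lat, lon)

    def nearest_cars(self, lat, lon, k=1, predicate=None):
        """Return up to k car ids closest to (lat, lon).

        `predicate` stands in for the "selection criteria": a function
        that takes a car id and returns True if the car is eligible.
        """
        candidates = (
            (haversine_km(lat, lon, car_lat, car_lon), car_id)
            for car_id, (car_lat, car_lon) in self.positions.items()
            if predicate is None or predicate(car_id)
        )
        return [car_id for _, car_id in heapq.nsmallest(k, candidates)]


tracker = FleetTracker()
tracker.on_gps_update("car_a", 40.75, -73.99)
tracker.on_gps_update("car_b", 40.70, -74.01)
tracker.on_gps_update("car_a", 40.60, -74.10)  # car_a has moved away

# A customer requests a ride from a made-up location.
print(tracker.nearest_cars(40.71, -74.00, k=1))
```

Passing a `predicate`, such as one that only accepts cars with a free seat, is how the selection criteria mentioned above would plug in.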

Advantages of Streaming Analytics

  • Provides Deeper Insight through Data Visualization: Visualization of vital company information can help companies manage their key performance indicators (KPIs) on a daily basis. KPI data is viewed in real time, which produces a single source of truth that can provide both a high-level and a granular view of a company at any given time. The data can help improve sales, reduce costs, identify errors, and react faster to risks in order to mitigate them. Streaming Analytics accelerates decision-making and provides access to business metrics and reporting.
  • Offers Insight into Customer Behavior: Streaming Analytics gives companies visibility into what customers are buying, what they are not buying, and what they like and dislike. This gives companies the ability to generate additional profit and retain existing customers. It allows companies to respond rapidly to customer needs and increase revenues through up-selling and cross-selling of goods and services.
  • Remain Competitive: Businesses can identify trends and benchmarks, develop white papers and use cases, and generate forecasts for their company and industry. This reduces exposure to internal and external threats and provides awareness of industry changes. It helps companies become innovative, remain competitive, and strengthen their brand.
  • Perform Risk Analysis: Streaming Analytics allows companies to view and analyze the latest media and industry news to keep abreast of the latest developments in their industry. It also provides companies with data on customers and vendors, allowing them to take action when a risk or specific event occurs.
  • Securing Data: Streaming Analytics allows companies to analyze internal and external threats that affect the company or industry. Companies can identify sensitive information that is unprotected or inadequately protected, and ensure that state, federal, and regulatory requirements are met.

Disadvantages of Streaming Analytics

  • Lack of Experts: An issue with Streaming Analytics is the shortage of experts in the field. There are only a small number of Data Scientists and a smaller number of companies that hire them. Streaming Analytics is still a recent technology, and most developers have been slow to adopt it because they lack the expertise.

A Range of Product Offerings

The big enterprise players in Streaming Analytics include Striim, SAP, TIBCO, IBM, Software AG, and Oracle, alongside open source players such as Apache Spark, Apache Storm, and Hadoop. Below is a comparison of seven Streaming Analytics products.

  1. Striim: Striim specializes in streaming and real-time analytics. Striim regards itself as the only end-to-end, real-time data integration and intelligence solution that enables multi-stream data integration and real-time change data capture (CDC) across a variety of data sources, such as databases, log files, events, message queues, and IoT sensor data.
  2. IBM: InfoSphere Streams provides high performance and scalability, and is second in functionality to Software AG’s Apama for analyzing and scoring data in real time. InfoSphere offers a highly scalable event server and integration capabilities for implementing stream processing use cases. It offers easy-to-read code and easy-to-use modeling tools. It can be used in small implementations on a single laptop or multi-node implementations that scale to millions of transactions per second.
  3. TIBCO: StreamBase offers a comprehensive platform for processing and acting on live and historical data. It has intuitive interfaces that combine graphical and SQL-like queries. It allows non-developers to create queries based on incoming streams. It provides the ability to rapidly build applications that are easily deployable and that analyze and act on real-time streaming data.
  4. SQLstream: Blaze enables companies to be data-driven in real time. Real-time data can be discovered, analyzed, and aggregated instantly, and delivered as a continuous ingest into Hadoop, data warehouses, and other enterprise systems. It is easy to use, does not require coding or Data Scientists and is an appealing option for companies where expensive assets are at risk.
  5. Apache Spark: Spark is an in-memory distributed data analysis platform for large-scale data processing and batch analysis jobs. It supports different processing models, such as MapReduce-style batch processing, in-memory processing, and stream processing. Spark makes it easy to build scalable, fault-tolerant streaming applications. Spark can combine streams with historical data, reuse the same code for batch processing, and run ad-hoc queries on stream state. Spark is said to be 40 times faster than Storm.
  6. Apache Storm: Storm is a free and open source real-time distributed processing platform that was originally created at BackType and open-sourced by Twitter. Storm focuses on stream processing or CEP. Storm is ideal for real-time data processing because it is scalable, robust, and fault-tolerant, it guarantees no data loss, and its topologies and processing components can be defined in any language. It operates on data-in-motion, that is, a continual stream of data. In practice, Storm typically involves writing Java code. It does not include SSL or database integration, and it cannot be used with Active Server Pages (ASP).
  7. Hadoop: Hadoop is a high-throughput system that can process large volumes of data using a distributed parallel processing paradigm called MapReduce. It solves problems where you want to run deep and extensive analytics on a mixture of complex and structured data that does not fit easily into tables. It is great for calculations on Big Data volumes and can run on machines that do not share memory or disks. However, Hadoop was never built for real-time processing and does not work well for real-time analytics.

Why Is Real-time Analytics Important?

Streaming Analytics allows companies to analyze data as soon as it becomes available, which lets them assess risks before they materialize. Streaming Analytics can help companies identify new business opportunities and revenue streams, which results in increased profits, new customers, and improved customer service. A Streaming Analytics platform can process millions to tens of millions of events per second. “Because data in a Streaming Analytics environment is processed before it lands in a database, the technology supports much faster decision making than possible with traditional data analytics technologies,” Philip Howard of Bloor Research said in a recent Datamation interview. Streaming Analytics also helps with security because it gives companies a fast way to connect different events, detect security threat patterns and their risks, and monitor network and physical assets.
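As a rough illustration of connecting events to detect a threat pattern, the Python sketch below flags a user once a burst of failed logins arrives inside a short time window. The event format, the `login_failed` event type, the threshold of three, and the 60-second window are all assumptions made for the example. Real security monitoring would correlate many more signals.

```python
from collections import defaultdict, deque


def detect_failed_login_bursts(events, threshold=3, window_seconds=60):
    """Yield (timestamp, user) whenever a user reaches `threshold`
    failed logins within the last `window_seconds`.

    `events` yields (timestamp, user, event_type) tuples in time order
    (a hypothetical event shape for this sketch).
    """
    recent_failures = defaultdict(deque)  # user -> timestamps of recent failures
    for timestamp, user, event_type in events:
        if event_type != "login_failed":
            continue
        attempts = recent_failures[user]
        attempts.append(timestamp)
        # Forget failures that are older than the time window.
        while attempts[0] <= timestamp - window_seconds:
            attempts.popleft()
        if len(attempts) >= threshold:
            yield timestamp, user


# A made-up stream of authentication events.
auth_events = [
    (0, "alice", "login_failed"),
    (10, "bob", "login_failed"),
    (20, "alice", "login_failed"),
    (30, "alice", "login_ok"),
    (45, "alice", "login_failed"),
    (200, "bob", "login_failed"),
    (210, "bob", "login_failed"),
]
alerts = list(detect_failed_login_bursts(auth_events))
print(alerts)
```

Because the alert is raised while the events are still arriving, a response such as locking the account can happen before the data ever lands in a database.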
