Data Discovery tools have revolutionized Business Intelligence (BI) and are a vital link in enabling self-service BI, which is part of the larger movement towards the simplification of Business Intelligence. More than any other technology, these tools empower business users to take advantage of data as it is needed, effectively reducing time to insight.
Whereas traditional BI vendors once downplayed the significance of these tools – which were even considered distinct from those for Business Intelligence – Data Discovery’s coup became official when, according to a report from Gartner, “By the end of 2012, all of the traditional BI vendors rated as leaders in ‘Magic Quadrant for Business Intelligence Platforms’ had added some form of data discovery product.”
Not only have traditional BI vendors added discovery software, but a number of Data Discovery vendors (most notably QlikView and Tableau) have also added facets of traditional BI (such as Metadata Management), further blurring the line between the two, which have come to be regarded as a single technology.
The result is that the business user is now the principal driver for (and purchaser of) BI products, which typically include at least one of the following elements of Data Discovery: search, dashboards, interactive visualization, in-memory analytics, and data mashups. A closer look at these individual components and their collective effect illustrates their impact on business users and the trend towards simplified BI.
Search-Based Data Discovery Tools
In an April 2012 article, Forrester revealed that “only 1% to 5% of all enterprise data is in a structured, modeled format that fits neatly into enterprise data warehouses…and data marts.” One of the principal advantages of search-based discovery tools is that they enable users to analyze both structured and unstructured data with text search terms. Although most visually based discovery tools are designed for quantitative data, search tools apply to both quantitative and qualitative data.
Search software also reduces the reliance on traditional BI Metadata by modeling and storing data based on its own proprietary structure. As with most Data Discovery technologies, search incorporates an interface specifically designed for non-technical users and a performance layer that utilizes indexing or Random Access Memory (RAM) to minimize calculations relating to aggregates. By combining search with visualization tools, users can explore quantitative and qualitative data via text or through visualizations. A good example of search software is Vivisimo Velocity Platform, which has been incorporated into IBM’s InfoSphere Data Explorer for its search-based discovery capabilities.
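The indexing idea behind search-based discovery can be illustrated with a minimal sketch. It is not how any particular product works: the sample records, the function names, and the simple "every term must match" rule are all assumptions made for illustration. It shows one inverted index serving both a structured row and free-text notes.

```python
import re
from collections import defaultdict

# Hypothetical sample data: one structured row and two unstructured notes.
RECORDS = {
    1: {"region": "West", "product": "Widget", "revenue": 1200},
    2: "Customer in the West region complained about late widget delivery.",
    3: "Quarterly review: East region revenue up.",
}


def tokenize(value):
    """Lowercase a value and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", str(value).lower())


def build_index(records):
    """Map each token to the set of record ids that contain it."""
    index = defaultdict(set)
    for record_id, record in records.items():
        # Structured rows contribute their field values; text contributes its words.
        values = record.values() if isinstance(record, dict) else [record]
        for value in values:
            for token in tokenize(value):
                index[token].add(record_id)
    return index


def search(index, query):
    """Return the sorted ids of records that contain every term in the query."""
    terms = tokenize(query)
    if not terms:
        return []
    hits = set.intersection(*(index.get(term, set()) for term in terms))
    return sorted(hits)


INDEX = build_index(RECORDS)
print(search(INDEX, "west widget"))
```

Because the index is built once and kept in memory, each query is a handful of set lookups rather than a scan of every record, which is the general reason an indexing layer keeps search responsive.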
Interactive Data Visualization
Interactive Data Visualization is arguably one of the most vital Data Discovery applications. This software enables users to look at visual representations of data in forms that go far beyond traditional spreadsheets, reports, and tables. By managing and manipulating various forms of brightness, images, and colors, users can readily discern patterns within data in close to real time without creating, and then waiting on, an IT backlog. Some of the more valuable technologies and manifestations of Data Visualization include:
- Dashboards: Technically, dashboards are considered distinct from Data Visualization tools because the two serve different ends: dashboards are used to display a variety of sources and forms of data, while visualization tools are principally designed for exploration and discovery of trends. Yet many (but not all) dashboards contain visualizations, while some visualization tools can create dashboards. Dashboards display data in easily discernible ways, and are ideal for monitoring data in close to real time.
- Geospatial technologies: Geospatial technologies in visualization tools coincide with the emergence of Mobile BI and provide mapping information that can augment a variety of data sources of interest to business users – particularly those in sales. These technologies are vital for mobile users who operate in the field.
- Additional features: Visualization tools place few limits on the amount and variety of manipulation users can perform. Options include filtering (including the use of histograms which offer visual suggestions for how filters will affect data), indicator reports in which KPIs are represented by icons for trends related to conditions and progress, collaborative capabilities in which representations can be annotated and shared, and many more. Some Data Visualization products, such as SAS Visual Analytics, offer auto charts, a system-produced suggestion of the most appropriate form of visualization based on the selected data.
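The auto chart idea mentioned in the last item, where the system proposes a visualization based on the selected data, can be sketched with a simple rule table. This is a hypothetical heuristic and not the logic used by SAS Visual Analytics or any other product; the function names and the mapping from column types to chart types are assumptions.

```python
from datetime import date
from numbers import Number


def column_kind(values):
    """Classify a column as 'temporal', 'numeric', or 'categorical'."""
    if values and all(isinstance(v, date) for v in values):
        return "temporal"
    if values and all(isinstance(v, Number) and not isinstance(v, bool) for v in values):
        return "numeric"
    return "categorical"


def suggest_chart(columns):
    """Suggest a chart type from the kinds of the selected columns (assumed rules)."""
    kinds = sorted(column_kind(values) for values in columns.values())
    if kinds == ["numeric"]:
        return "histogram"
    if kinds == ["numeric", "numeric"]:
        return "scatter plot"
    if kinds == ["numeric", "temporal"]:
        return "line chart"
    if kinds == ["categorical", "numeric"]:
        return "bar chart"
    # Fall back to a plain table when no rule applies.
    return "table"


print(suggest_chart({"region": ["West", "East"], "amount": [150, 75]}))
```

A real product would weigh far more signals (cardinality, data volume, user history), but the principle is the same: the user selects data and the tool proposes the form.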
Data Mashup Capabilities
Data Discovery tools with Data Mashup capabilities can quickly facilitate ad-hoc queries across disparate sources. Data mashups enable users to define their own terms and to query only the data they want, providing answers to specific data-related questions on the fly. Mashups also facilitate the incorporation of different sources into varying applications throughout the enterprise. Data mashups derive from web mashup technology, which virtually combines different sources on the internet. They augment visualization and search tools by enabling users to issue specific queries based on information gleaned from those discovery tools.
Mashups enable users to manipulate data to fit their own specific needs for an inquiry while combining a variety of sources through in-memory processes that enable expedient data delivery. Like all Data Discovery tools, data mashups enable nontechnical users to perform complex operations in a simplified manner. IBM’s Cognos Mashup Services, for example, embeds its Cognos Business Intelligence into applications and processes throughout the entire organization.
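As a rough illustration of what a mashup does under the hood, the sketch below combines two differently formatted sources in memory and answers an ad-hoc question against the result. The CSV export, the JSON feed, the field names, and the helper functions are all invented for this example and do not reflect any vendor's interface.

```python
import csv
import io
import json

# Hypothetical sources: a CSV export from a sales system and a JSON feed of regions.
SALES_CSV = """rep,region_id,amount
Ana,1,500
Ben,2,300
Ana,2,200
"""
REGIONS_JSON = '[{"id": 1, "name": "West"}, {"id": 2, "name": "East"}]'


def load_sales(text):
    """Parse the CSV text into dicts with numeric fields converted."""
    return [
        {"rep": row["rep"], "region_id": int(row["region_id"]), "amount": float(row["amount"])}
        for row in csv.DictReader(io.StringIO(text))
    ]


def load_regions(text):
    """Parse the JSON text into a lookup of region id to region name."""
    return {region["id"]: region["name"] for region in json.loads(text)}


def mashup(sales, regions):
    """Join each sale to its region name entirely in memory."""
    return [
        {"rep": s["rep"], "region": regions.get(s["region_id"], "Unknown"), "amount": s["amount"]}
        for s in sales
    ]


def total_by(rows, key):
    """Answer an ad-hoc question: total amount grouped by the chosen field."""
    totals = {}
    for row in rows:
        totals[row[key]] = totals.get(row[key], 0.0) + row["amount"]
    return totals


COMBINED = mashup(load_sales(SALES_CSV), load_regions(REGIONS_JSON))
print(total_by(COMBINED, "region"))
print(total_by(COMBINED, "rep"))
```

Neither source had to be loaded into a warehouse or remodeled first; the user picks the grouping field at query time, which is the self-service quality the section describes.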
In-Memory Analytics
In-Memory Analytics is a contender with interactive Data Visualization for the most valuable component in Data Discovery tools. It takes advantage of in-memory computing, wherein data is cached and stored in RAM, such as Dynamic Random Access Memory (DRAM), instead of on a computer’s hard drive or disk. A recent report from Gartner states:
“In-memory computing (IMC) is an emerging paradigm that enables user organizations to develop applications that run queries on very large datasets or perform complex transactions at least one order of magnitude faster – and in a more scalable way – than when using conventional architectures.”
In-Memory Analytics is one such application, as evidenced by its almost real-time querying speed. With RAM prices declining and advances in computing, most notably the leap from 32-bit to 64-bit computing, increasing available memory and speed, it is now possible to store Big Data quantities of data in memory. Consequently, In-Memory Analytics all but makes obsolete traditional BI needs for indexing and storing consolidated data in tables or OLAP cubes. IT costs and other resources are reduced accordingly, which helps to foster the environment for exploration and innovation that Data Discovery is known for. In-memory processes also assist in preserving data availability and integrity. Products like Oracle Exalytics In-Memory Machine provide In-Memory Analytics independently or with Oracle’s discovery/BI platform.
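The contrast with pre-built cubes can be shown at toy scale. The sketch below uses Python's standard sqlite3 module with its ":memory:" database as a stand-in for an in-memory engine; the table, the sample rows, and the helper names are assumptions, and a commercial in-memory platform would differ substantially in design and scale.

```python
import sqlite3

# Hypothetical detail rows: (region, product, amount).
ROWS = [
    ("West", "Widget", 100.0),
    ("West", "Gadget", 50.0),
    ("East", "Widget", 75.0),
]

ALLOWED_DIMENSIONS = ("region", "product")


def build_in_memory_db(rows):
    """Load detail rows into a SQLite database that lives only in RAM."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE sales (region TEXT, product TEXT, amount REAL)")
    conn.executemany("INSERT INTO sales VALUES (?, ?, ?)", rows)
    return conn


def aggregate(conn, dimension):
    """Total amount by one dimension, computed at query time from detail rows."""
    if dimension not in ALLOWED_DIMENSIONS:
        raise ValueError(f"unsupported dimension: {dimension}")
    query = (
        f"SELECT {dimension}, SUM(amount) FROM sales "
        f"GROUP BY {dimension} ORDER BY {dimension}"
    )
    return conn.execute(query).fetchall()


CONN = build_in_memory_db(ROWS)
print(aggregate(CONN, "region"))
print(aggregate(CONN, "product"))
```

No aggregate table is prepared in advance for either dimension. Each total is calculated from the detail rows when the question is asked, which is feasible only because the data is already in memory.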
In Retrospect
Data Discovery applications are all but indistinguishable from those of BI. Their mainstream adoption heralds a transformation of Business Intelligence from historical reporting to predictive analytics. The proliferation of these tools and their impact on Business Intelligence make it somewhat impractical to find a single Data Discovery/BI vendor to fit all of an enterprise’s analytics needs; instead, many organizations have opted to use multiple products collaboratively.
More importantly, these tools have firmly positioned the business user in control of analytics, making self-service BI a verifiable reality. This increased control over definitions, data sources, and queries may raise Data Governance and Big Data Governance concerns. Once effective governance mechanisms are in place, however, Data Discovery tools should only increase reliance on (and the effectiveness of) data.