In many of my recent posts, I’ve discussed the growing need for specialized Big Data handlers and innovative analytics technologies to help businesses take advantage of three of the “V’s” of Big Data: increasing volume, variety, and velocity. But in my latest blog, I briefly highlighted what I believe to be the fourth “V” of Big Data: veracity. In the context of Big Data, veracity refers not only to the accuracy of the data itself but also to the reliability of its source; the context, engagement, and interaction that triggered the data to be generated; the methods, transforms, and analytics used in information extraction; and the actual information derived from it. Each of these elements is an important characteristic that is often overlooked.
With nearly 15 petabytes of data being created every day, organizations are on the hunt for ways to gather, organize, store, and analyze this massive amount of data. As we all know, the ladder of business intelligence traverses from raw data to curated data to information to intelligence to prediction. This traversal presents an enormous opportunity for new insights into customers and markets, and those insights can give organizations a meaningful competitive edge.
But there’s a catch: many business leaders know little about the data that’s available to support their decisions (its accuracy, for one) – even when it originates within the organization, and more so when it is acquired from external sources. These leaders are also increasingly asked to make critical decisions based on partial information. The cost of delaying a decision until complete information is gathered may be too great in terms of lost opportunity. This is where Big Data analytics provides a way to analyze sparse data from a wide range of data sources and arrive at the best possible outcome available at that point in time.
However, as more sparse information gets shared across lines of business, the risk of ingesting old or inaccurate data increases, leading to potentially biased or false conclusions. Since the information is processed from sparse data, there is a continual need to adjust the information and the corresponding conclusions and decisions as new data comes in. Tracking the lineage of the data, of the information developed from it, and of the corresponding conclusions is essential for this kind of continuous update. Beyond this, having data stored in various systems, all with different governance rules, can inadvertently allow inappropriate individuals to access and view sensitive data.
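To make the lineage idea concrete, here is a minimal sketch of what such a lineage record might look like. The field names and the `revise` helper are illustrative assumptions, not a reference to any particular product: the point is that every updated conclusion keeps a pointer back to what it was derived from and which transforms produced it.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone

@dataclass
class LineageRecord:
    """A derived value together with a record of where it came from."""
    value: float
    source: str                                       # originating system (hypothetical name)
    derived_from: list = field(default_factory=list)  # ids of parent records
    transforms: list = field(default_factory=list)    # ordered list of transform names
    as_of: datetime = field(default_factory=lambda: datetime.now(timezone.utc))

def revise(old: LineageRecord, new_value: float, transform: str) -> LineageRecord:
    """Produce an updated conclusion while preserving the full lineage chain."""
    return LineageRecord(
        value=new_value,
        source=old.source,
        derived_from=old.derived_from + [id(old)],
        transforms=old.transforms + [transform],
    )
```

When a late-arriving batch changes a conclusion, calling `revise` yields a new record whose history shows exactly which transform triggered the change, which is what makes auditing the update possible.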
To mitigate these risks, business leaders must now consider more than just quick analysis, efficient storage solutions, and savvy technicians. In today’s era of Big Data, managers will need to ask themselves: how confident am I in my data?
A new approach
As businesses begin to evaluate their understanding of Big Data and work towards a solid basis for confidence in the decisions they make, questions such as “What is my data source?” “What is its history?” “Can it be trusted?” and “Where is my data going and how do I protect it?” will be raised. To answer these questions and make sense of the onslaught of Big Data, organizations need a way to view data as an entire landscape of related information, not just an endless array of unrelated points. An integrated approach makes a rational view possible, and it prepares information for a whole range of subsequent uses – for analysis, for consolidation or for the creation of a single view.
As Big Data drives transformational change in the way data is managed, that change also requires a solid foundation in technology. When trying to establish an integrated data system, businesses should consider new technology solutions that allow them to do the following:
- Rapidly and automatically assess data value
- Rapidly and automatically identify and protect sensitive data
- Put information to use quickly and easily, for purposes appropriate to its value
- Govern information appropriately and automatically, according to its value and intended use
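The second item – automatically identifying and protecting sensitive data – can be sketched very simply. The patterns below are illustrative assumptions (a real deployment would draw its rules from a governance catalog or a managed classifier), but they show the shape of the capability: classify a field’s sensitivity, then redact before the value is shared onward.

```python
import re

# Hypothetical patterns; real rules would come from a governance catalog.
SENSITIVE_PATTERNS = {
    "email": re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+"),
    "ssn":   re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
}

def classify(value: str) -> list:
    """Return the sensitive categories a raw field value appears to match."""
    return [name for name, pat in SENSITIVE_PATTERNS.items() if pat.search(value)]

def protect(value: str) -> str:
    """Redact any detected sensitive spans before the value moves downstream."""
    for pat in SENSITIVE_PATTERNS.values():
        value = pat.sub("[REDACTED]", value)
    return value
```

The classification result is also what drives the fourth item: data tagged as sensitive gets the stricter governance path automatically.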
Since Big Data projects often involve non-traditional sources such as machine data or social data, which may be less formally managed or controlled, dashboards are a great way for businesses to obtain an all-in-one view of their data. These solutions can provide immediate insight into policy status for each data source, delivered through a dashboard that lets users visualize the level of trust in that source. They can also help users identify and stop bad and inaccurate data at the source, before it is curated into the information extraction pipeline and leads to decisions based on inaccurate or irrelevant data.
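A minimal sketch of such a gate might look like the following, assuming a couple of illustrative policy checks. Records that fail a check are rejected before curation, and the pass rate becomes the per-source trust score a dashboard would surface.

```python
# Illustrative policy checks; real rules would come from a governance catalog.
CHECKS = {
    "has_id":       lambda r: bool(r.get("id")),
    "amount_valid": lambda r: isinstance(r.get("amount"), (int, float)) and r["amount"] >= 0,
}

def gate(records):
    """Split records into accepted/rejected and compute a source trust score."""
    accepted, rejected = [], []
    for record in records:
        failures = [name for name, check in CHECKS.items() if not check(record)]
        (rejected if failures else accepted).append((record, failures))
    trust = len(accepted) / len(records) if records else 0.0
    return accepted, rejected, trust
```

Because each rejection carries the names of the failed checks, the dashboard can show not just how trustworthy a source is but why, which is what lets someone fix bad data at its origin.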
Business and IT leaders would also do well to consider solutions that allow them to monitor and mask sensitive data in their systems against unauthorized access and spot suspicious activity for immediate response. Through this, businesses can be confident data is secure, private, and protected against tampering. Such masking needs to be consistent – the same input always producing the same masked output – so that masking and encryption do not prevent the masked data from being used meaningfully in analytical operations such as joins and aggregations.
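One common way to get this consistency is deterministic tokenization, sketched below with a keyed hash. The key name is a placeholder assumption; the essential property is that equal inputs yield equal tokens, so grouping and joining on the masked column still work, while the original values stay hidden from anyone without the key.

```python
import hashlib
import hmac

# Hypothetical secret held by the governance team; rotate per policy.
MASKING_KEY = b"rotate-me"

def mask(value: str) -> str:
    """Deterministically tokenize a value: the same input always yields the
    same token, so joins and group-bys still work on the masked column."""
    return hmac.new(MASKING_KEY, value.encode(), hashlib.sha256).hexdigest()[:16]
```

Analysts can then count distinct customers or join tables on `mask(email)` without ever seeing an actual email address.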
Ensuring these considerations are met will enable organizations to successfully implement an integrated data management strategy for mapping information value to intended use, evaluating data accuracy, and ensuring data protection. In turn, this will help businesses reach a heightened level of confidence in their data, allowing them to make appropriate, insightful decisions and enabling them to take the necessary next steps for growth.