
To Get Value from Data, Data Discovery Must Come First!

By Stuart Tarmy  /  October 6, 2017


Confidence in your Data Analytics is only as good as the completeness of your data. Most companies struggle with accessing and understanding their data, especially in older, legacy or siloed systems.

Consider the following.  Your CEO has long recognized that Data Analytics needs to play a greater role in the company, but recently it’s taken on critical urgency.  Competitors have been accelerating their Data Analytics capabilities, and new data-savvy entrants like Amazon could potentially enter the industry.  If your company doesn’t more fully embrace Data Analytics, it could lose market share or, worse, become irrelevant in its industry.

So, your CEO assembles her best team, made up of the company’s most senior Data Analytics experts supplemented with top-tier external consultants, who enthusiastically attack the problem.  After several months, they report back with a number of analytics-based recommendations to improve the business, ranging from increased customer acquisition and retention, faster product introductions, and improved pricing schedules and loyalty programs to stronger customer satisfaction programs, more responsive supply chain processes, and more efficient customer invoicing and collections.

After implementing these recommendations and monitoring the results for several months, it becomes clear that the expected improvements have not materialized, and in some areas, have actually deteriorated. The CEO is upset and frustrated, because she had been promised that Data Analytics would make the business more efficient, competitive and effective. What went wrong, and how could this have been prevented?

Your Data Scientists Know How to do Data Analytics, So Why Can’t You Trust the Results?

Your Data Analytics can only be as good as the completeness of your data.  While the analytical work itself is not trivial, a good data science team will know how to conduct professional-quality Data Analytics using popular approaches and technologies, including Machine Learning methods such as random forests, neural networks, and nonlinear regression, among others. They will also likely be proficient in popular programming languages like Python, R, Java and C++.  In addition, they will have knowledge of, and access to, the Data Analytics, Machine Learning, and AI platforms available from leading companies like IBM, Microsoft, Google, Oracle, SAP, SAS, Amazon and Salesforce.

With all this knowledge and professional expertise, how could the Data Analytics results be suspect?

Your Data Analytics is Only as Good as Your Data Completeness

The answer is that Data Analytics professionals often work with incomplete data. While their work may be top-notch in a technical sense, the analysis isn’t comprehensive, and reliable insights are hard to come by.  Indeed, when pressed, data professionals will tell you their most difficult problem is not the data analysis itself, but gaining access to, or even full awareness of, the data they need to do the analysis well.

This isn’t an edge case, and it’s not a minor factor.  In fact, our research has found that companies can lack awareness of up to 80 percent of their data assets, leaving them without visibility into key data elements and data flows in their own systems.  Even companies that have done such an assessment report that they lack deep knowledge of roughly 20-50 percent of their data, typically citing storage in older legacy or siloed systems as the reason. Another cited cause is purchased software whose functionality is well leveraged but whose data architecture is a black box, leaving the data siloed and excluded from analysis.

Confident Data Analytics Begins with Complete Data Understanding

For example, consider a company that has many older, legacy systems and has supplemented its organic growth over the years through a number of acquisitions.  In the case of the legacy systems, the original developers may have departed the company long ago, leaving behind little documentation; numerous patches to the systems over the years may have only made matters worse.  The acquired companies, meanwhile, brought in their own systems, and while the parent company may have integrated them into its existing environment, it never had a comprehensive ‘hand-off,’ and therefore lacks an understanding of the underlying data assets.  Consequently, the company may truly understand only a fraction of its data elements and data relationships.

Another class of scenario involves a company that operates in different, siloed business areas that it wants to understand holistically.  For example, imagine a financial services firm with retail bank, full service brokerage, high net worth bank, corporate bank and credit card divisions.  The company would like to have a 360-degree view of its customers, but while each business unit has a good understanding of its customer data, the company lacks a way to integrate individual customers’ data across each of the business silos.  If the company could transcend this challenge, it would have valuable insights into customer purchase history, cross-selling opportunities, customer service engagement, and problem resolution.

In both cases, confidence in data analysts’ recommendations may be lacking, since only a portion of available data is being harnessed.

Manual Data Discovery is Hard

The company with the legacy systems (including those brought in-house through corporate acquisitions) needs insight into the database structures of its underlying legacy and packaged systems. It also needs to understand how data in those systems relates to entities in its own databases.  The financial services institution may understand each individual system’s structure and architecture, but it still needs a way to connect entities across these systems that may represent the same real-life customer.
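As a rough illustration of that second problem, the sketch below groups customer records from two business-unit silos under a single normalized key. This is a deliberately simplified, hypothetical example (the field names and normalization rules are assumptions, not any vendor’s method); real customer data varies in far messier ways than case and punctuation:

```python
import re

def normalize(record):
    """Collapse case, punctuation and whitespace so trivially different
    representations of the same customer compare equal."""
    name = re.sub(r"[^a-z ]", "", record["name"].lower())
    name = " ".join(name.split())
    email = record["email"].strip().lower()
    return (name, email)

def link_customers(*silos):
    """Group records from separate business-unit silos under one key,
    giving a first approximation of a 360-degree customer view."""
    unified = {}
    for silo_name, records in silos:
        for rec in records:
            key = normalize(rec)
            unified.setdefault(key, []).append((silo_name, rec))
    return unified

# Two silos hold the same customer in slightly different forms.
retail = [{"name": "Jane Q. Smith", "email": "JSmith@example.com"}]
brokerage = [{"name": "jane q smith", "email": "jsmith@example.com "}]

view = link_customers(("retail", retail), ("brokerage", brokerage))
# Both records collapse to one customer key, with two silo entries.
```

In practice, exact-key matching like this is only the easiest case; production entity resolution must also handle nicknames, typos, address changes and missing fields, which is where probabilistic matching earns its keep.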

Relatively few companies have done such database mapping and discovery work in depth. In the past, such work required a team of data specialists to comb through databases manually, attempting to discover unknown data elements and relationships in the data, an arduous task to be sure.

Our research has found that a database expert can typically analyze only a limited number of databases per month, usually between two and five.  For an enterprise with hundreds or possibly thousands of databases (which likely continue to grow in number), this makes manual discovery extremely tedious, costly and error-prone; at that rate, 500 databases would occupy a single expert for roughly eight to twenty years.  Even if a company can find a large team of database experts that understands its systems, getting all this painstaking work done may prove futile, leading many companies to opt out of the effort altogether.

Smart Data Discovery is the Solution

Recent breakthroughs in technology allow a second, more efficient approach: the use of a smart data discovery solution to detect and identify unknown data elements and data relationships, at scale, on an automated basis.  This method uses machine learning to ingest and automatically analyze company data.
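As a minimal sketch of the idea (an illustrative simplification, not a description of any particular product), one mechanically scannable signal of an undocumented relationship is value overlap between columns: if nearly every value in one column also appears in another, the pair is a candidate join key worth surfacing to analysts:

```python
def candidate_links(tables, threshold=0.9):
    """Flag column pairs where one column's values are (almost) fully
    contained in another's -- a signal of an undocumented relationship.

    `tables` maps table name -> {column name -> list of values}.
    """
    columns = {
        (tname, cname): set(values)
        for tname, table in tables.items()
        for cname, values in table.items()
    }
    links = []
    for (t1, c1), v1 in columns.items():
        for (t2, c2), v2 in columns.items():
            if (t1, c1) == (t2, c2) or not v1:
                continue
            overlap = len(v1 & v2) / len(v1)
            if overlap >= threshold:
                links.append((f"{t1}.{c1}", f"{t2}.{c2}", overlap))
    return links

tables = {
    "orders": {"cust_id": [101, 102, 103], "amount": [50, 75, 20]},
    "customers": {"id": [101, 102, 103, 104], "region": ["E", "W", "E", "S"]},
}
links = candidate_links(tables)
# Flags orders.cust_id -> customers.id (overlap 1.0) and nothing else.
```

Production-grade discovery tools layer far more onto this skeleton, such as sampling instead of full scans, data-type and name similarity features, and Machine Learning models that rank candidates, but the core insight is the same: relationships leave statistical fingerprints that software can find automatically.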

This automated approach, while relatively new, offers a substantially faster, more accurate and more cost-effective route to data discovery. In fact, based on work with different companies, we have found it feasible to perform automated data discovery at scale with cost reductions of up to 80 percent compared to manual data discovery.

Organizations need analysis of all their data in order to compete better in their marketplaces and make digital transformation successful.  With today’s data volumes and their growth rates, automated relationship discovery is a must.  In the past, many companies have feared that unifying their data assets would prove futile. New technology makes this feasible and achievable.

Using automated Machine Learning analysis to facilitate comprehensive exploratory analytics is just common sense.  And now that it’s here, it bodes well for data governance, data-driven decision making and digital transformation overall.

About the author

Stuart Tarmy, Vice President, Io-Tahoe. As a Vice President, Stuart leads Business Development and Sales for Io-Tahoe LLC. He has over 20 years of experience as a General Manager and head of sales, marketing and product management for leading global financial service technology, ecommerce, machine learning, data management and predictive analytics (Big Data) companies. He has held senior executive roles with Fiserv, Albridge Solutions (acquired by Pershing/BNY Mellon), MasterCard, and McKinsey & Company. Stuart began his career as a design engineer at Texas Instruments developing machine-learning based computing systems. Stuart holds an MBA from the Yale School of Management, a MS in Electrical Engineering from Duke University, and a Sc.B. with Honors in Electrical Engineering from Brown University. Follow Stuart and Io-Tahoe on Twitter.
