Case Study: Data Virtualization Seasons the Machine Learning and Blockchain Landscape for McCormick

Food flavoring giant McCormick is spicing things up by adopting Data Virtualization for several key projects, including the deployment of Machine Learning techniques and a Blockchain collaboration program with a prominent retailer, as part of a plan to better control food quality.

In a recent DATAVERSITY® interview with Terry Moon, an Information Architect at McCormick, and Denodo’s CMO Ravi Shankar, Moon commented: “What you quickly learn with a Machine Learning project is that you really don’t know what data you’re going to end up using and what data you’re going to have to bring in” for algorithm training. A year into the feasibility stage of the project, McCormick had to freeze the data set because it was too costly and time-consuming to keep it current. Moon began to consider whether there was a Data Architecture solution that the company could use to easily connect a large number of data sources and address this challenge.

APIs seemed like a good answer at first, but there’s still a lot of coding involved in creating them, she noted. Moon then wondered whether there might be a platform where she wouldn’t have to worry about “all the mechanics of connecting to all these systems or of having to develop APIs and that could also help with the performance of hitting databases.” Data Virtualization could be the answer, and among the vendors out there, Moon concluded that Denodo was the right one based on a series of POCs with other products.

McCormick is now leveraging the latest version of Denodo’s Data Virtualization platform. The newest version provides native support for massively parallel processing (MPP) so organizations such as McCormick can handle huge amounts of data at high speed, as well as support for self-service scenarios via a dynamic Data Catalog.

The latter is integrated with real-time data delivery so that business users can discover, explore, prepare, and access data without IT involvement. Existing features include a Big Data fabric abstraction and federation layer that hides the complexities of data stores and makes it easy to integrate their data with other data within the enterprise. Another is the Dynamic Query Optimizer, which determines the best query execution plan by considering data source statistics and indexes, as well as the special characteristics of big data sources.
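The federation idea described above can be illustrated with a minimal sketch. It is not Denodo's API: it uses Python's built-in sqlite3 module, with two attached in-memory databases standing in for separate source systems. The names `erp`, `warehouse`, `shipped_products`, and the sample rows are all hypothetical.

```python
import sqlite3


def build_federated_connection() -> sqlite3.Connection:
    """Attach two stand-in source systems and expose one virtual view."""
    conn = sqlite3.connect(":memory:")
    # Each ATTACH of ':memory:' creates a separate in-memory database,
    # standing in for an independent source system (hypothetical names).
    conn.execute("ATTACH DATABASE ':memory:' AS erp")
    conn.execute("ATTACH DATABASE ':memory:' AS warehouse")
    conn.execute("CREATE TABLE erp.products (sku TEXT PRIMARY KEY, name TEXT)")
    conn.execute("CREATE TABLE warehouse.shipments (sku TEXT, quantity INTEGER)")
    conn.executemany(
        "INSERT INTO erp.products VALUES (?, ?)",
        [("P1", "Paprika"), ("P2", "Cumin")],
    )
    conn.executemany(
        "INSERT INTO warehouse.shipments VALUES (?, ?)",
        [("P1", 10), ("P1", 5), ("P2", 7)],
    )
    # A TEMP view may reference tables in any attached database, so it can
    # play the role of the virtual layer: no data is copied into it.
    conn.execute(
        """
        CREATE TEMP VIEW shipped_products AS
        SELECT p.name AS name, s.quantity AS quantity
        FROM erp.products AS p
        JOIN warehouse.shipments AS s ON s.sku = p.sku
        """
    )
    conn.commit()
    return conn


def shipped_totals(conn: sqlite3.Connection) -> list:
    """Query the virtual view without knowing where the data lives."""
    cursor = conn.execute(
        "SELECT name, SUM(quantity) FROM shipped_products "
        "GROUP BY name ORDER BY name"
    )
    return cursor.fetchall()
```

A consumer calls `shipped_totals(build_federated_connection())` and sees one combined result, while the join across the two sources happens behind the view.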

Moon, who was also looking to bolster performance, took an interest in the platform’s features right away. The multibillion-dollar company stores much of its information inside SAP, in both on-premises and Cloud databases, but not all of it. For the new Machine Learning project, a lot of “data massaging” was involved in data collection, including masking some sensitive data for an external partner it was working with.
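The article does not say how McCormick masked its data. As one possible illustration, the sketch below replaces sensitive fields with keyed hashes so that an external partner can still join on the masked values without seeing the originals. The function names, field names, and secret key are hypothetical.

```python
import hashlib
import hmac


def mask_value(value: str, secret_key: bytes) -> str:
    """Return a deterministic, non-reversible token for a sensitive value."""
    digest = hmac.new(secret_key, value.encode("utf-8"), hashlib.sha256)
    # A shortened hex digest keeps tokens readable; this is an assumption,
    # and a real deployment would choose the length deliberately.
    return digest.hexdigest()[:16]


def mask_records(records, sensitive_fields, secret_key: bytes):
    """Copy each record, masking only the listed fields."""
    masked = []
    for record in records:
        copy = dict(record)  # leave the caller's data untouched
        for field_name in sensitive_fields:
            if copy.get(field_name) is not None:
                copy[field_name] = mask_value(str(copy[field_name]), secret_key)
        masked.append(copy)
    return masked
```

Because the same input always maps to the same token under a given key, masked records remain joinable across data sets, while the key itself stays inside the company.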

According to Shankar, the Denodo Platform:

“Integrates all types of data including structured, semi-structured, and unstructured sources, irrespective of the data format or location of the data. Integrated data sources from relational databases to streaming data to web services are exposed as RESTful APIs and using self-service capabilities.”

And with MPP, Denodo can “go up against very large data sources and combine huge volumes of data off of MPP,” said Shankar.

More Projects in Store

Moon saw that Denodo would also be relevant for other use cases beyond the Machine Learning project:

“Part of what I have to deliver to the organization is how to leverage the platform to make use of the information in general in other ways,” she said. “You have to eventually position yourself where you can get self-service access to the information and that information has to be trusted and modeled and delivered in such a way that people understand it.”

By providing a middle layer for Data Virtualization, sitting between enterprise systems like SAP and Data Warehouses below and consumers on top, Denodo stores the Metadata and understands the data: its origin, its lineage, its type, and how it is all combined, Shankar noted.

“We expanded this to include information to have business definitions about the data attached to the metadata component. So now, business users can actually come to the Data Virtualization layer to search through the data, discover it, understand data associations and relationships, and learn who owns the data to build some governance processes around that.”
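To make the catalog idea concrete, here is a minimal sketch of metadata entries that carry a business definition, an owner, and lineage, plus a simple search over them. It is an illustration only and does not reflect how Denodo's Data Catalog is implemented. All class, function, and data set names are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class CatalogEntry:
    name: str
    source: str          # system the data comes from
    owner: str           # team accountable for the data
    definition: str      # business definition attached to the metadata
    derived_from: List[str] = field(default_factory=list)  # direct lineage


EXAMPLE_ENTRIES = [
    CatalogEntry("raw_shipments", "warehouse_db", "logistics",
                 "Inbound shipment records per supplier"),
    CatalogEntry("supplier_master", "erp", "procurement",
                 "Approved supplier list"),
    CatalogEntry("supplier_quality", "virtual_layer", "quality",
                 "Quality score per supplier and shipment",
                 ["raw_shipments", "supplier_master"]),
]


def search_catalog(entries, term: str):
    """Return entries whose name or business definition mentions the term."""
    needle = term.lower()
    return [e for e in entries
            if needle in e.name.lower() or needle in e.definition.lower()]


def lineage(entries, name: str):
    """Collect every upstream data set that the named entry depends on."""
    by_name = {e.name: e for e in entries}
    seen = []
    stack = list(by_name[name].derived_from)
    while stack:
        current = stack.pop()
        if current in seen:
            continue
        seen.append(current)
        if current in by_name:
            stack.extend(by_name[current].derived_from)
    return seen
```

A business user could call `search_catalog(EXAMPLE_ENTRIES, "quality")` to find a data set, read its owner from the result, and call `lineage(EXAMPLE_ENTRIES, "supplier_quality")` to see which sources feed it.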

That’s in contrast to the way things had been done. For instance, said Moon, one set of users in the company had to constantly spend a lot of time moving in and out of systems, maybe as many as ten of them on a daily basis. With self-service delivery of the data, “they can go to one point – one place to get access to that information, which is hugely valuable for them.” They are now able to search in that one place and get all the related information they need for Self-Service Analytics and ad-hoc review with that one search.

Moon’s plans also involve using Denodo so individuals such as Data Scientists or Business Analysts can work together via a managed layer.

“If I deliver a model as a business language layer, I can now deliver that model to an analyst or a scientist, so they can in turn model on top of it in the tool,” she said. “It gives them the flexibility to use information from the enterprise and they are able to be more productive in creating analysis.”

Getting Everything Moving

One important concept is that consumers of the data are typically disintermediated from data producers, Shankar added. The advantage is that consumers keep running their own reports, while underneath, Enterprise Architects like Moon have the flexibility to change the systems as needed, as in the case of adopting new types of data or becoming Cloud-ready. This fits into the abstraction concept, by which “IT can presume modernization of the infrastructure without affecting the business processes and the business users,” he said.

Moon also advises companies to take their time with Data Virtualization implementations, addressing issues like what policies should be in place for connecting to one set of source systems rather than another. She also recommends considering policies that improve how information assets are secured,

“Because now you’re moving some of that security from a consumption standpoint into one system. And then there are the policies around how people now get access to assets. There’s a lot of planning and decisions to be made around that,” said Moon.

Photo Credit: Subbotina Anna/Shutterstock.com
