Data fabrics are emerging as the most effective means of integrating data throughout the enterprise. They deliver a single access point for all data regardless of location — whether it’s at rest or in motion. Experts agree that data fabrics are the future of data analytics and management. Gartner recommends:
“Data and analytics leaders must upgrade to a data fabric design that enables dynamic and augmented data integration in support of their Data Management strategy.”
Forrester states that “Enterprise Architecture (EA) pros should use data fabric to democratize data across the enterprise for various use cases.”
However, the adoption rate of data fabrics hinges on the ROI of their use cases. One such use case is making it easier to do advanced Data Science on available data sources. Currently, extracting machine learning features is an exacting, time-consuming process because relevant data is trapped in silos. Data fabrics and knowledge graphs have a symbiotic relationship here: knowledge graphs substantially streamline the process of extracting data from the myriad sources that populate these platforms, providing the fundamental capabilities data fabrics need to accomplish this objective.
Experts in the world of enterprise IT are quickly starting to adopt the notion that data fabrics are the most mature means of harmonizing and integrating data. When fueled by knowledge graph technology, these fabrics create the optimal means of aligning data of all types for any singular business purpose.
Within the knowledge graph paradigm, an emerging best practice is to use entity event modeling (EEM) to accelerate — and augment the results of — artificial intelligence models. The first step of this approach is to create business glossaries of important business concepts so data is easily mapped into a knowledge graph. Additionally, the ensuing uniform data shape is primed for writing quick queries delivering rich, nuanced feature engineering. Within the data fabric framework, entity event modeling is instrumental to perfecting machine learning models with accurate predictions that significantly boost this architecture’s ROI.
Entity Event Modeling
It’s difficult to outstrip the ease of EEM or its flexible utility. Unlike complex data warehouse schemas, it consists of just two object types: entities and events. Entities are the concepts businesses are centered on, like customers in finance. Events are anything that pertains to those customers, like when they opened accounts, made payments, received mortgages, etc. With this combination, organizations can describe everything about the people or things driving their businesses. Because this schema is based on semantic technology standards, each entity has a universal identifier in its events.
Therefore, no matter where these events are recorded (in databases or cloud settings internal or external to organizations), they’re a) associated with a specific entity or customer, and b) readily mapped into a central repository, like a knowledge graph. The unparalleled expressiveness of this approach is in its sub-events. For example, one event might be a patient who entered a hospital. The first sub-event could be a test performed on that patient. The second sub-event could be a procedure performed on the patient because of the test’s results. With this approach, organizations can model everything about pertinent entities while readily centralizing them for their Data Science.
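To make the shape concrete, here is a minimal sketch of an entity event tree in plain Python. The class names and the hospital example are illustrative, not a prescribed API; the essential point is that every event and sub-event carries the entity's universal identifier:

```python
from dataclasses import dataclass, field
from typing import List
import uuid

@dataclass
class Event:
    """An event always carries the universal identifier of its entity,
    so it can be mapped back no matter where it was recorded."""
    entity_id: str
    kind: str
    sub_events: List["Event"] = field(default_factory=list)

@dataclass
class Entity:
    """A core business concept, e.g. a patient or a customer."""
    entity_id: str = field(default_factory=lambda: str(uuid.uuid4()))
    events: List[Event] = field(default_factory=list)

    def record(self, kind: str) -> Event:
        event = Event(self.entity_id, kind)
        self.events.append(event)
        return event

# The hospital example from the text: an admission event, a test
# sub-event, and a procedure sub-event triggered by the test's results.
patient = Entity()
admission = patient.record("hospital_admission")
test = Event(patient.entity_id, "blood_test")
admission.sub_events.append(test)
test.sub_events.append(Event(patient.entity_id, "procedure"))
```

Because the identifier is repeated in every node of the tree, events scattered across internal and external systems can always be re-associated with their entity when centralized.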
The brilliance of this approach for feature engineering directly relates to the querying capabilities it supports. The uniformity of the underlying EEM results in far shorter queries than are required when data have different schemas in various silos. Thus, organizations can issue more queries faster than they otherwise could for the right data to facilitate the predictions of machine learning models. If a contact center wanted to generate features for a machine learning model illustrating which new product capabilities would resonate with customers most, it could issue a query to see which customers discussed which features for a specific product line. Using SQL in relational settings, this query would take three pages and days of development. Leveraging the standards-based approach of EEM, it would require only three or four lines!
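The brevity claim can be illustrated with a toy version of the contact-center query. Assuming every interaction has already been mapped into the uniform entity event shape (the field names below are hypothetical), the feature extraction collapses to a few lines, with no joins across silos to write:

```python
# Each record is an event in the uniform shape: every source system
# has already been mapped to the same fields.
events = [
    {"entity_id": "cust-1", "kind": "support_call",
     "product_line": "router", "feature": "mesh"},
    {"entity_id": "cust-2", "kind": "support_call",
     "product_line": "router", "feature": "parental_controls"},
    {"entity_id": "cust-1", "kind": "support_call",
     "product_line": "modem", "feature": "speed"},
]

def features_discussed(events, product_line):
    """Which customers discussed which features for one product line?"""
    return {(e["entity_id"], e["feature"])
            for e in events
            if e["product_line"] == product_line}

print(features_discussed(events, "router"))
```

The equivalent query against several differently-shaped relational silos would need a join per source; against the uniform model it is a single filter.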
This advantage is also supported by the fact that it’s easier to collocate data (with common terminology and models) because of the entity event approach. There are additional boons for customer 360 views. Because events have a universal identifier for each customer, organizations can issue curt, swift queries detailing a customer’s entire journey. These results are critical for identifying features for building AI models for developing new products or services, identifying the likelihood of churn, detailing upselling opportunities, or learning anything else germane to a particular business.
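Because every event carries the customer's universal identifier, a 360-degree journey reduces to a filter and a chronological sort. A short sketch, assuming events carry a timestamp field (the field names are again hypothetical):

```python
from datetime import date

events = [
    {"entity_id": "cust-1", "kind": "account_opened", "when": date(2020, 1, 5)},
    {"entity_id": "cust-2", "kind": "account_opened", "when": date(2020, 2, 1)},
    {"entity_id": "cust-1", "kind": "mortgage_issued", "when": date(2021, 6, 30)},
    {"entity_id": "cust-1", "kind": "payment", "when": date(2021, 7, 31)},
]

def journey(events, entity_id):
    """All events for one customer, in chronological order."""
    return sorted((e for e in events if e["entity_id"] == entity_id),
                  key=lambda e: e["when"])

# cust-1's entire journey across every recorded event:
print([e["kind"] for e in journey(events, "cust-1")])
# -> ['account_opened', 'mortgage_issued', 'payment']
```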
Scalability and Beyond
Because it scales so well, the entity event model is also helpful for performing the underlying computations for feature engineering queries. Since semantic statements about patients in entity event trees (composed of multiple events and sub-events) have unique identifiers, data scientists can leverage federated sharding methods to distribute data among different machines for robust query performance at scale.
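One simple way to realize the sharding idea is to hash each statement's entity identifier to choose a shard, so an entity's whole event tree lands on a single machine. A minimal sketch; the shard count and hash choice are illustrative, not a description of any particular product's implementation:

```python
import hashlib

NUM_SHARDS = 4

def shard_for(entity_id: str) -> int:
    """Deterministically map an entity's universal identifier to a shard.
    Every event and sub-event of that entity hashes to the same shard,
    so per-entity queries never need to cross machines."""
    digest = hashlib.sha256(entity_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big") % NUM_SHARDS

# All statements about one patient route to the same shard:
statements = [("patient-42", "hospital_admission"),
              ("patient-42", "blood_test"),
              ("patient-42", "procedure")]
shards = {shard_for(entity_id) for entity_id, _ in statements}
```

Queries that span many entities fan out to all shards in parallel, while single-entity queries (like the customer-journey query above) touch exactly one.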
All these characteristics of this modeling approach — its simplicity, ability to map disparate data for centralized access, uniformity, expressivity, and querying benefits — substantially enhance the most dominant use case across industries today: AI and machine learning. Reducing churn, creating useful features for products and services, and pinpointing cross-selling opportunities are tangible ways in which organizations can improve their ROI for data fabrics enhanced by knowledge graphs.