More data, more sources, more conflicts. More self-service reporting, more cross-functional analytics, more government and industry regulations. Without a way to govern all the data in their possession, businesses will struggle to create reports and dashboards or to build data consensus across departments. Nor will they feel confident that they’re not missing the mark on GDPR or other privacy mandates that carry stiff penalties. Nor can they expect their efforts to transform into a digital business to succeed without strong Data Governance.
But how does a growing business “do” Data Governance? Experts generally advise:
- To shift the culture so that employees are more focused on data consistency and accuracy within their own departments;
- To make creating and maintaining metadata a priority;
- And to start with a small program with a defined business outcome in mind.
A Data Governance framework for strategic data planning tasks should not be neglected, and it may also help to bring in a Chief Data Officer (CDO) to take the helm. Technologies that can be deployed in support of a Data Governance program include data catalogs, Master Data Governance, and cloud-based services.
Enterprises also may consider semantic and Enterprise Knowledge Graph technology for their Data Governance efforts. Enterprise Knowledge Graphs describe a linked set of information that meaningfully brings together data and metadata silos.
“The level of definition you can put around a customer, a supplier, or a partner using a semantic model or graph-based semantic model is much more robust than what you can do just using traditional definitions, like listing out name, address, etc. …The biggest difference there is how you capture relationships, which are critical to how you take action around information around that customer or partner.”
The Data Governance market is pushing ahead at a brisk pace to be sure, and the impetus for that growth is the fact that the world’s data volume is expected to grow by 40% a year. According to Data Governance Market – Growth, Trends and Forecast, a new report from Research and Markets, the Data Governance market should reach a value of $4.35 billion by 2024 as businesses look to raise data quality and accessibility and reduce data duplication and loss.
Diving into a Knowledge Graph Solution
TopQuadrant’s TopBraid EDG solution for Data Governance is a modular, integrated knowledge-graph-based Data Governance technology that is based on W3C standard RDF graphs. EDG is semantic and it crosses silos of enterprise information. It includes close to 400 pre-defined asset types – such as business area, organization, glossary term, data set – described by ontologies and resident within their specific domains. ETL scripts, for example, are cataloged in technical asset collections. Having this foundation of a large number of pre-built asset types means that customers can deploy EDG faster and with less expense.
The solution is designed to support a comprehensive but staged approach to governance, letting users pick from a modular assortment of Data Governance packages (vocabulary management, metadata management, reference data, business glossary) that each combine asset types that work together. They exist as part of a knowledge graph asset collection that represents a knowledge domain in an operational governance model where data and metadata are similarly represented and connected. Graphs can be enriched with enterprise-specific knowledge and integrated together.
Each asset collection, such as a specific glossary or data catalog, is created as its own graph, and collections can merge with each other forming larger assemblies of graphs.
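As a rough illustration of how separately created collections can combine, the sketch below models each asset collection as a plain Python set of (subject, predicate, object) triples. The `http://example.org/` URIs, the predicate names, and the `about` helper are all hypothetical and are not taken from TopBraid EDG; the point is only that merging RDF-style graphs amounts to a set union in which shared URIs line up on their own.

```python
# Hypothetical namespace and predicates, used for illustration only.
EX = "http://example.org/"

# One asset collection: a business glossary.
glossary = {
    (EX + "term/Customer", EX + "label", "Customer"),
    (EX + "term/Customer", EX + "definedIn", EX + "glossary/Sales"),
}

# Another asset collection: a data catalog that points at a glossary term.
catalog = {
    (EX + "dataset/CRM", EX + "label", "CRM export"),
    (EX + "dataset/CRM", EX + "describes", EX + "term/Customer"),
}

# Merging is a set union; no join keys are needed because both
# collections refer to the same URI for the Customer term.
merged = glossary | catalog


def about(graph, uri):
    """Return every triple that mentions the URI as subject or object."""
    return {t for t in graph if t[0] == uri or t[2] == uri}


customer_view = about(merged, EX + "term/Customer")
```

In the merged graph, a lookup on the Customer term returns both its glossary entries and the catalog dataset that describes it, which neither collection could answer alone.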
The question around Data Governance has evolved from “Why does a business need it?” to “How can it be applied?” according to Jesse Lambert, Senior Semantic Solutions Architect at TopQuadrant, who spoke in a recent DATAVERSITY® interview.
“But they don’t want a Ferrari to start – they need training wheels,” Lambert said. Start with a controlled vocabulary: “We tell customers you need a common language. Use the business glossary and get onboard.”
A backbone of TopQuadrant’s solution is SHACL (Shapes Constraint Language), a W3C standard language that is itself expressed in RDF and is used for creating data models, data validation, and reasoning rules. An advantage of using RDF and associated languages for knowledge graphs is that models and rules are just as much a part of the knowledge graph as the data facts. “SHACL lets you enforce the common language” for a common understanding of data across the organization, said Lambert. A common language is discoverable and socialized across the enterprise, bringing together the data one department calls “product” and another calls “solution,” for example. Resources and relationships are unambiguously identified.
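To make the idea of enforcing a common language concrete, here is a minimal sketch of a shape-style check written in plain Python. It is not SHACL syntax and not a SHACL engine: the `shape` dictionary, the `validate` function, and the `http://example.org/` URIs are assumptions made for illustration. The constraint it mimics is a minimum-count rule, here that every glossary term must carry at least one definition.

```python
# Hypothetical namespace; the rdf:type URI is the standard one.
EX = "http://example.org/"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"

# A shape-like constraint: nodes of the target class need at least
# min_count values for the given property.
shape = {
    "target_class": EX + "GlossaryTerm",
    "property": EX + "definition",
    "min_count": 1,
}

data = {
    (EX + "term/Product", RDF_TYPE, EX + "GlossaryTerm"),
    (EX + "term/Product", EX + "definition", "An item offered for sale."),
    (EX + "term/Solution", RDF_TYPE, EX + "GlossaryTerm"),  # no definition
}


def validate(graph, shape):
    """Return the URIs of target-class nodes that violate the shape."""
    targets = {
        s for s, p, o in graph
        if p == RDF_TYPE and o == shape["target_class"]
    }
    violations = []
    for node in sorted(targets):
        values = [o for s, p, o in graph
                  if s == node and p == shape["property"]]
        if len(values) < shape["min_count"]:
            violations.append(node)
    return violations


violations = validate(data, shape)
```

Here the "Solution" term is reported because it has no definition. In a real RDF setting the shape would itself be stored as triples alongside the data, which is the property the paragraph above describes.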
The Enterprise Knowledge Graph, explained TopQuadrant co-founder and CEO Irene Polikoff, gives the unique identifier for something like a customer, but it is possible that a word like “customer” has different meaning in different contexts with different identifiers, and their identities must be preserved. The EDG knowledge graph’s use of an RDF data model and standard where each resource has a unique URI means that the identity and meaning of a resource are not determined by label.
Users can specify how each of these resources relates to each other and to other things in the enterprise. Connections in the knowledge graph can still be created for two resources in two different asset collections that have different standard vocabularies or between standard and specialized local vocabularies by mapping reference datasets.
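The following sketch shows, under simplified assumptions, why URI-based identity matters and how a mapping can connect resources across vocabularies. All URIs, the `sameConceptAs` predicate, and the `equivalents` helper are hypothetical; a real deployment would likely use a standard mapping vocabulary, and this helper only follows one mapping hop.

```python
# Hypothetical namespace and resources, for illustration only.
EX = "http://example.org/"

# Two resources share the label "customer" yet remain distinct,
# because identity comes from the URI rather than the label.
labels = {
    EX + "sales/Customer": "customer",
    EX + "support/Customer": "customer",
    EX + "sales/Product": "product",
    EX + "services/Solution": "solution",
}

# An explicit mapping links resources from two different vocabularies.
mappings = {
    (EX + "sales/Product", EX + "sameConceptAs", EX + "services/Solution"),
}


def equivalents(uri, mappings):
    """Return the URI plus anything directly mapped to it (one hop)."""
    found = {uri}
    for s, _, o in mappings:
        if s == uri:
            found.add(o)
        if o == uri:
            found.add(s)
    return found


same_label = sorted(u for u, label in labels.items() if label == "customer")
product_group = equivalents(EX + "sales/Product", mappings)
```

The two "customer" resources stay separate until someone asserts a relationship between them, while "product" and "solution" are joined only because a mapping says so.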
TopBraid EDG collects metadata from all data integration environments to build a knowledge graph with visibility, control, and intelligence to manage change and build connections across metadata silos.
“You need metadata expressed in some common way to figure out if the data you are using for business analytics or reports or compliance is the right data brought together in the right way,” Polikoff said.
Knowledge Graphs Get into AI, Too
In addition to knowledge graphs governing data, they can also enable enterprise AI, according to TopQuadrant. Knowledge graphs that capture the meaning of data and its semantics are part of the knowledge representation branch of AI, enabling computers to reason based on fully available contextual and conceptual information. SHACL provides the way to specify rich rules that infer new facts from available data. “Rules,” said Polikoff, “are the brain.”
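As a hedged illustration of rules inferring new facts, the sketch below applies a single hand-written rule to a small set of triples until nothing new can be derived. It is plain Python rather than SHACL rules, and the URIs, predicates, and the rule itself (anything derived from a source holding personal data also holds personal data) are assumptions chosen for the example.

```python
# Hypothetical namespace and predicates, for illustration only.
EX = "http://example.org/"
DERIVED = EX + "derivedFrom"
PERSONAL = EX + "containsPersonalData"

facts = {
    (EX + "dataset/CRM", PERSONAL, "true"),
    (EX + "dataset/SalesMart", DERIVED, EX + "dataset/CRM"),
    (EX + "report/Churn", DERIVED, EX + "dataset/SalesMart"),
    (EX + "report/Inventory", DERIVED, EX + "dataset/Warehouse"),
}


def infer(graph):
    """Apply the rule repeatedly until no new triples appear."""
    inferred = set(graph)
    while True:
        # Resources currently known to hold personal data.
        flagged = {s for s, p, o in inferred
                   if p == PERSONAL and o == "true"}
        # Rule: whatever is derived from a flagged resource is flagged too.
        new = {
            (s, PERSONAL, "true")
            for s, p, o in inferred
            if p == DERIVED and o in flagged
        } - inferred
        if not new:
            return inferred
        inferred |= new


closure = infer(facts)
```

The churn report is flagged even though it never touches the CRM dataset directly, because the rule is applied across two derivation steps; the inventory report is left alone.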
With standards-based SHACL, you can feed in business logic in ways you couldn’t before, said Lambert. For example, rules can guide how a system learns whether or not a person can get a loan. “If you have a model or rule, you can guide a machine learning application to learn who should get loans based on patterns in their income or credit history data and not based on their gender or race,” he said. That’s a better approach than letting algorithms draw conclusions from facts that may be incidental in the data or, even worse, from facts that are illegal to use in decision making.
Growth in the enterprise use of AI brings advantages as well as risks and challenges. As part of a “virtuous cycle,” governance of the use of AI will be required to address risks and ensure effectiveness of AI-based solutions. That means that a way will be needed to:
- Manage training datasets
- Combine data across heterogeneous data sources to provide data objects as training data sets
- Capture what AI algorithms are being used for what purposes
- Understand and evaluate the usefulness of results delivered by different AI algorithms.
TopQuadrant released Version 6.2 of TopBraid EDG in May. Enhancements include integration with external knowledge graphs such as Wikidata, extended scope of data lineage and impact analysis with support for multiple types of data flows and their related business entities, and the ability for the solution to learn likely business rules from sample data.