Why Your Semantic Layer Will Make or Break Your AI Strategy

At every company I have consulted for over the last five years, the message has been the same: we are putting a lot of money into AI. Yet most of these companies are building their AI on top of a data layer that they would not trust for compliance reporting, let alone autonomous decision-making.

This is not a technology issue. It is a governance issue disguised as an AI strategy. At the heart of it, failing silently, is the semantic layer, or rather the lack of one designed specifically for AI requirements.

Let me be frank about something you will not hear in vendor presentations: your AI strategy is no better than the data layer underneath it. If that data layer is ungoverned, loosely defined, and built for the pace at which humans consume BI, your AI will happily churn out incorrect results at machine speed. That is a much worse outcome than a sluggish dashboard.

What Is the Semantic Layer, and Why Do Most Enterprises Get It Wrong?

The semantic layer is the intermediary between data and meaning. It is where the definition of “revenue” lives, where customer identity is reconciled across applications, and where the rules that drive calculations are kept. In a properly governed enterprise, it is the definitive source of truth for any report, dashboard, or, increasingly, AI agent.

For most enterprises, however, the semantic layer is an afterthought. It was cobbled onto a data warehouse by the end of the 2000s, partially migrated to the cloud by 2019, and never brought together in one place. Definitions vary. Metrics diverge between departments. The exact same field can mean something entirely different depending on which system is queried. This is common knowledge among data teams, who have learned to compensate for the problem. Language models have not.

A business analyst querying a poorly governed semantic layer would recognize the discrepancy and ask a follow-up question. An AI agent will weave the conflicting data into a cohesive, fluent response and present it as fact. The human check that caught this mistake for decades disappears the moment AI comes into play.

The Real Cost of Building AI on an Ungoverned Foundation

I have seen this pattern in retail, government, and banking organizations, and it is always the same. A conversational AI or a RAG-powered analytics assistant is rolled out. The first demonstrations wow the stakeholders. Usage grows. Then the questions come rolling in: Why did the AI say we had X sales in Q3 when the financial reports say Y? Why did the AI suggest a product that was discontinued eight months ago? Why do different phrasings of the same question produce different answers from the AI?

In all of these cases, the problem stems from the same source. The AI is querying data that was never intended for machine consumption. The semantic layer was built for an era when a human intermediary sat between the input and the output. With that human layer removed, the weaknesses in the architecture surface directly in the output.

The problem goes beyond just providing inaccurate information. It is a loss of trust – and once trust is lost in an AI system, it is extremely hard to regain. The business users who receive three inaccurate results from the AI will simply stop using the system. They’ll fall back on their spreadsheets. And the entire AI project will slowly die in a governance gap nobody wants to own.

What an AI-Ready Semantic Layer Actually Requires

Building a semantic layer that is ready for AI consumption differs significantly from building one for traditional BI, and it places four additional requirements on data management professionals.

First, definitions must be precise and machine-interpretable. A person who works with the data knows that “active customer” means one thing in marketing and another in finance. An AI agent won’t understand that distinction unless it is made explicit in the semantic model. Every metric, every dimension, and every business rule should be defined as precisely as a clause in a legal document.
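To make the “active customer” point concrete, here is a minimal sketch of what an explicit, machine-readable definition could look like. Everything in it is an assumption for illustration: the `MetricDefinition` and `MetricRegistry` names, the fields, and the two example definitions are invented, not taken from any particular semantic layer product. The one idea it demonstrates is that an ambiguous name should fail loudly rather than let the consumer guess.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class MetricDefinition:
    """One explicit definition of a metric, scoped to a business domain."""
    name: str
    domain: str
    description: str
    expression: str  # hypothetical, human-readable calculation rule


class AmbiguousMetricError(LookupError):
    """Raised when a metric name maps to more than one definition."""


class MetricRegistry:
    """Hypothetical registry that refuses to guess between definitions."""

    def __init__(self) -> None:
        self._definitions = {}

    def register(self, metric: MetricDefinition) -> None:
        key = (metric.name, metric.domain)
        if key in self._definitions:
            raise ValueError(f"duplicate definition for {key}")
        self._definitions[key] = metric

    def resolve(self, name: str, domain: Optional[str] = None) -> MetricDefinition:
        matches = [
            m for m in self._definitions.values()
            if m.name == name and (domain is None or m.domain == domain)
        ]
        if not matches:
            raise KeyError(name)
        if len(matches) > 1:
            domains = sorted(m.domain for m in matches)
            raise AmbiguousMetricError(
                f"'{name}' is defined in {domains}; a domain must be specified"
            )
        return matches[0]


# Illustrative definitions only; real rules would come from the business.
registry = MetricRegistry()
registry.register(MetricDefinition(
    name="active_customer",
    domain="marketing",
    description="Customer who engaged with any campaign in the last 90 days",
    expression="last_engagement_date >= today - 90 days",
))
registry.register(MetricDefinition(
    name="active_customer",
    domain="finance",
    description="Customer with at least one paid invoice in the last 12 months",
    expression="last_paid_invoice_date >= today - 12 months",
))
```

An agent that calls `registry.resolve("active_customer")` without a domain gets an `AmbiguousMetricError` instead of a silently chosen definition, which is the behavior a human analyst would have supplied by asking a follow-up question.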

Secondly, lineage should be traceable throughout the entire process. When your AI model gives an answer, your governance platform should be capable of identifying precisely which data sources were used, what transformations were performed, and which business rules were applied. You can only audit your AI models if you know their lineage; you can only govern them if you can audit them.
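One way to picture this requirement is an answer that carries its own lineage. The sketch below is deliberately simplified and entirely hypothetical: the `LineageStep` and `TracedAnswer` types, the `warehouse.orders` source name, and the rule that revenue excludes cancelled orders are assumptions made for the example, not a description of any real governance platform.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass(frozen=True)
class LineageStep:
    kind: str       # "source", "transformation", or "business_rule"
    reference: str  # what was read or applied


@dataclass
class TracedAnswer:
    value: float
    lineage: List[LineageStep] = field(default_factory=list)


def q3_revenue(orders: List[Dict]) -> TracedAnswer:
    """Compute Q3 revenue and record every step that shaped the number."""
    lineage = [LineageStep("source", "warehouse.orders")]  # assumed source name

    q3_orders = [o for o in orders if o["quarter"] == "Q3"]
    lineage.append(LineageStep("transformation", "filter quarter == 'Q3'"))

    # Assumed business rule for this example only.
    recognized = [o for o in q3_orders if o["status"] != "cancelled"]
    lineage.append(LineageStep("business_rule", "revenue excludes cancelled orders"))

    total = sum(o["amount"] for o in recognized)
    lineage.append(LineageStep("transformation", "sum(amount)"))
    return TracedAnswer(value=total, lineage=lineage)


sample_orders = [
    {"quarter": "Q3", "status": "paid", "amount": 100},
    {"quarter": "Q3", "status": "paid", "amount": 200},
    {"quarter": "Q3", "status": "cancelled", "amount": 50},
    {"quarter": "Q2", "status": "paid", "amount": 400},
]
answer = q3_revenue(sample_orders)
for step in answer.lineage:
    print(f"{step.kind}: {step.reference}")
```

When finance asks why the assistant reported a different Q3 figure, the recorded steps show which source, filters, and rules produced it, which is the precondition for any audit.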

Thirdly, role-based access control should be applied to AI consumers as well as to humans. Having worked on the implementation of governed analytics platforms on Microsoft Fabric and Power BI, I have seen that the most underappreciated risk is often the excessive access that AI agents inherit from the retrieval pipeline. If your retrieval pipeline can query any data in the semantic model, it will. And if the end user who asked the question wasn’t supposed to see that data, you have a problem on your hands.

Finally, a semantic layer should be version-controlled and managed accordingly. In traditional BI, changes to metrics and dimensions are slow, human-led processes. Once AI is consuming the model, a change can go unnoticed, affect thousands of responses, alter the behavior of production agents, and even cause compliance issues before anyone becomes aware of it. Take your semantic model seriously and treat it like production code.
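The inherited-access risk can be illustrated with a small sketch in which retrieval is authorized against the role of the person asking, not against the pipeline’s own credentials. This is not how Microsoft Fabric or Power BI implement security; the `ROLE_GRANTS` table, the role names, and the `retrieve_for_user` function are all invented for the example.

```python
from typing import Callable, Dict, Iterable

# Hypothetical mapping of end-user roles to the metrics they may see.
ROLE_GRANTS = {
    "sales_analyst": {"revenue", "order_count"},
    "hr_partner": {"headcount", "salary_band"},
}


class AccessDenied(PermissionError):
    """Raised when a request includes metrics the end user may not see."""


def retrieve_for_user(
    user_role: str,
    requested_metrics: Iterable[str],
    fetch: Callable[[str], object],
) -> Dict[str, object]:
    """Fetch metrics for an AI agent on behalf of a user.

    The check runs before any data is fetched, so the agent never holds
    values that the requesting user is not entitled to.
    """
    requested = list(requested_metrics)
    allowed = ROLE_GRANTS.get(user_role, set())
    denied = sorted(set(requested) - allowed)
    if denied:
        raise AccessDenied(f"role '{user_role}' may not read: {denied}")
    return {metric: fetch(metric) for metric in requested}


# Stand-in for the real semantic model, for demonstration only.
fake_store = {"revenue": 1200, "order_count": 37, "salary_band": "B"}
result = retrieve_for_user("sales_analyst", ["revenue", "order_count"], fake_store.get)
print(result)
```

The design choice worth noting is that an unknown role gets an empty grant set and is therefore denied by default, instead of falling back to whatever the pipeline itself can reach.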
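Treating the semantic model like production code implies that a change to a definition is something you can diff and gate before release. As a rough sketch, assume each model version is simply a mapping from metric name to its calculation expression. The function names and the sample metrics below are hypothetical.

```python
from typing import Dict, List


def diff_semantic_models(old: Dict[str, str], new: Dict[str, str]) -> Dict[str, List[str]]:
    """Compare two versions of a semantic model, keyed by metric name."""
    return {
        "added": sorted(new.keys() - old.keys()),
        "removed": sorted(old.keys() - new.keys()),
        "changed": sorted(k for k in old.keys() & new.keys() if old[k] != new[k]),
    }


def requires_review(diff: Dict[str, List[str]]) -> bool:
    """Removed or redefined metrics can silently alter existing AI answers."""
    return bool(diff["removed"] or diff["changed"])


# Illustrative versions only.
model_v1 = {
    "revenue": "sum(amount)",
    "order_count": "count(order_id)",
    "legacy_margin": "sum(amount) - sum(cost)",
}
model_v2 = {
    "revenue": "sum(amount) - sum(refunds)",
    "order_count": "count(order_id)",
    "churn_rate": "lost_customers / starting_customers",
}

diff = diff_semantic_models(model_v1, model_v2)
print(diff)
print("needs review:", requires_review(diff))
```

A check like this could run in the same pull-request pipeline that guards application code, so that a redefinition of “revenue” is reviewed by a person before thousands of AI responses start reflecting it.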

The Uncomfortable Opinion Most Vendors Will Not Share

Here’s my take on the topic, informed by two decades of enterprise data systems experience: Most organizations aren’t yet ready to run their production AI on their existing data stack. Not because the AI itself isn’t mature enough – it is – but because the supporting data governance infrastructure has not caught up to the ambitions of the AI strategy.

The companies that succeed with AI won’t be the ones that rush to implement their AI models. They will be the ones that do the groundwork in semantic governance before scaling AI adoption. They will define their metrics once and enforce those definitions consistently across the organization. They will design lineage into their data platform from day one, rather than bolting it on later for audit purposes. And they will treat their semantic layer as a strategic asset, rather than just a business intelligence tool.

This may not be a message that will get you excited at the next AI steering committee meeting. But it’s the message that will determine whether your AI efforts yield business results or costly hallucinations.

Where to Start

If you are a data management professional who just picked up this blog post, then the discussion you should be having at the moment isn’t “which AI platform do we adopt?” Rather, it’s “can we describe our twenty most important business metrics in a manner that would be interpreted the same way by any system, including our AI agent?”

If the answer is “no” or even “it depends,” you have identified your readiness gap when it comes to artificial intelligence. That should come first; the technology can wait.

I would be interested in learning how other data management professionals are tackling this challenge. Is your organization focusing on semantics before rolling out AI solutions, or discovering the need for it only after those solutions have been deployed?
