Cloud giants like Google and Snowflake, unicorns like dbt Labs, and a host of venture-backed startups are now talking about a critical new layer in the data and analytics stack. Some call it a “metrics layer,” a “metrics hub,” or “headless BI,” but most call it a “semantic layer.” I prefer “semantic layer” too, because it best describes a business-friendly interface to data that serves a variety of use cases and user personas.
A semantic layer makes data usable for everyone and presents a consistent, business-friendly interface to corporate data. A semantic layer also:
- Connects users to live data, of any shape and size, wherever it lands
- Delivers queries at the “speed of thought” on any size of data
- Governs user access to sensitive data for every query, regardless of the tool used
- Connects and blends data across silos, from on-premises systems to the cloud to SaaS applications
- Bridges the business and data science teams by integrating historical and predictive data
In the following sections, we’ll discuss the core capabilities of a semantic layer platform that you can use as a guide when evaluating vendors and solutions.
A semantic layer platform needs to deliver on seven main vectors of value. The following diagram illustrates the core capabilities of a semantic layer:
A semantic layer needs to be truly universal. This means it must support a variety of use cases and personas, including business analysts, data scientists, and application developers. It also needs to support a wide range of query tools through their native protocols and interfaces, including SQL, MDX, DAX, Python, REST, JDBC, and ODBC.
The core of the semantic layer is the data model. A semantic layer maps logical elements (dimensions, metrics, hierarchies, KPIs) to the physical entities of databases, tables, and relationships. To deliver a digital twin of the business, a semantic layer must support reusable models and components that enable a hub-and-spoke (data mesh) style of analytics management, backed by a CI/CD-compatible markup language and a GUI-based modeling environment.
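To make the logical-to-physical mapping concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not any vendor's modeling language: the class names (`Dimension`, `Metric`, `SemanticModel`), the `compile` method, and the table and column names are all hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Dimension:
    name: str    # business-friendly name exposed to users
    column: str  # physical column it maps to (hypothetical)


@dataclass(frozen=True)
class Metric:
    name: str        # business-friendly name exposed to users
    expression: str  # SQL aggregate over physical columns (hypothetical)


@dataclass
class SemanticModel:
    table: str  # physical table backing the model (hypothetical)
    dimensions: Dict[str, Dimension]
    metrics: Dict[str, Metric]

    def compile(self, metric_names: List[str], dimension_names: List[str]) -> str:
        """Translate a request phrased in business terms into SQL."""
        dims = [self.dimensions[d] for d in dimension_names]
        mets = [self.metrics[m] for m in metric_names]
        select = [f"{d.column} AS {d.name}" for d in dims]
        select += [f"{m.expression} AS {m.name}" for m in mets]
        sql = f"SELECT {', '.join(select)} FROM {self.table}"
        if dims:
            sql += " GROUP BY " + ", ".join(d.column for d in dims)
        return sql


model = SemanticModel(
    table="sales.orders",
    dimensions={"region": Dimension("region", "region_code")},
    metrics={"revenue": Metric("revenue", "SUM(amount)")},
)

# The user asks for "revenue by region"; the model supplies the physical details.
sql = model.compile(["revenue"], ["region"])
print(sql)
```

Because the definition of `revenue` lives in one place, every tool that queries through the model gets the same calculation.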
The semantic layer data model must be backed by a scalable, multi-dimensional engine to express a wide range of business concepts in a variety of contexts. The semantic layer engine must support matrix-style calculations (time intelligence, multi-pass, etc.) using a multidimensional expression language like MDX or DAX and query underlying cloud data platforms “live” without data movement or a separate data store.
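As a small illustration of what a “time intelligence” calculation means, the sketch below computes period-over-period growth in plain Python. It is only a stand-in for the idea: a real engine would express this in a language like MDX or DAX and push the work down to the data platform. The function name and the sample figures are made up for the example.

```python
from typing import List, Optional


def period_over_period(values: List[float]) -> List[Optional[float]]:
    """Fractional change versus the previous period.

    The first period has no predecessor, and a zero predecessor makes the
    ratio undefined, so both cases yield None.
    """
    if not values:
        return []
    changes: List[Optional[float]] = [None]
    for prev, cur in zip(values, values[1:]):
        changes.append(None if prev == 0 else (cur - prev) / prev)
    return changes


# Hypothetical monthly revenue totals.
monthly_revenue = [100.0, 120.0, 90.0]
growth = period_over_period(monthly_revenue)
print(growth)
```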
Without query acceleration, a semantic layer will likely be bypassed using BI tool extracts and imports, which defeats the purpose of a semantic layer. As such, a semantic layer must automatically tune and improve performance using machine learning and user query patterns without moving data outside the native cloud data platform or requiring a separate cluster for managing aggregates.
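One common way to accelerate queries is to route them to a pre-built aggregate table when one can answer the request. The sketch below shows only that routing decision, using a simple rule (pick the smallest aggregate whose dimensions cover the query) rather than machine learning. It assumes additive metrics such as sums and counts, and every name in it (`Aggregate`, `route`, the table names, the row counts) is hypothetical.

```python
from dataclasses import dataclass
from typing import FrozenSet, Iterable, List


@dataclass(frozen=True)
class Aggregate:
    table: str                  # aggregate table name (hypothetical)
    dimensions: FrozenSet[str]  # dimensions the aggregate is grouped by
    row_count: int              # rough size, used to pick the cheapest option


def route(query_dims: Iterable[str], aggregates: List[Aggregate], base_table: str) -> str:
    """Return the table a query should run against.

    An aggregate can serve a query only if it retains every dimension the
    query groups by; among those, the smallest one is chosen. If none
    qualifies, the query falls back to the base table.
    """
    needed = frozenset(query_dims)
    candidates = [a for a in aggregates if needed <= a.dimensions]
    if not candidates:
        return base_table
    return min(candidates, key=lambda a: a.row_count).table


aggregates = [
    Aggregate("agg_region_month", frozenset({"region", "month"}), 1200),
    Aggregate("agg_region", frozenset({"region"}), 12),
]

print(route(["region"], aggregates, "sales.orders"))
print(route(["region", "month"], aggregates, "sales.orders"))
print(route(["customer"], aggregates, "sales.orders"))
```

In this sketch the aggregates are given up front; the paragraph above describes a layer that would also decide which aggregates to build by observing user query patterns.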
A semantic layer needs to satisfy a wide range of data governance scenarios. It must integrate with corporate directory services (e.g., AD, LDAP, Okta) for user identity management, apply row-level security to every query, and be able to hide and mask data columns based on user, group, and role-based access control (RBAC) rules.
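Row-level security and column masking can be pictured as a policy applied to every result set before it reaches the user. The sketch below applies such a policy to in-memory rows. It is a simplified illustration: a real semantic layer would typically enforce these rules inside the generated query and resolve roles from the directory service. The role names, the `Policy` class, and the sample data are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Set


@dataclass
class Policy:
    row_filter: Callable[[dict], bool]  # True if the role may see the row
    masked_columns: Set[str] = field(default_factory=set)


# Hypothetical role-to-policy mapping.
POLICIES: Dict[str, Policy] = {
    "emea_analyst": Policy(
        row_filter=lambda row: row["region"] == "EMEA",
        masked_columns={"email"},
    ),
    "admin": Policy(row_filter=lambda row: True),
}


def apply_policy(rows: List[dict], role: str) -> List[dict]:
    """Filter rows and mask columns according to the role's policy.

    An unknown role raises KeyError, so access is denied by default.
    """
    policy = POLICIES[role]
    visible = []
    for row in rows:
        if not policy.row_filter(row):
            continue
        visible.append(
            {k: ("***" if k in policy.masked_columns else v) for k, v in row.items()}
        )
    return visible


rows = [
    {"customer": "Ada", "email": "ada@example.com", "region": "EMEA"},
    {"customer": "Bo", "email": "bo@example.com", "region": "APAC"},
]

print(apply_policy(rows, "emea_analyst"))
```

Applying the policy in one place, rather than in each BI tool, is what lets the same rules hold regardless of the tool used.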
Data lives in multiple silos, including on-premises systems, legacy data warehouses, data lakes, cloud data warehouses, and SaaS applications. A semantic layer must be capable of accessing and modeling data across these multiple sources and must support a variety of data types, including nested data like JSON.
A universal semantic layer is quickly becoming a critical component in a modern data and analytics stack. However, when evaluating semantic layer options, it’s important to keep one thing in mind: If any of the above requirements is missing, a semantic layer is unusable. In other words, it’s binary – it either works 100% or it doesn’t work at all. Don’t let this be an impediment, though, because a universal semantic layer makes everyone a data-driven decision-maker.