Traditional enterprise data architectures rely on moving data into centralized lakes or warehouses, which creates complex pipelines and consistency issues. This article proposes a paradigm shift: Agent-centric data architecture. Instead of consolidating data, intelligent agents bring processing and analysis directly to the source systems.
The Three Pillars
- Declarative Modeling: Systems define what data is needed; agents figure out how to retrieve it.
- Robust Governance: Metadata and shared semantics ensure traceability and control directly within operational systems.
- Data Fabresh Layer: A governance layer that regulates contracts and policies rather than storing data.
By prioritizing governed intelligence over storage, this model eliminates the rigid layers of traditional architecture. It offers a real-time, simplified infrastructure suited for a distributed, AI-driven world – evolving beyond centralized repositories to focus on data at its point of origin.
“And Yet It Moves”
For many years, data architecture in organizations has followed a fairly stable logic: extracting information from operational systems, transferring it, and concentrating it on platforms designed for analysis. Over time, this approach has taken different forms – first data warehouses, then data lakes, and more recently, lakehouses – but the fundamental premise has remained unchanged. All these models assume that, in order to analyze data, it must be extracted from the systems where it is generated and transferred to a separate environment optimized for its use.
However, recent advances in computing power, virtualization, and distributed processing challenge this basic idea: What if it were no longer necessary to move data in order to analyze it?
The exponential growth in computing power, the maturity of distributed architectures, and the emergence of intelligent agents capable of orchestrating data processes make it possible to envision a new paradigm: agentic data architecture.
In this model, data does not move. What moves is intelligence.
The End of the “Copy Paradigm”
Moving data has always been costly and problematic. Every pipeline, every ETL, and every replication introduces additional complexity:
- Unnecessary latency
- Consistency issues
- Duplicate storage costs
- Security risks
- Deterioration of data quality
Furthermore, each copy raises an inevitable new question: Which is the correct version of the data?
Traditional architecture attempts to solve these problems by centralizing data. But the result is often an ecosystem riddled with intermediate layers:
- Data warehouse
- Data lake
- Data marts, cubes
- Operational replications
- ETL/ELT pipelines
Paradoxically, the more we try to consolidate data, the more complex the architecture becomes.
Agent-based architecture proposes a radically different approach:
Data must remain where it is generated: in the operational systems.
From Data Transfer to Computing Power Transfer
The key shift is based on a simple technological premise: Today, it is cheaper to move the processing than to move the data.
Instead of building large centralized data warehouses, an agent-based architecture allows intelligent data agents to perform analytical processes directly on the source systems.
These agents can:
- Access operational data in real time (with sentinels monitoring for anomalies)
- Apply quality rules at the source (within processes)
- Perform on-demand transformations (minimal if the data is of high quality)
- Orchestrate federated queries across multiple systems (including in a virtualized manner, with queries moving instead of data)
- Generate dynamic data products (in a secure, governed-by-design environment)
The result is an architecture where analysis comes to the data, rather than bringing the data to the analysis.
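To make the idea concrete, here is a minimal sketch of a federated query in which the query travels to each source system and only results leave it. The source systems, their schemas, and the `federated_query` helper are all illustrative assumptions, not a real API:

```python
# Hypothetical sketch: the query is evaluated inside each source system;
# only the combined result leaves them -- no data is copied to a central store.

def query_source(source, predicate):
    """Run a filter directly at a source system; only matching rows leave it."""
    return [row for row in source["rows"] if predicate(row)]

def federated_query(sources, predicate):
    """Send the same query to every relevant source and merge the answers."""
    results = []
    for source in sources:
        results.extend(query_source(source, predicate))
    return results

# Two operational systems, each keeping its data at the point of origin.
crm = {"name": "crm", "rows": [{"customer": "ACME", "churn_risk": 0.8}]}
erp = {"name": "erp", "rows": [{"customer": "Globex", "churn_risk": 0.2}]}

at_risk = federated_query([crm, erp], lambda r: r["churn_risk"] > 0.5)
```

In a real deployment the per-source filter would be pushed down to the system's own query engine; the point of the sketch is only that the predicate moves, not the rows.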
The True Cornerstone: Data Governance
This model is feasible only if there is an extremely robust level of data governance.
Without governance, direct access to operational systems would be chaotic.
With governance, it becomes an extremely powerful architecture.
The governance framework must ensure:
1. Data quality at the source: Quality rules are not applied after ETL, but within operational processes.
- Data is created already validated.
2. Shared semantics: A business metamodel defines:
- Entities
- Definitions
- Business rules
- Relationships between data
This allows agents to interpret information correctly.
3. Access control: Agents must operate in compliance with strict policies regarding:
- Security
- Privacy
- Traceability
- Regulatory compliance
4. Complete audit trail: Every query or result generated by an agent maintains a complete audit trail back to the operational systems.
There are no intermediate layers that hide the origin of the data. The data itself is not exposed; only queries travel through the systems, faster and more securely.
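A minimal sketch of what the audit-trail requirement could look like in practice: every agent query is wrapped so that its result records which agent touched which operational system. The `audited_query` helper and the record fields are assumptions for illustration:

```python
# Illustrative sketch: each agent query records its provenance back to the
# operational system it touched. Names and record fields are hypothetical.
from datetime import datetime, timezone

audit_log = []

def audited_query(agent_id, system, query_fn):
    """Execute a query against a source system and record who asked what."""
    result = query_fn(system["rows"])
    audit_log.append({
        "agent": agent_id,
        "system": system["name"],
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "rows_returned": len(result),
    })
    return result

billing = {"name": "billing", "rows": [{"invoice": 1, "paid": True},
                                       {"invoice": 2, "paid": False}]}
unpaid = audited_query("analytical-01", billing,
                       lambda rows: [r for r in rows if not r["paid"]])
```

Because the trail is written at query time, traceability does not depend on any intermediate storage layer.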
The Role of Data Agents
In this architecture, the key players are what we can call “data agents.”
A data agent is an autonomous component capable of:
- Understanding the semantic context of the data from a transactional perspective
- Identifying relevant sources
- Executing distributed queries (using virtualization and caching engines to avoid overloading the operational systems)
- Applying business rules
- Generating analytical results instantly and without ETL
We can envision several types of agents:
- Discovery agents: Identify where relevant information is located. (We already have these.)
- Quality agents: Verify and correct anomalies in real time. (They are already here.)
- Analytical agents: Generate metrics, indicators, or models directly from operational data.
- Governance agents: Ensure compliance with data usage policies.
This ecosystem creates a network of distributed intelligence that operates on existing systems.
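The four agent roles above can be sketched as plain functions over an illustrative catalog. Everything here, the catalog structure, the field names, the anomaly rule, is an assumption chosen to show how the roles divide responsibility, not a real framework:

```python
# Hypothetical sketch of the four agent roles over one operational dataset.

catalog = {
    "orders": {"system": "erp", "tags": ["sales"],
               "rows": [{"id": 1, "amount": 120.0}, {"id": 2, "amount": -5.0}]},
}

def discovery_agent(catalog, tag):
    """Identify where relevant information is located."""
    return [name for name, meta in catalog.items() if tag in meta["tags"]]

def quality_agent(rows):
    """Flag anomalies in real time (here: negative order amounts)."""
    return [r for r in rows if r["amount"] < 0]

def analytical_agent(rows):
    """Generate a metric directly from operational data."""
    return sum(r["amount"] for r in rows if r["amount"] >= 0)

def governance_agent(dataset, allowed_systems):
    """Check that the dataset may be used under current policy."""
    return dataset["system"] in allowed_systems

found = discovery_agent(catalog, "sales")
anomalies = quality_agent(catalog["orders"]["rows"])
revenue = analytical_agent(catalog["orders"]["rows"])
permitted = governance_agent(catalog["orders"], {"erp"})
```

The division of labor is the point: discovery finds the data, quality and analytics operate on it in place, and governance gates every step.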
A Radical Change in Architecture
Agent-based data architecture implies a profound conceptual shift.
Traditional architecture:
- Move data
- Store data
- Transform data
- Consume data
Agent-based architecture:
- Govern data
- Discover data
- Perform data processing
- Generate knowledge on demand
The difference is fundamental.
Instead of building massive archives of historical data, the organization creates an intelligent network of governed access and processing.
Benefits of the Model
When implemented correctly, this approach offers significant benefits.
- Elimination of duplicates: A single set of data in the source system.
- Real-time information: No ETL delays or replicas.
- Reduced infrastructure costs: Less storage space, fewer pipelines, less maintenance.
- Greater traceability: Every result can be traced back to the process that generated the data.
- Better quality: Data is validated at the time of creation.
- No MDM required: Governed processes do not fragment information; every record is already a golden record.
From the Layered Model to Agent-Based Architecture
This approach also breaks with another of the classic pillars of data architecture: the separation into conceptual, logical, and physical layers. For decades, enterprise architecture has sought to structure information systems through this layering, in which each layer abstracts the next. However, in an agentic data architecture, this distinction loses its relevance.
Agents operate directly on the systems where the data resides, using metadata, semantics, and governance rules to interpret information in real time. Conceptual understanding, business logic, and physical execution are no longer separated into rigid layers, but become part of a single dynamic system governed by metadata and declarative policies. In place of a static, layered architecture, a living, distributed, and contextual architecture emerges, in which agents directly link intention, meaning, and execution on operational data.
The Challenges
This model is not without challenges. The main one is cultural and organizational in nature.
Many companies have built their data strategy around large centralized repositories. The transition to a distributed architecture requires:
- Maturity in data governance
- Clear definition of domains
- Robust semantic models
- Advanced automation
In addition, operational systems must be able to support additional analytical workloads, which is not always the case in legacy architectures. However, it is possible to virtualize them and make intelligent use of caching in virtualization engines.
The Future of Data Architecture
Agent-based architecture does not necessarily imply the immediate disappearance of data warehouses or data lakes. Many organizations will continue to use them for years to come, and some will need to materialize data, if only for regulatory reasons.
But the trend is clear. As distributed computing, data virtualization, and intelligent agents evolve, the model of massive data movement will lose its relevance.
The future may not lie in building ever-larger repositories, but in creating systems capable of processing data where it is generated.
In that world, the question will no longer be, “Where do we store the data?” but rather, “How do we govern the intelligence that operates on it?”
And this is, precisely, the promise of agentic data architecture.
The Role of the Declarative Approach
For this model to work, we must also move away from the classic procedural paradigm of data engineering.
For years, data architectures have been built by defining how processes should be executed: step-by-step ETL pipelines, scripts, chained transformations, and complex orchestrations.
In an agent-based architecture, the approach shifts toward a declarative model.
Instead of specifying how to move and transform data, what is defined is:
- What data is needed
- What rules it must satisfy
- What quality is acceptable
- Which governance policies must be applied
In other words, you declare the desired state of the data, and the system’s agents automatically figure out how to achieve it.
This is a profound shift: the data engineer stops programming pipelines and instead defines contracts, rules, and objectives, working at the business level.
Data agents, supported by metadata and semantic models, perform the actions necessary to fulfill these declarations. As always, the success of the model lies in the metadata.
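Such a declaration might look like the sketch below: the engineer states what the data must satisfy, and an agent-side function decides how to verify it. The contract format, field names, and quality threshold are all hypothetical:

```python
# Sketch of the declarative shift: the desired state of the data is declared
# as a contract; the "how" stays inside the fulfilling agent.
# The contract structure and rule format are illustrative assumptions.

contract = {
    "dataset": "customers",
    "rules": [
        ("email", lambda v: v is not None and "@" in v),
        ("age", lambda v: v is None or v >= 0),
    ],
    "acceptable_quality": 0.9,  # minimum share of rows passing all rules
}

def fulfill(contract, rows):
    """Agent-side: evaluate the declared rules and report whether the
    desired state holds. No pipeline steps are specified by the declarer."""
    passing = [r for r in rows
               if all(check(r.get(field)) for field, check in contract["rules"])]
    quality = len(passing) / len(rows) if rows else 1.0
    return {"quality": quality,
            "meets_contract": quality >= contract["acceptable_quality"]}

rows = [{"email": "a@x.com", "age": 30}, {"email": None, "age": 25}]
report = fulfill(contract, rows)
```

The engineer never says where the rows come from or how they are checked; the contract only declares what "good" means.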
The Role of Data Fabresh: Guardians of Processes, Not Data
In this context, an interesting evolution of the traditional concept of the data fabric emerges: what Daniel Torbellino and I call Data Fabresh.
While many data architectures seek to build a layer that controls or centralizes data, the Data Fabresh approach is different.
Data Fabresh does not store data. It governs processes.
Its primary function is to act as a layer of orchestration, governance, and control that ensures data agents and processes are executed correctly on operational systems.
Instead of becoming a new repository, Data Fabresh acts as:
- A custodian of quality rules
- Guarantor of data contracts
- An orchestrator of declarative processes
- Controller of provenance and traceability
- Manager of access and compliance policies
In this way, the focus is no longer on where the data resides, but on how the processes that use it are governed.
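The roles listed above can be condensed into a small sketch: a layer that stores no data, only checks that a process complies with its registered policy before letting it run, and records every decision. The `Fabresh` class and its interface are assumptions for illustration only:

```python
# Hypothetical sketch of the Data Fabresh idea: govern processes, not data.
# It holds policies and a provenance trace, never the data itself.

class Fabresh:
    def __init__(self):
        self.policies = {}   # process name -> policy check
        self.trace = []      # provenance of every governed execution

    def register(self, process_name, policy):
        """Record the policy a process must satisfy to run."""
        self.policies[process_name] = policy

    def run(self, process_name, process_fn, context):
        """Execute a process only if its policy allows it; log the decision."""
        policy = self.policies.get(process_name)
        allowed = policy(context) if policy else False
        self.trace.append({"process": process_name, "allowed": allowed})
        return process_fn(context) if allowed else None

fabresh = Fabresh()
fabresh.register("monthly_report", lambda ctx: ctx["role"] == "analyst")

result = fabresh.run("monthly_report",
                     lambda ctx: f"report for {ctx['role']}",
                     {"role": "analyst"})
denied = fabresh.run("monthly_report", lambda ctx: "x", {"role": "guest"})
```

Note that the layer sees only contexts and decisions; the data a governed process touches never passes through it.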
Conclusion
Agentic data architecture represents a profound paradigm shift: moving from the movement and replication of data to its processing directly where it is generated, using a declarative approach governed by intelligent agents. This model allows us to work with operational data with guaranteed quality, eliminates duplication, reduces latency, and transforms the way we understand architecture, integrating the conceptual, logical, and physical aspects into a dynamic, distributed system.

