Data Is Risky Business: Securing the Human(ity)

Data Is Risky Business is a TDAN column published every quarter on DATAVERSITY.

There is a principle in the martial art aikido that you cannot truly learn a technique without feeling it applied to you. The ukemi, the art of receiving the technique, is not incidental to the learning. It is the learning. You must experience disequilibrium, feel your balance break, and find your way back to center. That friction is what builds the felt sense of “wrong” that allows you, eventually, to perform “right.”

I have been thinking about this a great deal as I watch our profession enthusiastically automate away precisely the experiences that teach people what “wrong” feels like in data work.

In my September 2025 column I introduced what I called the Pakled Problem – the risk that organizations adopting generative AI and agentic automation at scale would find themselves operationally dependent on technology they lacked the human capital to interrogate, correct, or govern. The argument was strategic: The traditional learning ladder is being dismantled, and the balanced scorecard implications for learning and growth are being systematically under-weighted in AI business cases.

This column goes one level deeper. The question is not just “what do we lose?” but “what specific competences are forged in the friction that automation is eliminating – and how do we rebuild them deliberately?”

Two Kinds of Expertise

Data work is fundamentally a cognitive activity. When an analyst notices that a record count doesn’t match an expected total, they are not just running a query. They are applying a contextual mental model of what “normal” looks like for that dataset, drawing on prior experience of where quality issues tend to cluster, and formulating hypotheses to test. This is situated cognition – knowledge distributed across the person, the artifacts, and the specific problem context.

The cognitive psychologists Hatano and Inagaki drew a crucial distinction between two forms of expertise. [1] Routine expertise is the capacity to perform known procedures with speed and accuracy in familiar contexts. Adaptive expertise is the capacity to construct new procedures when familiar ones fail or the context changes. Both matter. But in data governance, it is adaptive expertise that determines whether a person can actually govern data rather than merely execute a governance process.

Routine expertise gets a report produced. Adaptive expertise notices that the report, though technically correct, is being misread or misinterpreted. Adaptive expertise also knows what to do about it, or at least how to go about figuring that out.

Routine expertise is knowing that a tomato is a fruit. Adaptive expertise is knowing you can’t substitute other fruit as a base for your pizza sauce.

The uncomfortable truth is that the tasks being automated in the current wave of AI adoption are almost exclusively those routine expertise tasks that have historically served as the training ground for adaptive expertise. We are trying to train pilots on a simulator that never crashes and never presents unexpected weather.

What Friction Actually Teaches

To understand what we risk losing, we need to be specific about the kinds of friction that have historically built competence in data work.

Reconciliation friction: The process of manually reconciling data between source systems, whether by comparing counts, investigating discrepancies, or tracing an element back through its transformation journey, builds an embodied understanding of data lineage and quality that no automated lineage diagram replicates. The analyst who has spent three hours tracking why a figure is off by 1.3% across a financial close has developed tacit knowledge: they know more than they can tell. They have a feel for where the bodies are buried.

Definition friction: The process of agreeing business definitions is the age-old argument about what counts as a “customer” (or what counts as a “presenter” in TV network salary reporting), whether a lapsed account holder is a “client,” and how “revenue” should be recognized across product lines. It is often treated as governance overhead to be minimised. But it is one of the most important formative experiences a junior data practitioner can have. It teaches them that data has no inherent meaning; that meaning is socially constructed in organizations (the lingua franca of the organization, as Tom Redman puts it). Arguably the real value of a data dictionary lies not in its existence but in the quality of the conversations that produced it. You cannot automate your way to that understanding.

Error investigation friction: When a data quality rule fails and a practitioner has to investigate root cause by tracing it through processes, asking questions, and understanding business context, they are building systems thinking capability applicable to governance, risk, and process improvement. The automated quality dashboard that flags issues without requiring investigation is, paradoxically, an engine for eroding the very capability needed to address those issues at source rather than at symptom.
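To make the reconciliation example concrete, here is a minimal sketch of the mechanical part of that work, assuming two hypothetical extracts (a ledger and a warehouse load) keyed by an invented invoice number. The function name, the keys, and the amounts are illustrative only. The sketch reports where the two systems disagree; it says nothing about why, which is the part the analyst has to work out.

```python
from decimal import Decimal


def reconcile(source, target):
    """Compare two {record_key: amount} mappings taken from two systems.

    Both mappings are hypothetical extracts; nothing here is tied to a
    particular tool or schema.
    """
    # Records that exist on one side only.
    missing_in_target = sorted(source.keys() - target.keys())
    missing_in_source = sorted(target.keys() - source.keys())

    # Records present on both sides but carrying different amounts.
    mismatched = sorted(
        key for key in source.keys() & target.keys()
        if source[key] != target[key]
    )

    source_total = sum(source.values(), Decimal("0"))
    target_total = sum(target.values(), Decimal("0"))

    # Percentage difference relative to the source total (None if undefined).
    pct_diff = None
    if source_total != 0:
        pct_diff = (target_total - source_total) / source_total * 100

    return {
        "source_count": len(source),
        "target_count": len(target),
        "missing_in_target": missing_in_target,
        "missing_in_source": missing_in_source,
        "mismatched": mismatched,
        "pct_diff": pct_diff,
    }


# Illustrative data only: invented invoice keys and amounts.
ledger = {
    "INV-1": Decimal("100.00"),
    "INV-2": Decimal("250.00"),
    "INV-3": Decimal("50.00"),
}
warehouse = {
    "INV-1": Decimal("100.00"),
    "INV-2": Decimal("245.00"),
    "INV-4": Decimal("60.00"),
}

report = reconcile(ledger, warehouse)
```

Note that both extracts hold three records, so a count comparison alone would pass. The discrepancy only shows up at the record level, and explaining it still means asking people questions.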

The Data Modeling and Metadata Problem

Perhaps nowhere is the tension between automation and expert judgement more acute than in data modeling and metadata management, two disciplines that sit at the heart of sustainable data governance and are increasingly important to the development of semantic layers that help improve LLM and AI accuracy.

Large language models are genuinely impressive at both. Give an LLM a database schema and it will produce plausible entity descriptions, candidate business definitions, and draft data dictionaries at a speed no human team can match. Give it a set of data samples and it will suggest likely data types, formats, and classification schemes. Ask it to suggest the types of entity that a particular type of organization might need to know about, and it can produce a good first-pass version. For bootstrapping a metadata catalog or generating a first-pass conceptual model from source documentation, LLMs are a legitimate productivity multiplier.

But here is what LLMs cannot do: They cannot know that the field labelled “customer_id” in the CRM system is not the same entity as the field of the same name in the billing system, because the billing system was acquired in 2019 and never fully integrated, and the operations team has been manually maintaining a crosswalk in a spreadsheet that nobody told the data architecture team about. They cannot know that the agreed definition of “active policy” was contested for 18 months and that the current definition reflects a specific regulatory interpretation made after a compliance review, and that there is a minority view in the actuarial function that the definition is still wrong. They cannot know that the metadata in the data catalogue is aspirational rather than operational and doesn’t accurately describe how the data is used or how the system or process works on the ground.

These gaps are not failures of the LLM. They are the natural consequence of the fact that expert context in data modelling and metadata management is not contained in the artefacts. It is carried in the heads of the people who argued about those artefacts, made decisions about them, and lived with the consequences. That expert context is the accumulated residue of years of definition friction, reconciliation friction, and error investigation friction. Ideally it is documented. In practice it often is not, and the resulting data debt is a tangible problem, often driving the adoption of AI and GenAI tools to help resolve it.
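The “customer_id” example above can be sketched as a simple profiling check. This is a minimal illustration, assuming two hypothetical lists of identifiers pulled from a CRM and a billing system; the function name and the sample values are invented. A check like this can show that two fields with the same name barely overlap. It cannot tell you about the acquisition or the crosswalk spreadsheet.

```python
def key_overlap(left_ids, right_ids):
    """Summarise how far two identifier columns actually overlap.

    The inputs are plain iterables of identifiers from two hypothetical
    systems that both happen to call the field "customer_id".
    """
    left, right = set(left_ids), set(right_ids)
    shared = left & right
    return {
        "shared": len(shared),
        "left_only": len(left - right),
        "right_only": len(right - left),
        # Share of each side's distinct keys that also appear on the other side.
        "left_coverage": len(shared) / len(left) if left else 0.0,
        "right_coverage": len(shared) / len(right) if right else 0.0,
    }


# Illustrative values only: the two systems use different key schemes.
crm_ids = ["C001", "C002", "C003", "C004"]
billing_ids = ["90017", "90018", "C003"]

overlap = key_overlap(crm_ids, billing_ids)
```

Low coverage on both sides is a prompt to go and talk to the people who run the two systems, not an answer in itself.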

When we automate the generation of data models and metadata, we get faster first drafts. But if the practitioners responsible for reviewing and validating those drafts have never had to build a data model from scratch, have never had to sit with a subject matter expert and reverse-engineer the business logic embedded in a legacy system, then they will not have the depth of experience to know what the LLM got wrong. Bainbridge’s Ironies of Automation apply with particular force here: the more capable the automated tool, the more it demands of the human reviewer, and the less prepared that reviewer is to provide meaningful oversight. [2]

Ashby’s Uncomfortable Implication

W. Ross Ashby’s Law of Requisite Variety holds that an effective controller of a system must have at least as much variety – as many possible responses – as the system it is trying to control. [3] Data environments are complex and variable: Quality issues evolve, regulatory requirements shift, the ways data is used and misused change continuously. A governance practitioner needs sufficient variety of knowledge and experience to respond effectively.

What automation does is reduce the apparent variety of problems the practitioner faces day-to-day. This sounds beneficial. But if we reduce the variety of problems the practitioner engages with, we also reduce the variety of responses they develop the capacity to make. This is occurring precisely as the overall data environment, complicated by GenAI-generated content, AI-mediated decisions, and agentic processes operating at scale, is increasing in complexity. We are narrowing the practitioner’s capability at exactly the moment the environment is widening its demands.

Planning for Adaptive Expertise

The original Luddite movement was not, as popular myth has it, simply opposed to machinery. It was a campaign by skilled craft workers to preserve the knowledge and capability embedded in their trades against the deskilling effects of mechanization. In that sense, the argument I am making here is a Luddite one – and I make it without apology. Automating routine data management tasks is not inherently problematic. The problem is treating that automation as a free gift – as pure efficiency gain with no corresponding cost to human capital development.

The strategic challenge for data leaders is this: as we automate procedural expertise, we must plan deliberately for how we will develop adaptive expertise in our people. This is not a training problem in the conventional sense. It cannot be solved with an e-learning module or a certification program. It requires a fundamental rethink of the professional formation of junior and mid-level data practitioners.

What should this look like? Three practical interventions are worth considering.

First, treat definitional and modeling conversations as development activities, not overheads. The arguments about what a data element means, how an entity should be modelled, and what a metadata attribute should capture need to be treated as structured learning experiences for junior practitioners, not inconveniences to be accelerated by an AI tool. Use LLM-generated first drafts as the starting point for those conversations, not the end point.

Second, require practitioners to understand before they review. If entry-level roles are increasingly “reviewing AI outputs,” organizations need to invest in ensuring that reviewers have sufficient contextual knowledge to review meaningfully. This means deliberate exposure to the provenance of data, the history of process design decisions, and the rationale behind governance choices. It means building the mental model of “normal” before asking someone to identify “abnormal.” If we do not develop this, we risk staff deferring to the machine. The “human in the loop” needs the capacity to intervene meaningfully, which requires the capability to engage in appropriate sensemaking when presented with a question, a challenge, or an unexpected output.

Third, build a human factors lens into data governance competence frameworks. Existing data management competence frameworks are rich in technical skills but thin on the adaptive, contextual, and interpersonal competences that make governance function in practice. The profession needs to name and develop these explicitly: The ability to read organizational context, exercise situated judgement, and know when the automated system is telling you something that doesn’t smell right.

The Harder Work

Our profession has argued for nearly three decades that data governance is a human and organizational challenge rather than a technical one. We are now at risk of losing the argument from the inside. By automating routine data work without planning for its effect on how people learn, we risk hollowing out the human competences that make meaningful governance of data and AI possible.

The central strategic imperative is straightforward, even if the execution is hard: The automation of procedural expertise must be accompanied by a deliberate, structured programme to develop adaptive expertise in people. Not as a compensatory afterthought, but as an integral part of every AI and automation business case. Not as a training budget line, but as an investment in the organizational capability to interrogate, challenge, and govern the systems we are building.

LLMs can produce a data model in minutes. What they cannot produce is the practitioner who knows why that data model is subtly wrong for this organization, in this context, given this history. That knowledge is still built the old-fashioned way: through friction, through failure, through the hard and undervalued work of sitting with a problem until you understand it.

Microsoft’s recent announcement that GitHub Copilot is moving to usage-based billing for tokens may be a signal that preserving both procedural and adaptive expertise is a smart move within an overall AI strategy.

As always, the opportunity exists between the keyboard and the chair.

Notes and References

[1] Hatano, G. & Inagaki, K. (1986). Two courses of expertise. In H. Stevenson, H. Azuma & K. Hakuta (Eds.), Child development and education in Japan (pp. 262–272). Freeman. The distinction has been further developed in Schwartz, Bransford & Sears (2005) and is increasingly relevant to professional development in complex knowledge work.

[2] Bainbridge, L. (1983). Ironies of automation. Automatica, 19(6), 775–779. Bainbridge’s central argument, that increasing automation raises the cognitive demands on human operators while simultaneously degrading the skills needed to meet those demands, applies with particular force to AI-assisted data modelling and metadata management.

[3] Ashby, W.R. (1956). An Introduction to Cybernetics. Chapman & Hall. The Law of Requisite Variety has broad application in organizational and governance theory. Its implications for practitioner capability development in automated data environments are significantly underappreciated in the current literature.

Daragh O Brien
