
What’s The Problem with Big Metadata?

March 20, 2013

by Ian Rowlands

I’m returning to a theme that was one of the motivations for starting blogging with Dataversity. How come so many metadata programs get off the ground, do well for a while and then wither on the vine?

I’ve been focused recently on the issues surrounding what I think of as projects on the “bleeding edge”. (Warning: shameless self-promotion. I’m speaking on the topic at EDW, so I’ve been thinking about the projects I and my colleagues have been involved in recently.) What came to the surface surprised me, and drove me back to the blogosphere.

I expected that "Big Metadata" projects would face characteristically special technical issues. To be sure, there were a few, but they weren't really pervasive. Something else, however, came up consistently.

I suppose I should say what I mean by "Big Metadata" projects. I haven't really formalized it, but you can think of it as taking the "Big Data" characteristics (volume, variety and velocity) and applying them to the world of metadata. Of course, one part of the model breaks down. A key characteristic of "Big Data" is that the volume/variety/velocity challenges force the adoption of distinctive "Not Only SQL" technologies, with Hadoop to the fore. The metadata world has no de facto standard technology, so I suppose it's not startling that no "new generation" metadata technology is emerging as preeminent. Some techniques are bubbling up, especially federation, but as I hinted, technology isn't really the marker.

So what is really making the “bleeding edge” a tough place to be? The answer is, in retrospect, so painfully predictable that I’ve been kicking myself for not predicting it! It’s the amplifying effect that “Bigness” has on discipline and cultural issues.

Now, don't misunderstand me. Of course there are challenges when the amount of metadata expands beyond the window available to collect it, when the variety increases the complexity of lineage projects beyond the scope of conventional graphics, and when the number of users grows too fast to handle all their requests for help. But all of those things point to the real issue: when the challenges grow, it's time to shift from project mode to program mode. That's a whole new level of management challenge.

The other thing that seems to have been common to most of the recent “Big Metadata” projects is that they’ve involved the bridge from departmental to corporate implementation, and it’s the politics that start the blood flowing!

So there you have it. Sure, "Big Metadata" projects have special technical challenges, but if our experience is typical, it's the lack of discipline and the cross-discipline cultural confusion that will give you the bloody nose!

About the author

Ian Rowlands is ASG's Product Marketing Manager (Data Intelligence). He heads product marketing for Metadata Management and is also tasked with providing content across ASG's entire portfolio. Ian has also served as Vice President of ASG's metadata product management and development teams. Before ASG, Ian served as Director of Indirect Channels for Viasoft, a leading Enterprise Application Management vendor later acquired by ASG, where he managed relationships with distributor partners outside North America. He has worked extensively in metadata management and IT systems and financial management, and has presented at conferences worldwide, including DAMA and CMG.

  • The problem with big metadata is that it is based on a false premise. It assumes that, fundamentally, every word has a single global denotation, and that every object has a single word that denotes it, and that any deviations from these standard denotations are trivialities or the result of lack of education or discipline that can be managed away.

    But, in fact, there is no one fixed denotation for every word, and no one word that consistently denotes a given object. The naming of things is highly local, and every community uses the limited stock of words available to denote the objects and the distinctions that matter locally to them. All language is therefore local.

    Most of us belong to multiple linguistic locales and don't think much about it, switching naturally from one to another as we move from one community to another. But go to your spouse's work Christmas party and listen to their co-workers talk shop, and you will realize that they are speaking a different language.

    Big metadata doesn’t work because all language is local. That may look like lack of management or discipline, when you see colleagues refusing to use the terminology you have mandated. But the real cause is that they are using the same words, but in a different language.

  • Richard Ordowich

    The problem with Big Metadata is the same as with metadata in the small. First what metadata is being considered? Business and technical? What attributes or characteristics of metadata are being considered?

    Those described in ISO 11179? Semantics, syntax, ontologies and taxonomies? What the data means varies by context, as the previous commenter stated, and meanings are frequently subjective (unknowingly or purposely).

    Reverse engineering meaning and semantics from data using metadata is an interesting exercise that results in at least Fifty Shades of Grey. Reaching consensus or harmonizing data is possible but challenging.

    Humans can live with ambiguity (sometimes at their peril), but we expect systems to contain a single version of the truth that we can't agree upon. A dictionary contains the descriptions of words based on common use; data dictionaries require a single meaning. Humans interpret from ambiguity, something systems are not yet capable of doing.

    This is why the semantic web seems to have stalled. Understanding metadata can be helpful to expose the variances in meaning of the data but enforcing exacting standards through metadata is difficult. Everyone wants to retain their own interpretation.

  • I love these observations! They start to address one of the phenomena I'm coming across in almost every "Big Metadata" discussion, which is that people think there's a logical leap from "small" to "Big", and there isn't! I'm almost tempted to add an extra V to volume/velocity/variability: can I add Vagueness? This will pose some really interesting governance issues. Actually, it's increasingly unrealistic for worthwhile systems not to deal with ambiguity (and how does Watson manage?). I'd argue that a combination of metadata and semantic technologies will be key, but that metadata science is immature, and semantic technology is no more than nascent.

  • Willem Fourie

    Perhaps we should forget about consensus and instead link related terms. We could then decide later which would be the primary term of the group.
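Willem's suggestion of linking related terms first and choosing a primary term later can be sketched in a few lines. The Python below is a hypothetical illustration, not part of any product or standard; `TermLinker` and its methods are names invented for this example. It assumes a union-find (disjoint-set) structure, so linking two terms never forces agreement on a definition, and a group can be given a primary term later, or never.

```python
from __future__ import annotations


class TermLinker:
    """Hypothetical sketch: link related terms without forcing consensus,
    and optionally choose a primary term for each group later."""

    def __init__(self) -> None:
        self._parent: dict[str, str] = {}   # term -> parent term in its group
        self._primary: dict[str, str] = {}  # group root -> chosen primary term

    def _find(self, term: str) -> str:
        # Unseen terms start out as their own single-term group.
        self._parent.setdefault(term, term)
        root = term
        while self._parent[root] != root:
            root = self._parent[root]
        # Path compression: point every visited term directly at the root.
        while self._parent[term] != root:
            next_term = self._parent[term]
            self._parent[term] = root
            term = next_term
        return root

    def link(self, a: str, b: str) -> None:
        """Record that two terms are related, merging their groups."""
        ra, rb = self._find(a), self._find(b)
        if ra == rb:
            return
        self._parent[rb] = ra
        # Keep an existing primary choice; if both groups had one,
        # the surviving group's choice wins (an arbitrary policy).
        if rb in self._primary:
            self._primary.setdefault(ra, self._primary.pop(rb))

    def set_primary(self, term: str) -> None:
        """Decide, at any later point, which term represents the group."""
        self._primary[self._find(term)] = term

    def primary_of(self, term: str) -> str | None:
        """Return the group's primary term, or None if none was chosen."""
        return self._primary.get(self._find(term))

    def groups(self) -> list[set[str]]:
        """Return every group of linked terms."""
        by_root: dict[str, set[str]] = {}
        for term in list(self._parent):
            by_root.setdefault(self._find(term), set()).add(term)
        return list(by_root.values())


linker = TermLinker()
linker.link("customer", "client")
linker.link("client", "account holder")
linker.link("product", "offering")
linker.set_primary("client")
print(linker.primary_of("account holder"))  # client
print(linker.primary_of("offering"))        # None
```

The design choice here is that linking is cheap and reversible in spirit: each community keeps using its own word, and nominating a primary term is a separate, later governance decision rather than a precondition for recording the relationship.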
