
Big Data Changes the Metadata Point of View

By Ian Rowlands  /  November 18, 2013

I was talking to some very smart people about a knotty metadata problem yesterday. It eventually got to the dinosaur (me) explaining to the bright young things (everyone else) how old-fashioned mainframe programs worked – and as I talked about the way COBOL handled datasets, records and fields, it really brought the issue sharply into perspective.

Most of the programming languages I cut my teeth on were concerned with mapping the storage of the system they ran on. Data access and transfer assumed an understanding of how data was laid out on external media. But then along came object-oriented languages and everything changed! The way we thought shifted from being about how to write the logic to being about describing the nature and behavior of the real-world objects we wanted to work with. (OK, more strictly, about the simplified representations we wanted to manipulate.) For us dinosaurs, the impact was earth-shattering – you could write books about it; in fact, people have.
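That shift is easier to see side by side. Here is a minimal sketch (in Python, purely for illustration – the record layout, field names, and `Customer` class are invented, not from any real system) of the two mindsets: first slicing a fixed-width record by byte offsets, the way a storage-mapped COBOL program would, then describing the same thing as an object with behavior.

```python
from dataclasses import dataclass

# Storage-oriented view: a fixed-width record where the program must know
# exactly where each field lives on the external medium.
RAW_RECORD = "00042Jones     000150075"

def parse_fixed_record(raw: str) -> dict:
    """Slice the record by byte offsets, as a storage-mapped program would."""
    return {
        "id": int(raw[0:5]),            # bytes 0-4: zero-padded id
        "name": raw[5:15].rstrip(),     # bytes 5-14: space-padded name
        "balance_cents": int(raw[15:24]),  # bytes 15-23: amount in cents
    }

# Object-oriented view: describe the real-world thing and its behavior,
# not the storage layout.
@dataclass
class Customer:
    id: int
    name: str
    balance_cents: int

    def balance_dollars(self) -> float:
        return self.balance_cents / 100

customer = Customer(**parse_fixed_record(RAW_RECORD))
```

In the first view the layout *is* the program's knowledge of the data; in the second, the layout is an incidental detail behind a description of the object.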

So what’s the Big Data and metadata connection?  One connection is that there is a similar mind shift.

For a long time, the business of data management has focused on structuring data in ways that support known business processes for a given organizational structure, with an attempt to generalize enough to anticipate reasonably expected types of change to the business. There's a presumption that – by and large – the questions to be asked of the data are reasonably understood, and that the data can be logically structured to best support the answering of those questions. Typically, though, hidden in the design are also some compromises based on an understanding that there are performance constraints in a given physical environment. Both the assumption that most questions can be reasonably anticipated and the constraints imposed by performance trade-offs have an ugly side effect when the business has unanticipated questions: the model may need to be changed, and performance may be imperiled.

Big Data changes the point of view. The Big Data resource (some call it the Data Lake) consolidates many different information sources without pre-supposing the questions that might be asked and allows the business – or its Data Scientist proxies – to explore the data and discover valuable insights.
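This "explore first, structure later" stance is often called schema-on-read. A hedged sketch of the idea (in Python; the event types, fields, and `ask` helper are hypothetical, invented just to illustrate the shift): records land in the lake as-is, and structure is imposed only at the moment a question is asked.

```python
import json

# Heterogeneous raw events, consolidated without a pre-supposed schema.
data_lake = [
    '{"type": "sale", "amount": 120.0, "region": "EMEA"}',
    '{"type": "clickstream", "page": "/pricing", "ms_on_page": 5400}',
    '{"type": "sale", "amount": 80.5, "region": "APAC"}',
]

def ask(lake, predicate, projection):
    """Apply a question to the raw lake: parse, filter, and shape on demand."""
    for raw in lake:
        record = json.loads(raw)  # structure applied at read time, not load time
        if predicate(record):
            yield projection(record)

# An unanticipated question needs no model change -- just a new read.
sales_by_region = list(
    ask(data_lake,
        predicate=lambda r: r.get("type") == "sale",
        projection=lambda r: (r["region"], r["amount"]))
)
```

The point is not the code but the inversion: in the modeled world the schema gates what can be loaded, while here the question carries its own schema to the data.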

That’s only one way that Big Data is going to change the perspective – another is that (perhaps) we will finally stop constraining the way we store information to support hardware-driven performance issues. I say “perhaps” because I’ve heard that story before!

Big Data might change one other critical viewpoint. Your data is certain, sure, accurate, isn't it? If it isn't, you've got a Data Quality process to fix it, haven't you? You're going to have to let go of perfection and precision – Big Data is not certain!

So what does all this mean for metadata? For a start, there isn't nearly as much native metadata as you might hope. In fact, without a pre-defined model, a lot of the metadata is in the analytics … whether they are coded or in specialized tools. A lot of other metadata is going to have to be applied externally. Big Data means that a lot of the metadata we have long cherished might go away … and that new types of metadata are going to need to be managed. The metamodel is going to change massively. And there are going to be disconnects between the metadata we use now, and "Big" metadata.
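"Applied externally" can be made concrete with a small sketch. This is a hypothetical, minimal catalog (in Python; the paths, field names, and `register` helper are all invented for illustration, not any real tool's API): since a raw lake file carries little native metadata, a separate registry records description, provenance, quality notes, and even which analytics have been derived from it – alongside, not inside, the data.

```python
# Externally managed metadata: the lake knows the dataset only as a path;
# everything we say *about* it lives in this catalog.
catalog = {}

def register(dataset_path: str, **metadata):
    """Attach or extend externally managed metadata for a dataset."""
    catalog.setdefault(dataset_path, {}).update(metadata)

register("/lake/raw/web_logs_2013_11",
         description="Unparsed web server logs",
         source="edge servers",
         certainty="low")  # Big Data is not certain!

# The analytics built on the data become metadata about it, too.
register("/lake/raw/web_logs_2013_11",
         derived_analytics=["session_counts", "funnel_drop_off"])
```

Notice that the last entry records metadata that lives "in the analytics" – exactly the kind of thing a traditional metamodel, built around pre-defined structures, has no slot for.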

Things are looking a whole lot different from the “Big” point of view …

About the author

Ian Rowlands is ASG's Product Marketing Manager (Data Intelligence). He heads product marketing for Metadata Management and is also tasked with providing content across ASG's entire portfolio. Ian has also served as Vice President of ASG's metadata product management and development teams. Before ASG, Ian served as Director of Indirect Channels for Viasoft, a leading Enterprise Application Management vendor that was later acquired by ASG, where he managed relationships with distributor partners outside North America. He has worked extensively in metadata management, IT systems, and financial management, and has presented at conferences worldwide, including DAMA and CMG.

  • A question that’s very interesting to me is – how will we leverage the relatively unstructured “data lake” (along with its toolset for analytics, investigation, visualization, etc.) to generate content with enough structure to drive the average website or application?

    It seems to me (and this isn’t a novel thought) that there’s a significant disconnect between the processes / systems / structures that are best suited to identifying and testing for value in the data lake versus those for everyday consumption systems, website, terminals, etc.

    This latter set of systems will – I think – still benefit from excellent structure (be it tagging, metadata, or other) that allows for interoperability across all the different versions, manufacturers, and systems.

    So I imagine a process that takes the strengths of the big data / data lake approach to discovery and value creation and then delivers it to a consumption model that likely still leverages metadata. The power (and the challenge) here is to find the real value and establish processes that can quickly and easily get that valuable information into formats / indexes that can then be leveraged downstream.

    • Ian

      One of the interesting things about Big Data is to understand what it does change, and what it doesn't. Information used for business decision making still needs governance. Governance will still depend on metadata … and I think your sense that process will be critical is spot-on.
