by Ian Rowlands
I was talking to some very smart people about a knotty metadata problem yesterday. It eventually got to the dinosaur (me) explaining to the bright young things (everyone else) about how old-fashioned mainframe programs worked – and as I talked about the way COBOL handled datasets, records and fields, it really brought an issue sharply into perspective.
Most of the programming languages I cut my teeth on were concerned with mapping the storage of the system they were running on. And data access and transfer assumed an understanding of how data was laid out on external media. But then along came object-oriented languages and everything changed! The way we thought shifted from being about how to write the logic, to being about describing the nature and behavior of the real-world objects we want to work with. (OK, more strictly, about the simplified representations we wanted to manipulate.) For us dinosaurs, the impact was earth-shattering – you could write books about it; in fact, people have.
So what’s the Big Data and metadata connection? One connection is that there is a similar mind shift.
For a long time, the business of data management has focused on structuring data in ways that support known business processes for a given organizational structure, with an attempt to generalize in order to anticipate reasonably expected types of change to the business. There's a presumption that – by and large – the questions to be asked of the data are reasonably understood, and that the data can be logically structured to best support the answering of those questions. Typically, though, hidden in the design are also some compromises based on an understanding that there are performance constraints in a given physical environment. Both the assumption that most questions can be reasonably anticipated and the constraints imposed by performance trade-offs have an ugly side effect when the business has unanticipated questions: the model may need to be changed, and performance may be imperiled.
Big Data changes the point of view. The Big Data resource (some call it the Data Lake) consolidates many different information sources without pre-supposing the questions that might be asked and allows the business – or its Data Scientist proxies – to explore the data and discover valuable insights.
That’s only one way that Big Data is going to change the perspective – another is that (perhaps) we will finally stop constraining the way we store information to accommodate hardware-driven performance limits. I say “perhaps” because I’ve heard that story before!
Big Data might change one other critical viewpoint. Your data is certain, sure, accurate, isn’t it? If it isn’t, you’ve got a Data Quality process to fix it, haven’t you? You’re going to have to let go of perfection and precision – Big Data is not certain!
So what does all this mean for metadata? Apart from the fact that there isn’t nearly as much native metadata as you might hope. In fact, without a pre-defined model, a lot of the metadata is in the analytics … whether they are coded or in specialized tools. A lot of other metadata is going to have to be applied externally. Big Data means that a lot of the metadata we have long cherished might go away … and that new types of metadata are going to need to be managed. The metamodel is going to change massively. And there are going to be disconnects between the metadata we use now, and “Big” metadata.
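To make the point about metadata living in the analytics a little more concrete, here is a minimal Python sketch. It is only an illustration under stated assumptions: the records, the field names and the infer_field_metadata helper are all hypothetical, invented for this example. It shows the analysis code working out field names and types at read time, because no pre-defined model supplied them.

```python
from collections import defaultdict


def infer_field_metadata(records):
    """Derive simple metadata (observed types, occurrence counts) per field.

    Hypothetical helper: nothing here comes from a real metadata tool.
    """
    types = defaultdict(set)
    counts = defaultdict(int)
    for record in records:
        for field, value in record.items():
            types[field].add(type(value).__name__)
            counts[field] += 1
    return {
        field: {"types": sorted(types[field]), "count": counts[field]}
        for field in types
    }


# Hypothetical records from different sources, loaded without a shared model.
records = [
    {"customer_id": 17, "region": "EMEA"},
    {"customer_id": "17", "spend": 120.5},
    {"customer_id": 42, "region": "APAC", "spend": 80},
]

metadata = infer_field_metadata(records)
for field, info in metadata.items():
    print(field, info)
```

Notice that the same field turns up with more than one type, and that some fields are missing from some records. That is the uncertainty mentioned earlier, and here it surfaces as metadata that the analytics had to discover rather than metadata a model declared up front.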
Things are looking a whole lot different from the “Big” point of view …