I deemed last year The Year of Data. Big Data, Data Analytics, Open Data, Data Breaches, etc. were topics of discussion everywhere, even outside the IT world. Heck, data management even made it into the US State of the Union Address. I think all this added focus on data has given us a ripe opportunity to help our business organizations to leverage data modeling and other data management approaches even more than ever. It has also led to my inbox and Twitter stream to be flooded with questions about data modeling. I see this uptick in inquiries as a good indicator of job stability for data professionals. At least I hope so. If anything, I'm hoping it means that the pendulum has swung a bit back to center from "software is everything" mentalities. With all these questions about data model coming in, I thought I'd answer the most common ones here so that I can share them more widely.
Why is having a data model so important?
Ah, the big question. My reasons:
- Technologies used to move and persist data come in many forms, over time and at the same time. I support the creation of XML Documents, multiple DBMSs, multiple versions of the same DBMS. By having a single logical data model of data requirements, I can separate the rules and definitions of the requirements from their implementation.
- The same data exists in many platforms, in many locations. I need to be able to map those sources to targets across platforms and systems. Doing this mapping, or data lineage, with a data model is much easier. It also helps me understand the implications of making a change in one system on another system. I can't do that by just looking at code.
- Writing stuff down is often a good way to impress people that have to provide requirements, compliance issues, security requirements, etc. They like not having to answer the same question over and over for different people and roles. I can't tell you how many times a business person has asked me if IT ever writes anything down. Imagine your frustration if you had someone working on your house and ever tradesperson started their day with "Tell me about what sort of house you want? How many people will live here? Do you want to be able go get to the second floor? Do you need a bathroom? How about a shower?" That's how many business professionals see us in IT.
- A data model is a great way of capturing rules, constraints, definitions in a method that is technology independent. I can capture those things once and reference them in many places.
- We can measure databases against our data models to assess fit. This can be done for application packages and custom development.
- Enterprise data is complex. I work with a data model that has 32,000 objects (tables, columns, datatypes, constraints, etc.) in it. There is no way I could professionally manage change by just trying to remember all this information. Nor could anyone else.
- Modeling helps you ask the right questions before a bunch of time is spent coding, creating screens, reports, etc. This reduces costs and the number of bug fixes required.
- Like all models, the data model is a communication tool and is good for tying requirements directly to designs and implementations. This is especially true when I generate data prototypes based on the model.
- Data governance can't easily be done via reverse-engineered pictures of databases.
- Once you've worked with a great data model, you can’t go back.
Is it worth spending money on consultants to teach us the tool and help us build a model, or should we expect a newly hired data architect to be able to do that for us?
It depends…am I the consultant? If you hire the right kind of architect, he or she should be able to do that sort of training if you budget time for them to do that. Often when I’m hired as the sole architect, I’m swamped with real work and no one has time to be trained nor is there time allocated for me to train anyone. It also depends on the consultants. If they specialize, really, in data modeling, it may be worth it. If they have casual experience, then maybe not. I want to warn you that hiring a data architect that really is one is difficult. Most interviewees will be DBAs or developers who have once seen a data model or always wanted to get into this line of work. I typically have to review 50 resumes to find a candidate that has real enterprise level experience. If you are going to have only one person filling the data architect role, they can't be working at the apprentice level unless they have a mentor. Most good data architects actually aren't good part time DBAs, either. You may be better off trying to find a combination BA/DA. I wrote about this data architect hiring dilemma previously.
What can we do with a tool like [insert real data modeling tool here] that we can't do other less expensive tools?
What you can do with real data modeling tools is…modeling. The iterative process of doing logical models, having multiple physical models based on them, making changes to them (either because you are in development or because you have a new requirement) is how projects work in the real world. Many of the less expensive tools allow you to reverse engineer a database and forward engineering a full database. But here in the real world of development we make changes to designs and models all day long. We need to estimate the cost of change, measure the impact of that change and make changes. I need to be able to generate changes to databases without a lot of hand scripting and I can't drop the database and recreate it on a regular basis. I have to do that while supporting multiple versions of the same DBMS or multiple types of DBMSs. The less expensive ones either don’t’ do any of that, or they don’t easily support the iterative process of real life modeling and design. I also need to be able to produce reports, images and interactive versions of the data models. I also need to be able to share the models in formats that can be easily consumed by dozens of other modeling and development tools. Most of the lesser tools don't have these shiny features. Many of the real data modeling tools have features geared toward collaborating with other team members. These features make sharing, updating, commenting on and getting hands-on with the data models easier and faster. It really comes down to finding a tool that helps you reduce costs of development, get that development done faster and getting better quality designs. The value proposition for that is enormous compared to costs savings of having a tool that only makes nice pictures of one type of database.
What is your question?
Do you have a question about data modeling? Leave it in the comments or contact me directly @datachick or via the Contact Us menu item above and I'll blog a response here.