Data Modeling creates a model for storing and processing data in a predictable, consistent manner. It includes the visual representation of data structures, while enforcing business rules and governmental policies. A data model focuses on the data that is needed and how it should be organized, rather than on the operations performed on that data. Data Modeling is done by professional data modelers, who work closely with business managers and staff to create functional models.
Data models are progressive and flexible, in that there is no finalized version for a business or application, and they can always be tweaked, adjusted, and improved upon. They can be created for a variety of projects and during different phases of projects. Data models can be treated as living documents, capable of changing in response to an evolving business world.
Pascal Desmarets is the CEO and founder of Hackolade. Hackolade provides a visual Data Modeling software package for NoSQL database schemas, including Neo4j, Cassandra, Elasticsearch, MongoDB, JSON, Avro, Hive, HBase, MarkLogic, DynamoDB, Couchbase, Cosmos DB, Amazon Neptune, and Google BigQuery, as well as Swagger and OpenAPI documentation. In a recent interview with DATAVERSITY®, Desmarets commented on how Data Modeling seemed to fall out of fashion for a while with the growth of non-relational databases and Agile practices:
“I think that the pendulum is swinging back, from this last decade of misinterpreting what ‘agile’ really means. Companies are feeling the limitations of code-first and schema-on-read. They realize that it’s still necessary to practice Data Modeling, but it requires a new approach, a new methodology, and new tools that are adapted to this new way of doing Agile Data Modeling. Hackolade’s niche is to do Data Modeling for everything ‘other’ than the traditional relational databases.”
The Agile Data Modeler
Traditional Data Architectures have been disrupted by recent Data Management trends (data lakes, NoSQL databases), he said. As Data Management concepts have evolved, conventional data modelers have sometimes struggled to keep up. Analytics and Business Intelligence have challenged “proven, standardized” Data Modeling techniques, forcing data modelers to constantly acquire new skills and techniques.
Agile Data Modeling uses a minimalist philosophy, commented Desmarets, requiring a minimally sufficient design for the foundation of the desired model. Aspects of the physical and logical models are completed and timed to support the development of application features. “Agile data modelers try to avoid creating details of the model that aren’t immediately needed,” he commented.
Combined with a good sense of Data Modeling discipline, this philosophy builds the right data model for specific situations. This model is flexible enough to support future needs, as those become reality.
The book Agile Modeling by Scott Ambler covers agile modeling practices and principles in depth.
New kinds of data research require updated modeling skills. A data modeler’s toolbox must be able to address unstructured data, relational data, master data, and dimensional data. Currently, dimensional data is considered an essential component in data warehouse and Business Intelligence activities.
Data that has been organized dimensionally provides a more adaptable and effective path for business information and analytics than normalized, traditional data structures. “Data Modeling is an important process, and data modelers must continue to adjust, staying ahead of the evolving reality of data processing,” he said.
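To make the dimensional idea concrete, here is a minimal sketch of a star schema in plain Python: one fact table of measurements references small dimension tables by key. The table and field names (products, dates, sales) are hypothetical, chosen only for illustration; revenue is kept in integer cents to avoid floating-point rounding.

```python
# Hypothetical dimension tables: descriptive attributes keyed by surrogate key.
dim_product = {1: {"name": "Widget", "category": "Hardware"}}
dim_date = {20240115: {"year": 2024, "quarter": "Q1"}}

# Fact table: one row per measurable event, referencing dimensions by key.
fact_sales = [
    {"product_key": 1, "date_key": 20240115, "units": 3, "revenue_cents": 1497},
    {"product_key": 1, "date_key": 20240115, "units": 1, "revenue_cents": 499},
]

# A BI-style query: revenue by category and quarter, one lookup per dimension.
totals = {}
for row in fact_sales:
    key = (dim_product[row["product_key"]]["category"],
           dim_date[row["date_key"]]["quarter"])
    totals[key] = totals.get(key, 0) + row["revenue_cents"]

print(totals)  # {('Hardware', 'Q1'): 1996}
```

The same question against a fully normalized model would typically require several joins across narrow tables; the dimensional layout trades some redundancy for simpler, faster analytical queries.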
A good data modeler works well under pressure, possessing the ability to focus while efficiently completing projects. They should be team players, but also able to work independently. Data modelers should also be able to juggle multiple projects simultaneously.
According to Desmarets:
“We need to look at the best allocation of resources. Developers can focus on development, and they’re really good at that, while data modelers are really outstanding at translating business problems into concepts that can be programmed. But they need to do that without becoming a bottleneck. We think that the traditional approach with the three steps of conceptual modeling, followed by logical modeling, followed by physical modeling, is just too heavy.”
Professionals might say, “We need to do an enterprise data model, so we’ll come back in six months with the answer.” Meanwhile, the developers are saying, “I’ve got this two-week sprint, and I need to get my work done.” So according to Desmarets, there needs to be a change in the dynamics, and for that, “we need a new methodology and we need nimble and agile tools.”
The methodology, described in a blog article he published on DATAVERSITY, is to use Domain-Driven Design instead of traditional conceptual modeling:
“Some people might say that it’s just the same as conceptual modeling, only different. I personally believe that Domain-Driven Design is a real revolution, because it’s completely in sync with agile development and NoSQL databases.”
Domain-Driven Design (DDD) is a method for developing software designed to reflect and implement core business concepts. It places the focus of design on the problem domain, he said.
DDD has strategic value, in that it maps and translates business domain concepts into software “artifacts.” (The word artifact comes from the Latin arte factum, “something made with skill.”) DDD is about aligning code artifacts with business problems and describing both in the same language.
Domain-Driven Design is neither a technology nor a methodology. It provides a system of terms and practices for reaching “design decisions” about software projects with complicated domains. With DDD, everyone uses the same language and terminology. As a result, everyone communicates better, and the work is done more efficiently. Solutions for the model reflect the business operations, instead of showing how the software operates.
“The premise of Domain-Driven Design is that complexity in enterprise systems is such that you end up, after a while, with what they call ‘a big ball of mud.’ Applications evolve over time to support the needs of the enterprise, but without care, vision, and rigor, change becomes risky and difficult to complete. Domain-Driven Design is a way to break down a complex problem into smaller ones that are much simpler to deal with. If you attack these smaller pieces one at a time, and you do the modeling according to the principles of DDD, then you can get in sync with the way agile developers do their scrum, two-week sprints.”
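A minimal sketch can show how DDD’s ubiquitous language turns business terms into code artifacts. The “Ordering” bounded context below is hypothetical, chosen only to illustrate the idea: value objects, entities, and business rules live in the domain model, not in the database layer.

```python
from dataclasses import dataclass, field

@dataclass(frozen=True)
class Money:              # value object: immutable, compared by value
    amount: int           # amount in cents, to avoid float rounding
    currency: str = "USD"

@dataclass
class OrderLine:          # entity inside the Order aggregate
    sku: str
    quantity: int
    unit_price: Money

@dataclass
class Order:              # aggregate root of the hypothetical Ordering context
    order_id: str
    lines: list = field(default_factory=list)

    def add_line(self, sku: str, quantity: int, unit_price: Money) -> None:
        # The business rule is expressed in domain terms, in the domain object.
        if quantity <= 0:
            raise ValueError("quantity must be positive")
        self.lines.append(OrderLine(sku, quantity, unit_price))

    def total(self) -> Money:
        cents = sum(l.quantity * l.unit_price.amount for l in self.lines)
        return Money(cents)

order = Order("ord-1")
order.add_line("WIDGET", 2, Money(499))
print(order.total())  # Money(amount=998, currency='USD')
```

Because the code uses the same vocabulary as the business (order, line, price), a data modeler and a developer can discuss the model without translation, which is the point of the shared language DDD prescribes.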
Embracing New Technologies
Data Modeling has become even more important for NoSQL databases and Big Data file formats, which lack the constraints of “normalized” SQL systems. Semi-structured Big Data has created challenges for data modelers, both in terms of regulations and Data Governance (GDPR, etc.), and in terms of leveraging the information that has been accumulated.
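The contrast can be sketched in a few lines of Python. The customer/order fields below are hypothetical: a normalized model splits the data across keyed “tables,” while a document model embeds what it needs in one record, and it then falls to the application (or an explicit model) to keep documents consistent.

```python
# Normalized (relational style): data split across tables, joined by keys.
customers = {1: {"name": "Ada"}}
orders = [{"order_id": 10, "customer_id": 1, "total_cents": 4200}]

# Denormalized (document style, as in MongoDB or Couchbase): the order
# embeds the customer data it needs, so one read returns everything,
# but the schema now lives implicitly in the application.
order_doc = {
    "order_id": 10,
    "customer": {"name": "Ada"},
    "total_cents": 4200,
}

def validate_order(doc: dict) -> bool:
    """A tiny explicit check standing in for what a data model makes formal."""
    required = {"order_id", "customer", "total_cents"}
    return required <= doc.keys() and "name" in doc["customer"]

print(validate_order(order_doc))              # True
print(validate_order({"order_id": 11}))       # False
```

Without a constraint like `validate_order`, nothing in a schema-on-read system prevents inconsistent documents from accumulating, which is the gap Data Modeling for NoSQL aims to close.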
Data Modeling can also help organizations migrate from an RDBMS to a NoSQL system. The benefits of Data Modeling within NoSQL and relational databases include improved Data Quality, better protection of personally identifiable information (GDPR), and improved Business Intelligence. Desmarets closed by commenting that:
“With NoSQL databases, where your schema can easily evolve, Domain-Driven Design coupled with agile data modeling leads to a coherent and effective approach. Data modelers can demonstrate their value by embracing new methodologies like agile and new technologies like NoSQL. We’re providing a tool to facilitate design and have a dialogue around a picture of hierarchical structures in a way that is really user-friendly for people without the technical background of developers.”
Now data modelers can design their schema ahead of time, or evolve it sprint by sprint as new features are introduced to the database or the applications. They can think through all the impacts on the schema, and if data has to be migrated, they can have a dialogue about that too. Such innovations are now possible with Hackolade and DDD, applied to all data structures being exchanged in an enterprise – not just traditional relational databases.
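One common pattern for sprint-by-sprint schema evolution is to version each document and upgrade old documents lazily as they are read. This is a hedged sketch, not Hackolade's mechanism; the field names and the v1-to-v2 change are hypothetical.

```python
def migrate(doc: dict) -> dict:
    """Upgrade a v1 document (flat 'name') to v2 (split name fields)."""
    if doc.get("schema_version", 1) == 1:
        first, _, last = doc.pop("name", "").partition(" ")
        doc["first_name"], doc["last_name"] = first, last
        doc["schema_version"] = 2
    return doc

old = {"schema_version": 1, "name": "Grace Hopper"}
print(migrate(old))
# {'schema_version': 2, 'first_name': 'Grace', 'last_name': 'Hopper'}
```

Because the migration is idempotent (v2 documents pass through unchanged), old and new documents can coexist in the same collection while the application evolves, which is exactly the kind of impact analysis a data model helps teams reason about before each sprint.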