One of the newer data buzzwords is “data debt.” Actually, it is approximately 10 years old, and it has been popular ever since agile practitioners realized that postponing things creates not only technical debt but also data debt. Will we, in 2023, be better at not creating so much data debt, and will it be […]
It’s All About Relations!
The new ISO 39075 Graph Query Language Standard is to hit the data streets in late 2023 (?). Then what? If graph databases are standardized pretty soon, what will happen to SQL? It will very likely stay around for a long time. Not simply because legacy SQL has tremendous inertia, but because relational database paradigms […]
The Data Engineer’s Roadmap
Data engineering is a fascinating and fulfilling career – you are at the helm of every business operation that requires data, and as long as users generate data, businesses will always need data engineers. In other words, job security is guaranteed. But, with such great power comes great responsibility. The journey to becoming a successful data engineer […]
A Primer to Optimizing Your Apache Cassandra Compaction Strategy
When setting up an Apache Cassandra table schema and anticipating how you’ll use the table, it’s a best practice to simultaneously formulate a thoughtful compaction strategy. While a Cassandra table’s compaction strategy can be adjusted after its creation, doing so invites costly cluster performance penalties because Cassandra will need to rewrite all of that table’s data. Taking […]
Say Hello to Graph Normal Form (GNF)
You thought you knew all normal forms? (And possibly also some abnormal …) Well, think again: There is also “graph normal form (GNF).” The diagram below is a fifth normal form graph concept model, which is just a few steps from GNF, so hang on. Where GNF comes from: GNF is based on serious mathematics, […]
Data Modeling Techniques and Best Practices
Data models play an integral role in the development of effective data architecture for modern businesses. They are key to the conceptualization, planning, and building of an integrated data repository that drives advanced analytics and BI. In this blog post, we’ll provide you with an overview of the most popular data modeling techniques and best practices to […]
The Rise of the Semantic Layer
Cloud giants like Google and Snowflake, unicorns like dbt Labs, and a host of venture-backed startups are now talking about a critical new layer in the data and analytics stack. Some call it a “metrics layer,” or a “metrics hub” or “headless BI,” but most call it a “semantic layer.” I prefer to call it a “semantic layer” because it best describes a business-friendly interface […]
A Beginner’s Guide to Data Modeling and Analytics
As more and more companies start to use data-related applications to manage their huge assets of data, the concepts of data modeling and analytics are becoming increasingly important. While they typically rely on each other, they are two very distinct concepts. Companies use data analysis to clean, transform, and model their sets of data, whereas they […]
What Kinds of Data Languages Will We Need in the Future?
IBM held pole position in the Database Management Systems (DBMS) market by developing “DL/I” in the 1960s as a means for defining and using hierarchical databases. Under the product names DL/I and IMS (Information Management System), this dominated the database market for many years. Everybody, except for IBM followers, called the product “D-L-1,” […]
Knowledge Graph Standards in Ambient Computing
Ambient computing is a broad term that describes an environment of smart devices, data, AI decisions, and human activity that enables computer actions alongside everyday life, without the need for direct human commands or intervention. Ambient computing represents an unparalleled opportunity to enhance almost every sphere of society – from the professional to the personal. And in […]