Generally Accepted Data Modeling Principles

*Learn more about author Thomas Frisendal.*

What can data modelers learn from accountants?

Accounting is a solidly established practice that the world cannot live without. One of the established guidelines for accountants is called GAAP (Generally Accepted Accounting Principles in the US), and there are similar international setups. You might guess these standards are about rules, but actually, accounting is much more than that. It contains some very well-thought-out best practices and some bright concepts about keeping records. In fact, there is a lot to learn (never thought that I should say that). 😊

Let’s start with one of the best-kept secrets of the trade.

Double-Entry Bookkeeping

Management of critical resources is business as usual for accountants and has been so for around 2000 years. Quoting Wikipedia: “In 70 AD, Pliny the Elder described the structure of the “Tabulae Rationum” as “on one page all the disbursements are entered, on the other page all the receipts; both pages constitute a whole for each operation of every man.” This is known today as double-entry bookkeeping, and it is still at the core of accounting. Here you see a posting of the purchase of a machine on January 31:

All transactions are booked on two accounts. No corrections are allowed. If you made an error, you would have to make a new double-entry transaction with the correction amount (difference) and book that. It is basically an error detection and prevention best practice.

Supplemented with the journaling best practice, it is also quite powerful and can really explain what has happened, including error corrections, etc.

You might very well, as I did recently, get the idea that a ledger with a journal is a specialized database designed to keep organizations alive and kicking, at least from a financial point of view.

Data Model

What, then, does the data model involved look like? Here is a simplified concept map:

This simple conceptual data model, coupled with simple functionality, a.k.a. accounting programs, is the machine room of financial governance. So, at this level, what you see is sort of a three-dimensional data model with credit account, debit account, and date/time as the dimensions and having two fact tables: postings and journal entries. The accounting functionality is based on the semantics just described above.

But there is more.

The Chart of Accounts

Here is a textbook chart of accounts:

The Chart of Accounts (example above) seems to be just a boring hierarchical account structure. And that it is. But you can also look upon it as a semantic model of a domain covering financial governance. Accounts live in one of the five contexts:

1. Assets: values that the legal entity owns
2. Liabilities: debt that the legal entity owes to someone
3. Equity: keeping track of the owner(s)
4. Revenue: keeping track of the income
5. Expenses: keeping track of the operating expenses

This can be thought of as just another dimension, but in reality, there is more going on: Each and every transaction is posted on two accounts, right? Each account belongs to one of the five “subdomains” above. Just a few examples:

A sales transaction involves a revenue account (e.g., an unknown online customer or Thomas Frisendal over in Denmark) and an asset account (PayPal account or asset account of receivables)
- A paycheck involves an expense account (payroll of the employee) and an asset account from where the money is withdrawn (e.g., a bank account), or the payment could be some days in the future, in which case the other account would be a liability (account of payables)
- And so on and so forth

This is called maintaining the Accounting Equation (Assets=Liabilities + Owner’s Equity), and it is the foremost reason for the double-entry accounting principles.

The fact that each transaction typically spans two “subdomains” means that the account’s domain really is a mesh of interconnected things:

The connecting arrows represent well-defined semantic relationships, defining distinct business processes. In reality, there is not just a single Chart of Accounts but a set of them working together to solve both the “mechanics” of posting transactions and keeping track in general. Similarly, there are also accounts that support management’s needs for keeping an eye on things in need of supervision. This really is a many-faceted and complex network of postings, transactions, and accounts. Fortunately, the semantics of the models and of the best practices are well understood and mostly built into the software that people use on a daily basis.

There are some clear parallels between double-entry accounting and semantic networks: <debit>-<transaction type>-<credit>, having accounts filling in the roles of subjects and objects.

The Business Perspectives

The basic structure just described supports many aspects of the financial management of organizations and businesses. Let us cast a brief view of the benefits of organizing things these ways.

In order to be certain that I understand current trends in the domain, I bought the book “The Finance Book” by Stuart Warner and Si Hussain (Pearson, 2017). The following is not a review or a writeup of the book. What I will highlight are my observations as I read it. If you need to improve your understanding of financial management and accounting, this is the book you could well begin with.

The organizational structure of a finance department is pretty standard across many organizations and involves accounting, treasury, and management accounting. Their work is performed using basically the same kinds of software that contain concepts like:

Ledgers (including a nominal ledger)
- Balances
- Journals
- Ordinary as well as statutory accounts
- Built-in controls including (among many) reconciliations, audit trails, and (internal and external) auditors
- Governance and whistleblowing best practices

There are national and international standards for accounting and reporting in place. Here are the two most important ones:

Generally Accepted Accounting Principles, GAAP (US)
- International Accounting/Financial Reporting Standards, IFRS (related to IAS, International Accounting Standards)

The standardization could be improved towards truly internationally comparable standards. There are regulations and reporting requirements all over the globe, and, except for tax havens, countries are pretty open about financial information of registered companies and entities.

Management Accounting “Packs” are used in all kinds of businesses for reporting and typically contain things like:

Key performance indicators
- Action plans
- Period and cumulative positions
- Progress reports
- Reconciliation
- Budget deviations

In short, we are looking at mature practices for keeping track of what many consider the most important thing: the financial situation of shareholders’ investments. There are some “shipwrecks,” but the vast majority of organizations and companies are very good at guarding the finances. And the business people and owners do not question these practices; they actually take them for granted and appreciate their benefits.

People should learn from this.

Which Other Business-Critical Functions Are Doing Equally Well?

Looking across larger organizations and businesses: If you ask me, the information science practitioners are doing fine. People are keeping track of information in many different government contexts (including law enforcement, etc.) as well as business contexts within pharma, healthcare, manufacturing, aerospace, etc. They have plenty of regulated and/or best practices, follow strict procedures/best practices, and are supported by many technologies and defined semantics such as:

Ontologies and taxonomies (many international standards — think Financial Industry Business Ontology as an example), mostly built upon the W3C standards, RDF and OWL
- Tight regulations — think the US Food and Drug Administration, for example
- Record keeping ISO standards (ISO 15489 Records management et al.)
- Knowledge graphs
- Specialized software built around semantics in general and the semantics of information science specifically

The ISO committee that maintains these standards offers this clarification:

“Records” is an English word that is difficult to translate to other languages. Even in English, it is used in different contexts with different meanings. An easy explanation:

1. Record = information
2. Not all information is a record > information created, received, and maintained as evidence and as an asset by an organization or person
3. When does an organization create records? > In pursuit of legal obligations or in the transaction of business

I am certainly not an information scientist, but I understand the semantics, and I have been working with quite a few. I have not seen “double-entry records posting” (could well exist, though), but there are some very convincing journals, and users have access to them, not least for auditing and compliance purposes.

So far, so good. All is well that ends well. But…

What About Data Management?

Yes, what indeed? Accounting manages the important issue of having sufficient finances. And information science handles the important issue of having correct and sufficient information, for instance, the results of phase 3 clinical vaccine trials, just to mention one thing. And data is an important container of information and, consequently, also of records.

In DBMSs, journals exist mostly for roll-back and roll-forward. Why did we not think about the business relevance of journal entries? They could well be expanded and made accessible (and persisted).

Data is also an important and business-critical issue — on a level with finances and records, right?

You might argue that some effects of Data Management shine through in both financial and regulatory reporting, for example. And that is true. However, such accomplishments are not based on anything specifically built into Data Management tools (multidimensional and graph-based reporting tools being the closest contenders). Nothing like debit/credit/journal semantics is found in SQL-based DBMSs or in NoSQL products either, in general.

Wait a minute; I hear your say: Isn’t finance a special application domain that is not on the DBMS level? Maybe so, but finance and accounting are indeed very general, and the way accounting packages bundle the posting and journaling of transactions keeping the accounting equation in balance can be seen as just a thin layer on top of the DBMS.

Could we imagine other kinds of — very general — functionality that could be handled in similar ways? If we put accounting aside, what other general functional requirements are in dire need of being handled better than in today’s DBMSs? (Be they SQL or graph or other, for that matter). Keeping in mind that such functionality should address real business requirements having considerable impact.

I can think of four, off the top of my head:

Temporality
- Records management in general
- Query transformation semantics
- Semantic cognizance

Let me explain.

Temporality

This beast is still annoying to many IT users. The simple need is to be able to recall the contents of a database, either as it is or as it was at some point in time in the past. And it is completely agnostic of any business domains.

This is handled to a certain extent in SQL 2015 versioning clauses (system or user-based versioning), but the problem is more general than “bitemporal.” See “SQL Technical Report — Part 2: SQL Support for Time-Related Information” linked here (SQL:2015 standard, [TR 19075-02_Time_2015]) for more information.

Here is an example of an ”as-of” query based on a Microsoft SQL Server Systems Versioned Temporal Table:

*(Downloaded from the MSDN documentation on 10-14-2020)*

Until 2015, consultants stood on each other’s shoulders building data vaults and anchor models and slowly changing dimensions and what have you. And many of them still do.

I have dealt with these issues in a number of posts here on DATAVERSITY’s site:

The bottom line is we can come a long way by adding an “upper-DBMS level” of functionality and associated syntax to deal with:

Setting up timelines
- Expanding queries that access timelines to deal with as-is/as-of style of logic in transparent manners

Much like “accounting DBMSs” (Microsoft Dynamics and what have you) deal with double-entry bookkeeping and journals — keeping track of time-related changes is basically journaling. And my proposals basically extend the current offerings of SQL.

Records Management in General

I mentioned this earlier as something that comes out of information science. But it is easily generalized. The ISO 15489-1: 2001 standard defines records management as “[the] field of management responsible for the efficient and systematic control of the creation, receipt, maintenance, use, and disposition of records, including the processes for capturing and maintaining evidence of and information about business activities and transactions in the form of records.” (Seen on Wikipedia).

Clearly, there is a huge overlap with temporality, as applied above. Actually, I believe that temporal DBMS thinking can influence Records Management. Likewise, Records Management can and should influence DBMS thinking. Using double-entry bookkeeping and journaling as analogies, the possibilities are unfolding nicely.

I would be deeply interested in helping incubate such ideas into actual software solutions.

Query Transformation Semantics

In my Learning from Complex Data Modeling Practices post linked above, I also mentioned the efforts of the data vault modelers to enable “insert only” practices in SQL databases. This is first and foremost driven by performance concerns, but it also rimes well with temporality and records management. I think there is potential here for generalizing it all into DBMS functionality that looks like SQL as-is and/or as-of, also in SQL, but with the cloak and dagger insert-only thing going on in the background. It needs some consideration but can probably be done in elegant ways.

Semantic Cognizance

In the temporal blog posts mentioned above, I also claim that going to an atomic level, basically the 6^th normal form, is necessary to understand the full semantics of the data model. And that is true for all of these “extended functionality” proposals. But, interestingly enough, going to that level basically aligns you with the RDF databases’ triples (subject-predicate-object). The data models in both camps are pretty close, and this level is where you find many RDF to property graph mappers operating. And property graphs map quite intuitively to normalized databases.

The most promising opportunity is embedding inference into the DBMSs, similar in many respects to what has been available to RDF/OWL users for many years. Inferencing involves both backward and forward reasoning, and you can think about it like this classic example starting out with two simple triples:

All persons are mortal.
Socrates is a person.

Then we can infer that (ergo) Socrates is mortal.

And that Socrates is not mortal is incorrect.

Reasoning capabilities like this have been available in RDF for almost 20 years. It was not the panacea that some expected it to be, but it has some excellent use cases.

Another benefit of improved semantic functionality would be that SQL and SPARQL came closer to each other — easing interoperability across platforms.

Cultivating New “Equations” and Creating Opportunities

Actually, what I am talking about is just a little added functionality and associated semantics. But it is amazing what it has done for accounting for over 2000 years. To recap — architecturally, it boils down to:

Double-entry as a general principle
- Semantics as part of the DBMS declarations
- No deletes or modifies but a lot of temporality
- Journals

It is not just about making, for example, a more intelligent SQL or GQL (Graph Query Language in the works), but it is certainly also an exercise for raising the bar for database capabilities and benefits. A quest for raising the levels of quality, consistency, accessibility, and, in general, trustworthiness. Seems to me that in the RDF-space companies like Fluree and TerminusDB are thinking along these lines, but I have not seen SQL or graph DBMS’s going in these directions. Hopefully, the paradigms can all be explained as elegantly as the first mover, The Accounting Equation. Here are the five contenders:

It is about time! Send your comments and suggestions to me at info@graphdatamodeling.com, and I will curate them into a follow-up blog post sometime in 2021.

Data Topics