IBM took pole position in the Database Management Systems (DBMS) market by developing “DL/I” in the 1960s as a means for defining and using hierarchical databases. Under the product names DL/I and IMS (Information Management System), it dominated the database market for many years. Everybody, except for IBM followers, called the product “D-L-1,” not “D-L-I.” (Yes, it was an interface, but who cares, when it was the first major commercial DBMS?) And many of us subscribed to the idea of having “data languages.” But even today, there is still no “Data Language/Two,” or 10, or 142, for that matter. Why?
The motivation of this post is a public survey – which you can find below – about the serious terminology issues that we, as data language professionals, are facing. Have your say! Read on and then go and take the survey.
The Category Jungle Has Eaten “Data Language”
Maybe categorization is the issue. In the English version of the Wikipedia entry for DL/I, the letters “D-L-I” stand for “Data Language Interface.” (Don’t let the “Interface” fool you – the Italian version of the same article refers to Data Language One!) According to Wikipedia, the article belongs to the category “Data-Structured Programming Languages.” Data language interface is one of a kind, and the other two subcategories are “array programming languages” and “stack-oriented programming languages.”
Only one data language on this planet? Cannot be. Certainly, at least one other language comes to mind in this context, and that language is SQL. SQL, specified in the parts of ISO/IEC 9075, is a database language.
So, where DL/I was a data language, SQL is a “database language” (of the “data sublanguage” subcategory). It would have been too much to ask IBM to name SQL as “Data Language/Two” … But they did it for DB2, their strategic SQL offering …
Note that Data Language/One was not, repeat not, a programming language, but “… the language system used to access IBM’s IMS databases, and its data communication system.”
Searching for “data language” on Google gives you a company by that name as well as good old Data Language/One. Following that, Google munches a host of “data science programming languages” and shows you a lot of those. It also offers this explanation:
“What are data languages?
Database languages, also known as query languages or data query languages, are a classification of programming languages that developers use to define and access databases, which are collections of organized data that users can access electronically.”
So, someone in cyberspace has decided that “data language” is a synonym of “database language.”
However, searching for “database language” (in Wikipedia) redirects to “query language,” which shows a long list of mostly query languages, including some from the RDF world. And including some logic languages and even the LDAP language for directory services. Doing similar searches in Google also returns database products such as Clipper (remember that?) and dBase.
OK, OK, Database Language Is the Term, Isn’t It?
If that is so, what is a “database” about? Oracle UK has some useful definitions:
“A database is an organized collection of structured information, or data, typically stored electronically in a computer system. A database is usually controlled by a database management system (DBMS). … The data can then be easily accessed, managed, modified, updated, controlled, and organized.”
And on SQL:
“SQL is a programming language used by nearly all relational databases to query, manipulate, and define data, and to provide access control. … Although SQL is still widely used today, new programming languages are beginning to appear.”
Correction: SQL Is a Programming Language
Wait a minute, SQL is a “programming language”? What does that imply?
Asking Wikipedia for a definition, this is what you get:
“A programming language is any set of rules that converts strings, or graphical program elements in the case of visual programming languages, to various kinds of machine code output. Programming languages are one kind of computer language, and are used in computer programming to implement algorithms.”
“There is no overarching classification scheme for programming languages. A given programming language does not usually have a single ancestor language. … The task is further complicated by the fact that languages can be classified along multiple axes. For example, Java is both an object-oriented language (because it encourages object-oriented organization) and a concurrent language. … programming languages divide into programming paradigms and a classification by intended domain of use, with general-purpose programming languages distinguished from domain-specific programming languages.”
Explanation: SQL Is a Domain-Specific Programming Language
So, without you or me doing anything, SQL moved into the “domain-specific languages” category of programming languages, the domain being “relational databases.” That sits oddly with the many people I know who are capable of doing almost anything, in any and all domains, using SQL.
Wikipedia gives some examples of domain-specific programming languages:
- ColdFusion Markup Language
- Software engineering uses
- Unreal Engine before version 4 and other games
- Rules Engines for Policy Automation
- Statistical modeling languages
- Generate model and services to multiple programming Languages
SQL is not on the list, but it is discussed in the text that follows: “A computer language like SQL presents an interesting case: it can be deemed a domain-specific language because it is specific to a specific domain (in SQL’s case, accessing and managing relational databases), and is often called from another application, but SQL has more keywords and functions than many scripting languages and is often thought of as a language in its own right, perhaps because of the prevalence of database manipulation in programming and the amount of mastery required to be an expert in the language.”
Is the dog biting its tail here: SQL is domain-specific “… because of the prevalence of database manipulation in programming…”?
Data Languages Such as SQL Must Be Multi-Domain
Is SQL really a programming language that exists in order for developers to develop “relational” algorithms? Or is it a language that enables you to query, manipulate, and define data, and to provide access control to relational data? Seasoned SQL people claim that “SQL is not an end user language.” That may well be so by intention. The fact is that many non-developers, like case workers, investigators, business analysts, scientists, and other people with strong requirements for complex data access in general, have learned to use SQL for (some of) their everyday tasks. In the SQL world there are whole forests of “analytics” tools (including some under the Data Science label), whereas in the graph universe there are fewer tools, but some of them are quite powerful, also for investigators (law enforcement, for example). The underlying structure and semantics are “shining through” the interfaces of the tools, not least by carrying forward the “atomic” paradigms (like tables, relationships, etc.).
In short, languages like SQL are not only for data definition and manipulation, but also very much for analytics (to be performed by tools and/or by users using SQL). In fact, the “algorithms” of “programming languages” and the “analytics” of data languages are both domains in their own right.
Aside: Another discussion about programming languages is that of computational completeness. In programming language theory, it is required that programming languages be computationally complete. SQL was not completely complete before the addition of recursive common table expressions (in “SQL 99”). But how many actually use the WITH … construct?
So, computational completeness is a formality without much payload in real life. Which again means that calling SQL a programming language because it is very good for programming might in reality be a misnomer. End of aside.
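For readers who have never met the WITH construct mentioned in the aside, here is a minimal sketch of a recursive common table expression. It uses Python's built-in sqlite3 module purely as a convenient host, and the table name `edge`, the CTE name `reach`, and the helper `reachable_from` are all made up for this illustration. It assumes an SQLite build recent enough to support WITH RECURSIVE.

```python
import sqlite3


def reachable_from(edges, start):
    """Return every node reachable from `start`, start included.

    `edges` is an iterable of (src, dst) pairs. The table and CTE names
    are illustrative assumptions, not taken from any standard schema.
    """
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE edge (src TEXT, dst TEXT)")
    conn.executemany("INSERT INTO edge VALUES (?, ?)", edges)
    rows = conn.execute(
        """
        WITH RECURSIVE reach(node) AS (
            SELECT ?                      -- anchor: the starting node
            UNION                         -- UNION (not UNION ALL) drops
                                          -- duplicates, so cycles terminate
            SELECT e.dst
            FROM edge AS e
            JOIN reach AS r ON e.src = r.node
        )
        SELECT node FROM reach ORDER BY node
        """,
        (start,),
    ).fetchall()
    conn.close()
    return [row[0] for row in rows]


if __name__ == "__main__":
    demo_edges = [("a", "b"), ("b", "c"), ("c", "a"), ("x", "y")]
    print(reachable_from(demo_edges, "a"))
```

The recursion lives entirely in the query text; the Python around it only loads the rows and collects the result. That is arguably the point of the aside: the construct is there, but it reads like a query, not like a program.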
A Crucial Point: What Are Languages For?
Furthermore, Wikipedia adds another important point: “Programming languages differ from natural languages in that natural languages are only used for interaction between people, while programming languages also allow humans to communicate instructions to machines.”
I am fine with thinking that programming languages are for communicating instructions to machines. But that leaves a vast, open space for communicating data and information to human beings and for transporting data and information between different (autonomous) databases.
That is what I intuitively infer from “being a data language.”
Enter Data-Oriented Languages
Somebody in cyberspace must have heard me. There is a cat door and you can see it here.
However, graph-oriented data languages (like Cypher and GSQL) are not included there. But Gremlin is, and so are RDQL and SPARQL.
And I agree, the RDF world is part of data languages. Very much so.
But that opens up another Pandora’s box, containing, for example, Gellish / Formalized English. And having that on board also opens the door to Datalog, which is gaining popularity in contemporary developer stacks. For example, here are some environments where Datalog is used today: Clojure, XTDB, Erlang, Haskell, Java (AbcDatalog), Lua, pyDatalog, Racket, Rust, Jena, TerminusDB.
Datalog is said to have influenced the recursion in SQL-99, so we are hereby closing a loop of some importance. Datalog is a logic-based language used for deductive data analysis.
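To give a feel for what “deductive” means here, the sketch below hand-rolls the classic two-rule ancestor program in plain Python. It is not how pyDatalog or any of the engines listed above work internally; it is a naive fixpoint loop, and the function name `ancestors` and the sample facts are invented for the illustration.

```python
def ancestors(parent_facts):
    """Naive bottom-up evaluation of two Datalog-style rules:

        ancestor(X, Y) :- parent(X, Y).
        ancestor(X, Z) :- parent(X, Y), ancestor(Y, Z).

    `parent_facts` is a set of (parent, child) pairs. Rules are applied
    repeatedly until no new facts appear (a fixpoint).
    """
    parent = set(parent_facts)
    ancestor = set(parent)  # first rule: every parent is an ancestor
    while True:
        # second rule: join parent(X, Y) with ancestor(Y, Z) on Y
        derived = {
            (x, z)
            for (x, y) in parent
            for (y2, z) in ancestor
            if y == y2
        }
        new_facts = derived - ancestor
        if not new_facts:
            return ancestor
        ancestor |= new_facts


if __name__ == "__main__":
    facts = {("ann", "bob"), ("bob", "cid")}
    print(sorted(ancestors(facts)))
```

The facts go in, the rules are stated once, and everything that follows from them comes out. The same reachability idea is what the recursive WITH construct expresses in SQL.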
Data Transport and Metadata
And now that the cork has come off the bottle, we (I, at least) realize that we have to include data transport languages such as:
- And YAML 1.2
- Other markup languages?
As well as a gazillion recognized standards for metadata; pick your favorites yourself from the ISO data management stack here.
Why This Is Important
Currently, the database world is (again) being redefined. ISO/IEC JTC 1/SC 32/WG 3 is the custodian of SQL. The same working group is also the architect of a new Graph Query Language standard by the name of GQL.
Expect GQL to become a multipurpose data language (if you ask me) that, as SQL does today, can do almost everything across many different domains, including the major ones covered by SQL as well as the graph universe.
Data Language Concerns Questions
Therefore, I ask for your feedback in cleaning up the terminology of our future multipurpose data language. It is just 16 easy questions. Thank you for your assistance!