
In 1970, Ted Codd introduced the relational data model, which proposed representing data as tuples grouped into relations, so that data could be specified and queried declaratively.
SQL was developed at IBM as a way to query relational databases. It is a declarative programming language, expressing what data is to be retrieved, as opposed to imperative programming languages like C, Java, or Python, which describe the flow of computation, i.e., how data is to be retrieved.
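To make the distinction concrete, here is a minimal sketch; the customers and orders tables are hypothetical. The query states which rows and aggregates are wanted, and the database's optimizer decides join order, access paths, and execution strategy.

```sql
-- Declarative: describe the desired result, not the steps to compute it.
-- Hypothetical schema: customers(id, country), orders(id, customer_id, total)
SELECT c.country, SUM(o.total) AS revenue
FROM orders AS o
JOIN customers AS c ON c.id = o.customer_id
WHERE o.total > 100
GROUP BY c.country
ORDER BY revenue DESC;
```

An imperative program would have to spell out the loops, lookups, and sorting itself; here those decisions are left to the query planner.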
Over the decades, other data models have been introduced, some of which gained popularity in niches – key-value stores, document databases, full-text search databases, graph databases, and lately vector databases. SQL and the relational model are still the de facto standard for database management systems.
SQL as a querying standard has stood the test of time, and there is little reason to believe that will change any time soon. In this blog post, we substantiate that position in the face of new challengers – vector databases and large language models.
Vector Databases
As language models and other deep neural networks learn to categorize text and image data, their learned transformations are used to generate embeddings: vectors that attempt to capture the model’s latent semantics. The distance between two embeddings indicates how related the underlying items are (closer vectors are more similar), so embeddings are commonly used for similarity search and clustering.
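For concreteness, this is roughly how a similarity search looks in SQL once embeddings are stored alongside the data. The sketch below assumes PostgreSQL with the pgvector extension (one implementation among many) and a hypothetical documents table with toy three-dimensional vectors; real embeddings typically have hundreds or thousands of dimensions.

```sql
-- Assumes the pgvector extension is available.
CREATE EXTENSION IF NOT EXISTS vector;

-- Hypothetical table storing text alongside its precomputed embedding.
CREATE TABLE documents (
    id        bigserial PRIMARY KEY,
    body      text,
    embedding vector(3)   -- toy dimensionality for illustration
);

-- Return the five documents closest to a query embedding,
-- using cosine distance (pgvector's <=> operator).
SELECT id, body
FROM documents
ORDER BY embedding <=> '[0.11, 0.62, 0.27]'
LIMIT 5;
```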
With the skyrocketing popularity of large language models in late 2022 came vector database management systems, which introduced native support for vector operations, mainly similarity searches. Vector DBMSs achieved this by constructing indexes that enabled efficient approximate nearest neighbor (ANN) queries. Several popular databases have been released in this space, with AI workflow tools offering native embedding integrations with them.
Race to Market
It wasn’t long before existing DBMSs – relational and otherwise – introduced vector search extensions, most of them releasing or announcing support for vector search in 2023 and 2024. Even database vendors that already supported some less efficient forms of similarity search wasted no time implementing the latest ANN algorithms like IVF and HNSW.
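As a sketch of what adding ANN support looks like in practice, pgvector (used here purely as an example) exposes both index types through ordinary DDL on the hypothetical documents table from the sketch above; in practice you would pick one or the other for a given column.

```sql
-- HNSW: graph-based index; typically better recall/latency trade-offs,
-- but slower to build and more memory-hungry.
CREATE INDEX ON documents
USING hnsw (embedding vector_cosine_ops)
WITH (m = 16, ef_construction = 64);

-- IVFFlat: clusters vectors into lists and probes a subset at query time;
-- cheaper to build, and works best when created after the data is loaded.
CREATE INDEX ON documents
USING ivfflat (embedding vector_cosine_ops)
WITH (lists = 100);
```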
This trend has similarities to the popularity of NoSQL databases in the 2000s, though at the time it took existing relational systems several years to add support for unstructured data. In fact, as Andy Pavlo and Michael Stonebraker’s paper explains, this cycle has repeated for decades. Several non-relational data models have been proposed, and some have found a niche, but the relational model still dominates the market.
Longevity of the Relational Model
So how do existing RDBMSs stay on top of emerging trends? The persistence of the relational model through decades of innovation in hardware and data models can be attributed to several factors.
- Many RDBMSs take advantage of SQL’s extensible design to support plugin architectures, making it relatively straightforward to add new functionality (e.g., new data types, indexing methods, or execution engines). Moreover, the SQL standard has grown to include newly introduced data types like JSON, and there is every indication that it will include a vector similarity type soon (a short sketch follows this list).
- The relational model specifies a logical structure for data without prescribing a physical storage format. This has allowed RDBMSs to keep up with changing system architectures; relational engines are now found everywhere – in columnar databases, cloud databases, and data lakes.
- Many alternatives, such as NoSQL systems, initially rejected SQL and the relational model but later incorporated SQL-like interfaces and ACID transactions, essentially converging back towards traditional relational databases.
- Emerging databases tend to support specialized workloads. Managing multiple databases specialized for different workloads introduces operational complexity. Developers are more inclined to stick with or return to RDBMSs as they evolve to support new workloads like NoSQL or vector-based tasks.
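As a small illustration of the first point, in PostgreSQL (to pick one system) vector support arrives through the same extension machinery used for many other features, and JSON is queried as a first-class type with ordinary SQL; the events table below is hypothetical.

```sql
-- New functionality is loaded as an extension, not a rewrite of the engine.
CREATE EXTENSION IF NOT EXISTS vector;

-- JSON documents stored in a relational table (jsonb in PostgreSQL).
CREATE TABLE events (
    id      bigserial PRIMARY KEY,
    payload jsonb
);

-- Querying inside the document with standard SQL plus JSON operators.
SELECT payload->>'user_id' AS user_id
FROM events
WHERE payload @> '{"type": "login"}';
```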
RDBMSs added vector search within a year of specialized vector databases appearing – significantly less time than it took them to support NoSQL paradigms.
NoSQL introduced entirely new data models (e.g., key-value, document, graph) that broke away from the rigid schema of relational systems. This required RDBMSs to rethink and extend their core principles to support schema-less design, denormalized structures, and non-relational APIs.
On the other hand, vector databases focus primarily on a new data type (embeddings) and a specific type of query (similarity search), without introducing fundamentally new storage or indexing formats. It’s also clear that the market forces behind AI are much stronger than those that drove the adoption of NoSQL databases, which has accelerated the adoption of these new capabilities.
Case Studies: Migrating from Specialized Vector Databases to RDBMSs
As relational database systems increasingly adopt native vector search capabilities, organizations are reconsidering the need for specialized vector databases. The cost, performance, and operational simplicity of using existing RDBMSs have motivated many to transition away from dedicated vector database systems. Below are some illustrative case studies that highlight the different motivations for this trend.
Performance Parity
Research by Jonathan Katz demonstrated that optimizations in traditional databases can deliver an order-of-magnitude speedup for specific workloads, improving query throughput and latency in existing solutions. Advances in indexing algorithms have made vector search in generalized databases competitive with specialized systems. This eliminates one of the primary advantages that vector databases once held.
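The tuning knobs involved are deliberately unremarkable. In pgvector, for example (used here only as an illustration), the recall/latency trade-off is a session setting, and the usual EXPLAIN tooling shows whether the index is used; the documents table is the hypothetical one from earlier.

```sql
-- Larger values search more of the HNSW graph: better recall, higher latency.
SET hnsw.ef_search = 100;

-- The equivalent knob for an IVFFlat index: how many lists to probe.
SET ivfflat.probes = 10;

-- Standard tooling reports whether the ANN index is used and how long it takes.
EXPLAIN ANALYZE
SELECT id
FROM documents
ORDER BY embedding <=> '[0.11, 0.62, 0.27]'
LIMIT 5;
```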
Cost Efficiency
Specialized vector databases, while optimized for certain tasks, often come with higher infrastructure and licensing costs. RDBMSs with vector search extensions enable organizations to achieve comparable performance at significantly lower cost by leveraging their existing database infrastructure. A detailed comparison revealed that traditional databases can match dedicated vector databases on performance while reducing costs by up to 75%. Most popular relational DBMSs are also free and open source, while many specialized vector databases are managed services with expensive cloud pricing.
Operational Simplicity
Managing multiple databases for different workloads adds complexity to development, deployment, and maintenance workflows. Generalized systems allow teams to handle both relational data and vector search within the same platform, simplifying infrastructure and reducing the need for specialized expertise. Case studies cite feature incompleteness, data synchronization, and scalability as the main issues motivating migrations from specialized vector databases to traditional RDBMSs with vector features, and find that RDBMSs with vector extensions offer a more balanced solution: competitive performance with the added advantage of SQL integration.
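To make the single-platform point concrete, the query below mixes relational filters, a join, and a similarity search in one statement; keeping it in one system means one transaction, one planner, and no cross-system synchronization. The schema extends the hypothetical documents table with illustrative author_id and published_at columns and a hypothetical authors table, again assuming pgvector syntax.

```sql
-- Relational predicates and vector search in a single query.
SELECT d.id, d.body
FROM documents AS d
JOIN authors   AS a ON a.id = d.author_id
WHERE a.country = 'DE'
  AND d.published_at >= now() - interval '30 days'
ORDER BY d.embedding <=> '[0.11, 0.62, 0.27]'
LIMIT 10;
```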
SQL and Large Language Models
As large language models (LLMs) reshape the technology landscape, some have questioned whether SQL will remain relevant. However, there are compelling reasons to believe that SQL won’t just survive the AI revolution – it may actually thrive in it.
First, LLMs still require an intermediate language to communicate with databases. SQL has proven itself the most stable and widely-adopted query language over decades, and is perfectly positioned to serve this role. Rather than replacing SQL, LLMs are more likely to reinforce its dominance by making it more accessible through natural language interfaces.
However, this accessibility comes with important caveats. SQL was deliberately designed to approximate English syntax, making it relatively intuitive to learn and use. For the vast majority of use cases, SQL queries involve basic filters and joins that are straightforward to write directly. While LLMs can certainly generate SQL from natural language descriptions, this process requires providing the complete database schema and carefully specifying the desired query. For all but the most complex queries, writing SQL directly is often more efficient than crafting the perfect prompt for an LLM.
Perhaps most importantly, even in scenarios where LLMs generate SQL queries, developers and data professionals still need SQL expertise to verify, debug, and test these queries. An LLM might generate syntactically correct SQL that doesn’t actually achieve the desired outcome or performs inefficiently. Without understanding SQL, it would be impossible to validate and optimize these generated queries effectively.
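That verification is itself SQL work. As a hedged sketch (the schema and the generated query are hypothetical), a reviewer might inspect the plan and then correct a query that is syntactically valid but subtly wrong:

```sql
-- Step 1: inspect the generated query's plan before trusting it.
EXPLAIN
SELECT c.name, SUM(o.total) AS lifetime_value
FROM customers AS c
LEFT JOIN orders AS o ON o.customer_id = c.id
GROUP BY c.name;   -- bug: grouping by name merges customers who share a name

-- Step 2: the corrected query groups by the key, not a non-unique column.
SELECT c.id, c.name, SUM(o.total) AS lifetime_value
FROM customers AS c
LEFT JOIN orders AS o ON o.customer_id = c.id
GROUP BY c.id, c.name;
```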
Conclusion
Stack Overflow’s 2024 Developer Survey shows that SQL is still among the top three most highly desired programming languages, and that most of the top databases are SQL-based and use the relational model. An examination of 24,000 job listings by software company Finoit found that SQL was the #2 skill after Python among those employers list on job postings.
Further proof of its value is found in how the NoSQL script has flipped in recent years, with most non-relational database vendors releasing support for their own flavors of SQL.
Ultimately, organizations and individuals will prefer future-proof solutions. As RDBMSs continue to evolve, they are well-positioned to adapt to future data requirements, including emerging AI-driven workloads. SQL and the relational model are here to stay.