Arguably the most controversial buzzword to come along in the data professional community in years, NoSQL means different things to different people. Over time, the “No” in NoSQL has effectively changed to “Not only,” which better reflects the multifaceted world surrounding this collection of predominately non-relational database technologies.
NoSQL continues to grow in relevance in 2012, in many cases serving as the data layer for Big Data applications hosted by some “as a Service” offering based in the Cloud. That previous sentence sums up the four biggest trends affecting data professionals today. But they need not be worried about NoSQL making years of relational database experience suddenly obsolete. Many of the same analytical and technical skills remain important.
It remains important to realize that the changing needs of business – namely high scalability, increased velocity, improved analytics, and social interaction – are the primary drivers for the move towards NoSQL technologies. To get a better feel for how NoSQL helps to achieve improvement in those areas, it helps to better define the term.
So what is NoSQL?
As stated earlier, NoSQL now stands for “Not Only SQL,” a kinder, gentler definition compared to its previous, more anti-relational meaning. The best way to look at the term is as a collection of different, mostly non-relational technologies. In fact, “brand” marketing might be the one thing keeping the NoSQL community together as a movement.
The relational database model remains useful for many of the same traditional applications it served well over the past three decades. The popularity of the LAMP solution stack ensures that MySQL use continues to grow today. Obviously superior to flat-file or hierarchical models for most applications, the relational model is generally easy to understand and analyze.
Where the relational model falls apart is with applications requiring massive amounts of data and scale, as well as those that need extremely fast querying or analytical capability, such as finding the proverbial needle in a haystack. Enter NoSQL. Driven in part by the distributed nature of the Web, minimally structured databases able to scale to large amounts of data stored across server farms began to appear around the turn of the millennium.
Four different database technologies are the biggest players in the world of NoSQL. Key-Value, Graph, Document, and Big Table data stores all feature the advantages of non-relational databases, allowing the scalability and fast analytics needed by today’s applications. It is reasonable to expect that over time, these four technologies might grow in popularity to the point that they individually outgrow the NoSQL “umbrella,” so to speak.
This article touches quickly on each of the four technologies, with more detailed analyses of each to appear here at DATAVERSITY over the next few weeks.
A Micro History of the NoSQL Movement
The term “NoSQL” first appeared in 1998, used to describe a relational database developed by Carlo Strozzi that provided no form of the SQL language for querying. This initial usage remains somewhat unrelated to the NoSQL movement as it’s known today.
Needing a name for a 2009 conference covering a collection of open-source distributed databases, an employee of the cloud hosting company Rackspace reused the NoSQL term for the event. This essentially means the term that came to describe the non-relational database movement had its origins in marketing.
Essential Differences between NoSQL and SQL
It’s hard to pigeonhole NoSQL as one static entity, but there are some notable differences between the collective of NoSQL data stores and their relational brethren. Arguably the largest difference is that most NoSQL databases do not adhere to the time-honored relational database principles of ACID (atomicity, consistency, isolation, durability).
In the highly social applications typically using NoSQL on the back-end, consistency from a database standpoint is hard to achieve while still providing a responsive user experience. The distributed computing principle of “eventual consistency” describes how many NoSQL databases handle this issue in a parallel environment.
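As a rough illustration of eventual consistency, the following Python sketch models two replicas that accept writes independently and converge only after they exchange state. The `Replica` class, its last-write-wins merge rule, and the integer timestamps are assumptions made for this example; actual NoSQL databases use a variety of more sophisticated reconciliation schemes.

```python
class Replica:
    """Hypothetical replica holding key -> (timestamp, value) pairs."""

    def __init__(self, name):
        self.name = name
        self.data = {}

    def write(self, key, value, timestamp):
        # Accept the write locally without coordinating with other replicas.
        current = self.data.get(key)
        if current is None or timestamp > current[0]:
            self.data[key] = (timestamp, value)

    def read(self, key):
        entry = self.data.get(key)
        return None if entry is None else entry[1]

    def sync_from(self, other):
        # Pull the other replica's entries; the newest timestamp wins.
        for key, (timestamp, value) in other.data.items():
            self.write(key, value, timestamp)


a = Replica("a")
b = Replica("b")
a.write("status", "at lunch", timestamp=1)
b.write("status", "back at desk", timestamp=2)

# Before synchronization, the two replicas return different answers.
stale = (a.read("status"), b.read("status"))

# After exchanging state in both directions, they converge.
a.sync_from(b)
b.sync_from(a)
converged = (a.read("status"), b.read("status"))
```

A reader hitting replica `a` before the sync sees the older value, which is the trade-off such systems accept in exchange for responsiveness.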
Another obvious difference is the absence of a standard query language with most NoSQL databases. Well, it is called NoSQL after all! A group is currently working on UnQL, which aspires to be the language of choice for the NoSQL community.
UnQL focuses on language constructs that allow the querying of collections and documents as well as any data marked up in JSON (JavaScript Object Notation). The language provides no DDL (data definition language) functionality, i.e. things like the CREATE TABLE and DROP TABLE statements used in SQL.
Key-Value Databases
Key-Value data stores might be the most ubiquitous technology under the NoSQL banner. Essentially a schema-less construct containing a key along with a piece of associated data or object, the Key-Value pattern is commonly used in programming as well.
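The Key-Value pattern can be sketched in a few lines of Python. The `KeyValueStore` class and the key names below are hypothetical and exist only for illustration; the point is that the store looks values up by key alone and treats each value as an opaque blob with no schema.

```python
import json


class KeyValueStore:
    """Hypothetical in-memory store: opaque values addressed only by key."""

    def __init__(self):
        self._items = {}

    def put(self, key, value):
        # Values are kept as opaque bytes; the store never inspects them.
        self._items[key] = json.dumps(value).encode("utf-8")

    def get(self, key, default=None):
        raw = self._items.get(key)
        return default if raw is None else json.loads(raw.decode("utf-8"))

    def delete(self, key):
        self._items.pop(key, None)


store = KeyValueStore()
store.put("user:42", {"name": "Ada", "visits": 3})
store.put("session:9f1", "user:42")

profile = store.get("user:42")
missing = store.get("user:99", default={})
```

Because nothing but the key is indexed, any relationship between entries (here, a session pointing at a user) has to be followed by the application itself.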
Most Key-Value databases follow the eventually consistent principle. Apache Cassandra, which is also often classified as a column-family store, is arguably the most famous Key-Value database, which makes sense considering it was originally developed at social networking giant Facebook.
Various companies are offering Data as a Service hosting options using Key-Value data stores on the back-end. Amazon’s DynamoDB and Cloudant are two of the more popular vendors in this space, with Cloudant especially making some noise in the gaming world.
In-memory cached databases partner nicely with the Key-Value storage pattern. Oracle’s Coherence in-memory data store allows the relational database giant to spread its tentacles into the NoSQL community.
Graph Databases
Graph databases model the underlying data as a network of nodes and edges, focusing on the relationships between entities. Social networking is one obvious application of Graph databases, in addition to the proverbial “find a needle in the haystack” level of deep analytical reporting.
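To make the relationship-centric view concrete, here is a minimal Python sketch of two typical social networking queries over a small graph. The `friends` data and both function names are invented for this example, and a plain dictionary of sets stands in for whatever storage a real Graph database would use.

```python
from collections import deque

# Hypothetical social graph: each person maps to the set of people they know.
friends = {
    "alice": {"bob", "carol"},
    "bob": {"alice", "dave"},
    "carol": {"alice", "dave"},
    "dave": {"bob", "carol", "erin"},
    "erin": {"dave"},
}


def degrees_of_separation(graph, start, target):
    """Breadth-first search returning the hop count, or None if unreachable."""
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        person, hops = queue.popleft()
        if person == target:
            return hops
        for neighbor in graph.get(person, ()):
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append((neighbor, hops + 1))
    return None


def friends_of_friends(graph, person):
    """People exactly two hops away who are not already direct friends."""
    direct = graph.get(person, set())
    candidates = set()
    for friend in direct:
        candidates |= graph.get(friend, set())
    return candidates - direct - {person}
```

Both queries walk relationships directly, hop by hop, which is the kind of traversal that would require repeated self-joins in a relational schema.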
Objectivity’s InfiniteGraph product has garnered praise in the industry for its Graph database performance benchmarks. Even Big Blue’s venerable DB2 recently added Graph database support in version 10.
Document Databases
Document stores encompass a wide array of formats and binary encodings. Standard markup formats like XML and JSON combine with proprietary binaries like Microsoft Word and Adobe’s PDF.
Key lookup, tagging, and the use of metadata remain vital for the successful querying of Document databases. As previously mentioned, the UnQL query language was primarily developed for the querying of documents and data objects marked up in JSON.
Apache CouchDB is a popular document database that uses JSON for markup along with routines written in JavaScript for querying purposes. MongoDB, currently in production with Craigslist and others, is another example of JSON markup in a document data store.
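The following Python sketch shows the general idea of querying JSON documents by key, field, and tag. It does not use the CouchDB or MongoDB APIs; the sample documents and the `find` and `find_by_tag` helpers are hypothetical stand-ins meant only to show that documents in one collection need not share a schema.

```python
import json

# Hypothetical collection: each document is a JSON object with its own shape.
raw_documents = [
    '{"_id": "1", "type": "listing", "title": "Road bike", "price": 250, "tags": ["bikes", "sports"]}',
    '{"_id": "2", "type": "listing", "title": "Desk lamp", "price": 15}',
    '{"_id": "3", "type": "profile", "name": "Ada", "tags": ["seller"]}',
]
collection = [json.loads(raw) for raw in raw_documents]


def find(documents, **criteria):
    """Return documents whose fields equal every given criterion."""
    return [
        doc for doc in documents
        if all(doc.get(field) == expected for field, expected in criteria.items())
    ]


def find_by_tag(documents, tag):
    """Return documents carrying the tag; documents without tags are skipped."""
    return [doc for doc in documents if tag in doc.get("tags", [])]


listings = find(collection, type="listing")
bike_titles = [doc["title"] for doc in find_by_tag(collection, "bikes")]
```

Note that the second listing has no `tags` field at all, so the query code has to tolerate missing fields rather than rely on a fixed table definition.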
Big Table Databases
Big Table databases are also known as tabular databases. As the name suggests, they are highly suitable for Big Data applications that scale across server farms containing thousands of CPUs.
BigTable, a Google-developed tabular database, uses a three-dimensional key structure containing row and column keys along with a timestamp. Google’s instance of MapReduce is used to create and modify data stored in BigTable. BigTable is widely used in many of Google’s other applications, including Google Earth, YouTube, and Gmail.
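The three-part key can be pictured as a sparse map from (row key, column key, timestamp) to a value. The Python sketch below is an assumption-laden toy, not Google’s implementation: the `TabularStore` class, the sample row and column names, and the integer timestamps are all invented for illustration.

```python
class TabularStore:
    """Hypothetical sparse map: (row key, column key, timestamp) -> value."""

    def __init__(self):
        self._cells = {}

    def put(self, row, column, timestamp, value):
        # Each write adds a new version instead of overwriting the cell.
        self._cells[(row, column, timestamp)] = value

    def get(self, row, column, timestamp=None):
        """Return the newest value at or before `timestamp` (latest if None)."""
        versions = [
            (ts, value)
            for (r, c, ts), value in self._cells.items()
            if r == row and c == column and (timestamp is None or ts <= timestamp)
        ]
        if not versions:
            return None
        return max(versions, key=lambda pair: pair[0])[1]


table = TabularStore()
table.put("com.example/index", "contents:html", 100, "<html>v1</html>")
table.put("com.example/index", "contents:html", 200, "<html>v2</html>")
table.put("com.example/index", "anchor:home", 150, "Example")

latest = table.get("com.example/index", "contents:html")
as_of_150 = table.get("com.example/index", "contents:html", timestamp=150)
```

Keeping the timestamp in the key is what lets a reader ask for the value of a cell as of an earlier point in time.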
Apache HBase is a tabular database used in concert with Hadoop; it runs on top of the Hadoop Distributed File System (HDFS). HBase owes much of its technology to Google BigTable, which makes sense considering Hadoop’s own origins in Google’s MapReduce.
Other NoSQL Formats
Some database formats under the NoSQL umbrella fall outside of the “big four,” in many cases including older technologies that predate the current focus on web-based, distributed computing. Object databases have been around since the 1980s; they generally apply object-oriented programming techniques and store the resulting objects directly in the database, blurring the usual division between the data and application layers.
MultiValue databases allow the storing of multiple values for an attribute in a table, like a comma-delimited list of email addresses for one person. PICK, developed in the 1960s, is considered to be the first MultiValue database. Less widely used NoSQL dialects include Tuple Store databases as well as the RDF format.
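The MultiValue idea mentioned above, one attribute holding several values, can be sketched in Python as follows. The `people` records and both helper functions are hypothetical and are not modeled on PICK or any specific product.

```python
# Hypothetical person records: the "emails" attribute holds several values.
people = {
    "p1": {"name": "Ada", "emails": ["ada@example.com", "ada@work.example"]},
    "p2": {"name": "Grace", "emails": ["grace@example.com"]},
}


def to_delimited(values, delimiter=","):
    """Flatten a multi-valued attribute into one delimited string."""
    return delimiter.join(values)


def owner_of(records, email):
    """Return the id of the record whose email list contains `email`."""
    for record_id, record in records.items():
        if email in record["emails"]:
            return record_id
    return None
```

In a strictly normalized relational design, the email addresses would typically live in a separate table joined back to the person.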
NoSQL is not a Standard
Ultimately, “movement” is probably the best way to describe the multitude of database technologies currently under the NoSQL banner. Considering the current use of the term is barely three years old, there has been little time for anything similar to an ANSI standard to emerge around NoSQL; standardization is probably not appropriate in this regard anyway.
Still, there is no denying the impact non-relational databases have made in the industry, paralleling the growth of social networking, Big Data, and distributed computing. Necessity is the mother of invention, and in the case of NoSQL, the technologies contained within the movement stand at the pinnacle of what databases can accomplish today.
This article is continued in a series:
- The NoSQL Movement: Document Databases
- The NoSQL Movement: Key-Value Databases
- The NoSQL Movement: Graph Databases
- The NoSQL Movement: Big Table Databases