Fundamentals of Document Databases

A document database (sometimes referred to as a “document store” or a “document-oriented database”) is a type of NoSQL, or non-relational, database. Document databases use an index to associate “keys” with “documents,” which makes retrieving data efficient.

Unlike relational databases, document databases do not organize data into rows and columns, and they are designed to scale horizontally. A major strength of document databases is that they streamline development: data is stored in the same document-model format the application code uses, so developers can work with it more easily. A document database is flexible and semi-structured, and it evolves as the application’s needs change.

A document database offers excellent support for content management applications (blogs and video platforms). It also works well with user profiles. Document databases offer flexible indexing, analytics over document collections, and efficient ad hoc queries.

Within a document database, each entity the application tracks may be stored as a single document (similar to an object). This makes it easier for developers to update applications as the organization’s needs change. Additionally, if the data model must be changed, only the affected documents have to be updated. Because of this, a database-wide schema update is not required, and no database downtime is needed to make the necessary changes.
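The idea that only the affected documents need to change can be illustrated with a small sketch. It uses a plain Python dictionary as a stand-in for a collection; the `orders` data, the `currency` field, and the `add_default_currency` helper are all hypothetical names chosen for the example, not part of any particular database.

```python
# A dict stands in for a collection: key -> document.
orders = {
    "o1": {"total": 20.0},
    "o2": {"total": 35.5, "currency": "EUR"},
    "o3": {"note": "gift card, no total"},
}

def add_default_currency(docs, default="USD"):
    """Add a 'currency' field only to documents that have a total but no currency."""
    changed = []
    for key, doc in docs.items():
        if "total" in doc and "currency" not in doc:
            doc["currency"] = default
            changed.append(key)
    return changed

changed = add_default_currency(orders)
```

Only `o1` is modified here. The other two documents keep their existing shape, and nothing resembling a table-wide schema migration is involved.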

Document databases are excellent for storing catalog information. For example, an internet sales application works with different products that have a wide variety of attributes. Attempting to manage thousands of attributes within a “relational” database is inefficient, and read performance typically suffers. With a document database, each product’s attributes can be described in a single document, which makes them easier to manage and faster to read.

Storage in Document Databases

A document database associates a unique key with a data structure called a “document.” The key is used as a simple identifier (ID), typically in the form of a string, a path, or a URI. It can be used to locate and pull the document from its database.

Normally, a document database maintains an index of the keys to improve document retrieval speeds. In some cases, the key is required to create or insert the document into the database. A document can hold a variety of key-value pairs, key-array pairs, and even “nested” documents (documents within documents). A document is treated as a single complete unit, and splitting it into parts is generally avoided.
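As a rough illustration of the key-to-document arrangement described above, the following sketch uses a Python dictionary as the key index. The `DocumentStore` class and its method names are assumptions made for this example; real document databases expose their own APIs.

```python
import uuid

class DocumentStore:
    """Toy in-memory document store: a dict serves as the key index."""

    def __init__(self):
        self._index = {}  # key -> document

    def insert(self, document, key=None):
        # Some stores require the caller to supply a key; here it is optional
        # and a random UUID string is generated when none is given.
        key = key or str(uuid.uuid4())
        self._index[key] = document
        return key

    def get(self, key):
        # Returns the whole document as one unit, or None if the key is unknown.
        return self._index.get(key)

store = DocumentStore()
store.insert(
    {
        "name": "Ada",                       # key-value pair
        "emails": ["ada@example.com"],       # key-array pair
        "address": {"city": "London"},       # nested document
    },
    key="/users/ada",
)
doc = store.get("/users/ada")
```

The key here is a path-like string, one of the forms mentioned above, and the document is stored and returned whole rather than being split across structures.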

Documents in a document database are roughly equivalent to the programming concept of an object. They are not required to follow a standard schema and do not uniformly maintain the same slots, sections, keys, or parts. Generally speaking, programs that use objects have a wide variety of objects, and often those objects will have many optional fields. Each object, even those taken from the same class, may look very different. Document databases are similar: they allow different types of documents to be saved, allow fields to be optional, and often allow documents to be encoded using different encoding systems.

Another strength of document databases is their ability to retrieve documents based on their content. For example, a query can retrieve all the documents in which a certain field is set to a certain value. The specific query and indexing options available vary greatly from one database to another.
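Content-based retrieval can be sketched as a filter over a collection. The example below is a minimal, unindexed scan written for illustration only; the `documents` data and the `find` helper are hypothetical, and a real document database would normally use secondary indexes rather than checking every document.

```python
documents = {
    "p1": {"type": "book", "title": "Dune", "author": "Frank Herbert"},
    "p2": {"type": "shirt", "size": "M", "color": "blue"},
    "p3": {"type": "book", "title": "Emma", "author": "Jane Austen", "pages": 474},
}

def find(docs, **criteria):
    """Return the keys of documents whose fields match every criterion."""
    return [
        key
        for key, doc in docs.items()
        # A document matches only if it has the field and the value is equal.
        if all(field in doc and doc[field] == value
               for field, value in criteria.items())
    ]

books = find(documents, type="book")
blue_medium = find(documents, size="M", color="blue")
```

Documents that lack a queried field, such as the books when searching by `size`, are simply skipped, which is how optional fields tend to behave in this kind of query.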

Document databases fall under the heading of NoSQL databases; XML databases (optimized for XML documents) are a subclass of document databases. Graph databases have some similarities to document databases, but add a relationship layer that allows them to traverse connections between records quickly.

JSON and REST

“The right tool for the right job” is useful advice. This wisdom also applies to the database a developer chooses for an application. Document databases are for developers who want to stay focused on developing an application. Within a document database, the data is saved in freeform “documents” combining a variety of fields with any number of nested structures. These documents are normally represented as JSON (JavaScript Object Notation) and are updated either through APIs or by sending JSON to an appropriate REST (REpresentational State Transfer) endpoint. The majority of modern programming languages support both JSON and REST.

JSON is a widely used data interchange format on the internet. A data interchange format (also referred to as a “data exchange format” or “data format”) uses text to communicate data between platforms. JSON is especially useful because it is readable by both humans and machines, so people reading the data can find meaning in it directly. JSON is an interchange format many systems agree to use for transferring data.
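To make the interchange idea concrete, the sketch below serializes a document to JSON text and parses it back, using Python’s standard `json` module. The `profile` document is invented for the example.

```python
import json

profile = {
    "id": "u42",
    "name": "Ada",
    "tags": ["admin", "beta"],
    "address": {"city": "London"},
}

# Serialize to human-readable text that another platform could parse.
text = json.dumps(profile, indent=2, sort_keys=True)

# Parse the text back into native data structures.
restored = json.loads(text)
```

The intermediate `text` value is an ordinary string, so it can be read by a person, written to a file, or sent over a network before being parsed on the other side.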

REST describes an “architectural style” of software that provides common standards between different computer systems communicating on the web. The REST architectural style makes code easier to change: the code used on the client’s side can be changed without affecting the server’s operations, and the code used on the server’s side can be altered without affecting the client’s operations.

So long as the “format” of the messages being sent through REST architecture is known and agreed upon, the two sides can be kept separate and modular. This is considered a plus because by separating the user interface concerns from the data storage concerns, scalability improves (by way of simplifying server components). As an additional benefit of the separation, each component is free to evolve independently.
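The following sketch shows what sending a JSON document to a REST endpoint might look like from the client side, using only Python’s standard library. The base URL, the `collection/key` path layout, and the `build_put_request` helper are assumptions made for illustration; actual endpoints and URL schemes differ between databases. The request is built but deliberately not sent, since there is no real server behind the example URL.

```python
import json
import urllib.request

# Hypothetical endpoint, used only for illustration.
BASE_URL = "https://db.example.com/api"

def build_put_request(collection, key, document):
    """Build (but do not send) a PUT request that stores `document` under `key`."""
    body = json.dumps(document).encode("utf-8")
    return urllib.request.Request(
        url=f"{BASE_URL}/{collection}/{key}",
        data=body,
        method="PUT",
        headers={"Content-Type": "application/json"},
    )

req = build_put_request("users", "ada", {"name": "Ada", "city": "London"})
```

Against a real server, the request could be sent with `urllib.request.urlopen(req)`. The client only needs to know the agreed message format (a JSON body sent to a known URL), not how the server stores the document.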

Scaling Horizontally

Horizontal scaling, which means gaining capacity by adding more servers, is often faster and less expensive than vertical scaling, which requires adding more resources (such as CPU, memory, or disk) to a single server. It is also the norm for NoSQL databases, and by extension for document databases.

Horizontal scaling is often appealing to information technology specialists and is frequently recommended for cloud computing purposes. A benefit of horizontal scaling is its ability to offer redundant data storage. Redundant data storage reduces the chance that a partial failure will crash the entire system or compromise its operations. The option of creating powerful systems by simply adding low-cost generic hardware components is also appealing.
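One simple way to picture horizontal scaling with redundancy is to hash each document key to a server and keep a copy on the next server as well. The sketch below assumes that approach; the server names and the `shards_for` helper are hypothetical, and production systems typically use more sophisticated placement schemes (such as consistent hashing) so that adding a server moves fewer keys.

```python
import hashlib

def shards_for(key, servers, replicas=2):
    """Pick up to `replicas` distinct servers for a key using a stable hash."""
    digest = hashlib.sha256(key.encode("utf-8")).hexdigest()
    start = int(digest, 16) % len(servers)
    # Never ask for more copies than there are servers.
    count = min(replicas, len(servers))
    return [servers[(start + i) % len(servers)] for i in range(count)]

servers = ["node-a", "node-b", "node-c"]
placement = shards_for("/users/ada", servers)
```

Because each document lives on two different servers in this sketch, losing any single server still leaves one copy available.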

Strengths and Weaknesses of Document Databases

Document stores are very flexible. They handle semi-structured and unstructured data well. Users don’t need to know during set-up what types of data will be stored, so a document store is a good choice when it isn’t clear in advance what sort of data will be incoming. Users can create their desired structure in a particular document without affecting all documents. The schema can be modified without causing downtime, which supports high availability. Write speed is generally fast as well. Document databases are useful for:

Analytics platforms
Blogging platforms
Content management systems
E-commerce platforms

Document databases are not the best choice for running complex search queries or for applications requiring complex, multi-operation transactions.

Because document databases have a flexible schema, they can store documents having different attributes and data values. Document databases are a practical solution to online profiles in which different users provide different types of information. Using a document database, you can store each user’s profile efficiently by storing only the attributes that are specific to each user. Should a person decide to change information on their profile, the document representing them can easily be replaced with the new version. Document databases provide an easily managed system, with a high level of fluidity and individuality.
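The profile use case can be sketched with a dictionary standing in for the collection. The `profiles` data and the `replace_profile` helper are hypothetical and exist only to show the two points made above: each document stores just the attributes that user supplied, and an update replaces the whole document with its new version.

```python
profiles = {
    # Each user document holds only the attributes that user provided.
    "u1": {"name": "Ada", "languages": ["English", "French"]},
    "u2": {"name": "Linus", "employer": "Example Corp", "website": "https://example.com"},
}

def replace_profile(store, user_id, new_document):
    """Swap the whole document for a new version in a single assignment."""
    store[user_id] = new_document

replace_profile(profiles, "u1", {"name": "Ada", "city": "London"})
```

After the replacement, `u1` no longer has a `languages` field and has gained a `city` field, while `u2` is untouched.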

Historically, gaining useful business intelligence from operational data has been hampered because analytical databases and operational databases were managed in different environments. Being able to read and research operational information in real time has become a critical process in highly competitive business environments. When using a document database, a business can save and manage its operational data from different sources, while feeding the data to a Business Intelligence engine for analysis.

To manage content effectively, an organization must collect and aggregate it from a number of sources before delivering it to the customer. Because of their flexible schema, document databases are an excellent choice for collecting and saving this data. Document databases can create and incorporate unique types of content, including user-generated content such as comments, images, and videos. There are several document databases to choose from.
