Non-relational databases (aka NoSQL), in many different variations, have become a popular database model for handling Big Data. They rely on two concepts that set them apart from earlier, “classical” database models: horizontal scaling, which spreads storage and work across multiple machines, and eliminating the use of “Structured Query Language” (SQL) to organize the information. Instead, NoSQL data stores use a “non-relational” model. Non-relational stores can search through massive amounts of constantly changing unstructured, structured, and semi-structured data. Over time, NoSQL databases have evolved to include a variety of different models. Hadoop and Cassandra are just two of the 225 NoSQL-style databases listed at NOSQL.
Data Modeling is essentially a “communications process.” It translates a complex software design into an easily understood blueprint, using symbols and text to represent the flow of data or other processes. This blueprint then provides a common reference during the planning process, and is used as a guide in constructing the new software programs. Traditionally, data models are built during the design and analysis phases of a given project, as a way to ensure the requirements of an application are thoroughly understood. A data model attempts to capture all possible relationships within the program. A well-organized data model allows developers and other stakeholders to identify problems and needed changes before the programming code is written. Data modelers often use several models, each giving a different view of the same data. This helps ensure all the processes, data flows, and relationships are identified. Such practices have had to change, though, with the growth of non-relational database systems.
Relational Databases Continue Being Used
Relational databases came from a time when corporations used mainframes, and used them only for business applications. It was a time before the Internet, social media, Big Data, and the Digital Economy changed the nature of the modern business landscape. Relational databases were designed to operate with a single server, and bigger was better. Increasing the server’s capacity meant a physical upgrade of the memory and processors. Relational databases are known for their consistency.
Additionally, relational databases (primarily SQL) need to have their schemas defined (or redefined) before the data gets added. The location for its storage must be prepared in advance. For example, storing “new” data about customers, such as email addresses, requires first setting up the storage category. A relational database requires a column to be added to the database, which then moves the full database to the new schema. This can be a very slow process when the database is extremely large, and can involve a significant amount of downtime. If the schema is changed frequently, the downtime could become a recurring problem.
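The email-address example above can be sketched in a few lines. This is only an illustration, assuming Python's built-in `sqlite3` module and a hypothetical `customers` table; the table and column names are not from any real system. SQLite adds a column almost instantly, so the sketch shows the ordering constraint (schema first, data second) rather than the downtime a very large production database might see.

```python
import sqlite3

# Hypothetical table and column names, used only for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("INSERT INTO customers (name) VALUES ('Ada')")

# Writing to a column the schema does not define is rejected.
try:
    conn.execute(
        "INSERT INTO customers (name, email) VALUES ('Bob', 'bob@example.com')"
    )
    insert_failed = False
except sqlite3.OperationalError:
    insert_failed = True

# The schema has to change first; only then can the new data be stored.
conn.execute("ALTER TABLE customers ADD COLUMN email TEXT")
conn.execute(
    "INSERT INTO customers (name, email) VALUES ('Bob', 'bob@example.com')"
)

rows = conn.execute("SELECT name, email FROM customers ORDER BY id").fetchall()
print(insert_failed, rows)
```

Rows that existed before the change simply carry an empty (NULL) email, which is why existing data usually has to be reviewed or backfilled after a schema change.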
Another frustrating problem is the relational database’s inability to effectively process unstructured data or data that is not known in advance. By non-relational standards with multiple different data types, SQL is inflexible and tedious.
NoSQL Continues to Gain in Popularity
When compared to relational databases, NoSQL databases are generally easier to scale out and can offer better performance for large, distributed workloads. Non-relational databases offer some useful benefits not typically available in relational databases, including:
- Integrated Caching: Many non-relational databases include integrated caching abilities, and store regularly used data in the system’s memory as often as possible, removing the necessity for separated caching layers. Some non-relational databases provide a database management layer, which is used for workloads needing high throughput and low latency.
- Flexible Data Modeling: Non-relational databases are flexible and easy to modify. They lack the rigid structure of relational databases, so it is much easier to change the structure and to input data. The end result is a less complicated rapport between the user and the system.
- High Availability: Non-relational databases are typically designed to assure redundancy and replication. Non-relational databases can automatically distribute equal amounts of data over multiple servers, so the application can be available even if one server fails.
- Performance: By adding inexpensive servers, organizations can increase performance with non-relational databases. This allows organizations to provide reliably fast, continuous service.
- Scalability: Non-relational databases use multiple computers, creating a horizontal scaling technique which makes it simple to change capacity quickly, with no downtime. With the auto-sharding and automatic replication features used by non-relational databases, the costs and problems of manual sharding in a SQL database are eliminated.
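The High Availability and Scalability points can be illustrated with a toy model. The sketch below is an assumption-laden illustration, not how any particular NoSQL product works: `ShardedStore` is a hypothetical class, each "server" is just a Python dictionary, and keys are placed with a simple hash-and-modulo rule, with each key copied to the next node as a replica.

```python
import hashlib


class ShardedStore:
    """Toy in-memory store: hash-based sharding with replicated keys."""

    def __init__(self, num_nodes, replicas=2):
        if not 1 <= replicas <= num_nodes:
            raise ValueError("replicas must be between 1 and num_nodes")
        # Each "node" is a dict standing in for a separate server.
        self.nodes = [dict() for _ in range(num_nodes)]
        self.replicas = replicas
        self.down = set()  # indexes of nodes simulated as failed

    def _owners(self, key):
        # A stable hash picks the primary node; copies go to the next nodes.
        digest = hashlib.sha256(key.encode("utf-8")).digest()
        primary = int.from_bytes(digest[:8], "big") % len(self.nodes)
        return [(primary + i) % len(self.nodes) for i in range(self.replicas)]

    def put(self, key, value):
        # Write to every owner that is currently up.
        for idx in self._owners(key):
            if idx not in self.down:
                self.nodes[idx][key] = value

    def get(self, key):
        # Read from the first owner that is up and holds the key.
        for idx in self._owners(key):
            if idx not in self.down and key in self.nodes[idx]:
                return self.nodes[idx][key]
        raise KeyError(key)


store = ShardedStore(num_nodes=4)
store.put("user:42", {"name": "Ada"})

# Simulate losing the primary server for this key.
store.down.add(store._owners("user:42")[0])
print(store.get("user:42"))  # still served from the replica
```

Modulo placement keeps the sketch short, but adding a node would remap most keys. Production systems typically use schemes such as consistent hashing or range partitioning to limit that data movement.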
NoSQL Security Issues
The initial offering of NoSQL databases had a number of security issues. In February of 2015, it was reported that one popular NoSQL database had 40,000 situations where the databases were nearly totally unsecured. Three security problems for non-relational databases are Data Governance, encryption, and authentication. Effective Data Governance is a persistent problem with non-relational databases. For example, Hadoop never had Data Governance principles included during its creation. (A group, labelled DGI, or Data Governance Initiative, has been working on this problem.) Most NoSQL databases early on did not come with a built-in encryption system. (Encryption can be added after the fact.) In the case of the 40,000 unsecured databases mentioned above, the failure of security and authentication was the result of poorly written instructions and a general lack of security consciousness. The people installing the program generally missed the significance of activating security mechanisms. They simply left the systems open to access from the internet.
NoSQL and the Cloud
Use of the Cloud is a good idea for short-term Big Data research projects. Using Cloud vendors, who offer web services with nearly unlimited capacity and take care of all the necessary infrastructure administration tasks, is much less expensive than setting up an in-house NoSQL database. Organization leaders can avoid paying for the expensive, complex platform their new applications will need. These are some benefits gained from using the Cloud:
- Faster to the Market: A research project can be up and running within a few hours, rather than dealing with setup issues for weeks or months.
- Monetary Savings: There is no need to invest in infrastructure or additional staff. Only pay for what is needed.
- Flexibility: The services being paid for can be adjusted quite easily and the project can be worked on, or accessed, from anywhere in the world.
- Reliability: Typically, distributed servers are located around the world, and offer disaster recovery services.
Forrester, a market research company focused on technology, has published a forecast predicting NoSQL and Hadoop will see significant growth over the next five years. The report, called Big Data Management Solutions Forecast 2016 to 2021, suggests NoSQL and Hadoop will grow between 25.0% and 32.9% each year. Additionally, their analysts predict Big Data technology will grow three times faster than the total technology market.
Forrester analysts go on to predict In-Memory Data Fabric (an alternative approach to Big Data warehouse design) will grow at an annual rate of 29.2% over the next five years.
Jennifer Adams, a senior Forrester forecast analyst, wrote,
“The complexity and richness of data is changing, along with exploding data volume and velocity. Unstructured data, such as text, tweets, graphs, and video, is an increasingly important source of information. Not surprisingly, we expect non-relational databases to be the fastest growing sector within Big Data management solutions.”
Data Modeling with NoSQL
Relational and non-relational databases present two different paradigms. SQL Data Modeling focuses on the design theme, “What are the answers?” NoSQL Data Modeling focuses on, “What are the questions?” The SQL database is a much more limited paradigm, and as a consequence, relational databases don’t work well with graph-like models. Data Modeling in a relational database is often slower, because the schema must be defined before any data is stored.
NoSQL databases, on the other hand, use many different models such as key-value, graph, document, and wide column. NoSQL databases have flexible data models capable of supporting the large volume of unstructured data being generated by modern applications. The model can change quickly with the changing needs of the organization, and it simplifies combining data from multiple sources. The modeling is done quite differently, though. Since the schema is flexible, standard Data Modeling is not usually done. NoSQL databases have their “modeling” in the code, so it’s a completely different practice.
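A small sketch may help show what “modeling in the code” can look like for a document model. It uses plain Python dictionaries as stand-ins for stored documents; the `customers` records and the `contact_summary` function are hypothetical and not tied to any specific database. The documents have different shapes, and the application code, rather than a pre-defined schema, decides which fields are required and which are optional.

```python
# Hypothetical customer documents, shaped like the JSON a document store holds.
customers = [
    {"_id": 1, "name": "Ada"},
    {"_id": 2, "name": "Bob", "email": "bob@example.com"},
    {
        "_id": 3,
        "name": "Cy",
        "email": "cy@example.com",
        # Orders are embedded so one read answers "what did Cy order?"
        "orders": [{"sku": "A-100", "qty": 2}, {"sku": "B-200", "qty": 1}],
    },
]


def contact_summary(doc):
    """The 'schema' lives here: the code decides which fields are optional."""
    return {
        "name": doc["name"],        # required by this application
        "email": doc.get("email"),  # optional; older documents lack it
        "items_ordered": sum(o["qty"] for o in doc.get("orders", [])),
    }


summaries = [contact_summary(doc) for doc in customers]
for summary in summaries:
    print(summary)
```

Adding the email field required no change to the older documents. The trade-off is that every piece of code reading these documents has to handle the missing-field case itself.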