Meet polyglot persistence.
It’s not a new term, but one that’s catching fire in what many call the unsexiest part of Data Management – Database Management. It also ties to the physical part of Data Management, which is storage management, often overlooked by data teams.
Storage management and database administration used to be a separate concern from data engineering. The data engineer or scientist was more worried about creating proper data workflows from multiple sources.
Until performance and optimization became issues, that is. Today’s data teams work with multiple data sources, each using multiple data types.
Take a simple e-commerce platform, for example. It may need to store session data, work with search data, and process geo-located payment data.
Each of these workloads may live in a different database, so you need to write code and build connections to get those databases to talk to each other. But there is one major catch: there is a long list of databases out there, and each tends to be good at one thing but not another.
To create the right interconnections, you need to know which databases are suitable for you – which can be difficult as new technologies come online nearly every day.
The Case for Polyglot Persistence
The polyglot persistence model offers a different approach to the same problem.
In the e-commerce example, it would have allowed us to use Elasticsearch for search results, MongoDB for user information, Memcached for the cache, and Azure for the financial transaction data.
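As a rough sketch of that split, the Python below routes each kind of e-commerce data to its own store. The classes are in-memory stand-ins, not real client libraries; `SearchIndex`, `DocumentStore`, `Cache`, `Ledger`, and `Storefront` are hypothetical names chosen for illustration, and the docstrings only note which kind of product each one would represent.

```python
from dataclasses import dataclass, field


class SearchIndex:
    """Stand-in for a search engine such as Elasticsearch."""

    def __init__(self):
        self._docs = {}

    def index(self, doc_id, text):
        self._docs[doc_id] = text.lower()

    def search(self, term):
        # Naive case-insensitive substring match, for illustration only.
        term = term.lower()
        return sorted(d for d, text in self._docs.items() if term in text)


class DocumentStore:
    """Stand-in for a schemaless document database such as MongoDB."""

    def __init__(self):
        self._docs = {}

    def put(self, key, document):
        self._docs[key] = dict(document)

    def get(self, key):
        return self._docs.get(key)


class Cache:
    """Stand-in for a key-value cache such as Memcached."""

    def __init__(self):
        self._items = {}

    def set(self, key, value):
        self._items[key] = value

    def get(self, key):
        return self._items.get(key)


class Ledger:
    """Stand-in for a transactional store holding payment records."""

    def __init__(self):
        self._entries = []

    def record(self, user_id, amount_cents):
        self._entries.append((user_id, amount_cents))

    def total_for(self, user_id):
        return sum(a for u, a in self._entries if u == user_id)


@dataclass
class Storefront:
    """Hypothetical facade: each workload goes to the store suited to it."""

    search: SearchIndex = field(default_factory=SearchIndex)
    users: DocumentStore = field(default_factory=DocumentStore)
    cache: Cache = field(default_factory=Cache)
    ledger: Ledger = field(default_factory=Ledger)

    def register_user(self, user_id, profile):
        self.users.put(user_id, profile)

    def add_product(self, product_id, description):
        self.search.index(product_id, description)

    def start_session(self, session_id, user_id):
        self.cache.set(session_id, user_id)

    def pay(self, session_id, amount_cents):
        # Session lookup hits the cache; the payment lands in the ledger.
        user_id = self.cache.get(session_id)
        if user_id is None:
            raise KeyError("unknown session")
        self.ledger.record(user_id, amount_cents)
        return user_id
```

The application code only talks to `Storefront`, so each stand-in could in principle be replaced by a real client without touching the callers.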
Like its cousin, polyglot programming, it acknowledges that no one storage solution fits all your applications’ data storage needs. So why not create a hybrid answer to today’s database and storage management challenges?
Polyglot persistence is catching on because the industry is creating applications differently. Traditionally, monolithic applications worked with a single database or in a monoglot fashion.
These days, many applications use a microservice architecture, in which a single application runs as a set of individual, functionally scoped services. This allows agility and scalability.
But microservices introduce a new challenge. Each microservice typically runs its own database. This means implementing queries and atomic, consistent, isolated, and durable (ACID) transactions that span several services can be a challenge.
So, database programmers need to handle this issue with care when they stitch different databases together. The query logic needs to be precise.
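One common way to approach this, which the article does not name, is to treat a cross-database write as a sequence of local steps and undo the completed ones if a later step fails (often called a compensating action or saga). The sketch below assumes two hypothetical in-memory stores, `OrderStore` and `PaymentStore`, standing in for two services' databases. It illustrates the idea only and gives none of the isolation guarantees of a real ACID transaction.

```python
class OrderStore:
    """Hypothetical stand-in for the order service's database."""

    def __init__(self):
        self.orders = {}

    def insert(self, order_id, item):
        self.orders[order_id] = item

    def delete(self, order_id):
        self.orders.pop(order_id, None)


class PaymentStore:
    """Hypothetical stand-in for the payment service's database."""

    def __init__(self, balance_cents):
        self.balance_cents = balance_cents
        self.charges = {}

    def charge(self, order_id, amount_cents):
        if amount_cents > self.balance_cents:
            raise ValueError("insufficient funds")
        self.balance_cents -= amount_cents
        self.charges[order_id] = amount_cents


def place_order(orders, payments, order_id, item, amount_cents):
    """Write to both stores; undo the first write if the second fails."""
    orders.insert(order_id, item)  # step 1: local write in the first store
    try:
        payments.charge(order_id, amount_cents)  # step 2: second store
    except Exception:
        orders.delete(order_id)  # compensating action for step 1
        return False
    return True
```

Between step 1 and the compensating delete, other readers could briefly see the order, which is one reason the query logic around such stitched-together writes needs care.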
Then, you have the issue of choosing a suitable database for a specific use case. There are multiple flavors of the same type of database, each performing well in a particular use case.
For example, you may want a schemaless document database like MongoDB for storing your documents. On the other hand, to connect millions of entities and users, you may prefer a graph database like Neo4j. Finally, for time-series data storage, you may choose the venerable Cassandra.
Polyglot persistence allows you to use a specific database for each of your microservices.
Beware of the Challenges
Polyglot persistence has its cons.
The first and most apparent is complexity. Adding a specialized database for each microservice will require specific systems and expertise. Never mind the hardware; finding the right people with polyglot knowledge of databases can be challenging.
Integration layers can become dynamic; finding the failure points can be another challenge. Not everyone has the patience of Sherlock Holmes to investigate, nor do they have the time to do it at today’s digital speeds. Keeping everything working consistently needs additional resources.
Then, you have the issue of finding the root cause of a problem. Triaging one when your application is a monoglot is easy; following the different threads to a problem in a polyglot persistence model can be onerous when you have different versions of the truth. This means Data Governance can be a significant challenge.
This creates another headache: compliance. Without a holistic picture of all your data and a unified audit trail, data breaches can be costly.
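One way to picture a unified audit trail is a thin wrapper that records every write in a shared log, regardless of which database receives it. The following is a minimal sketch under that assumption. `AuditLog` and `AuditedStore` are hypothetical names, plain dictionaries stand in for the databases, and a real deployment would need durable, tamper-resistant log storage.

```python
from datetime import datetime, timezone


class AuditLog:
    """Hypothetical shared log that every store writes to."""

    def __init__(self):
        self.entries = []

    def record(self, store_name, action, key):
        self.entries.append({
            "at": datetime.now(timezone.utc).isoformat(),
            "store": store_name,
            "action": action,
            "key": key,
        })

    def for_key(self, key):
        # One ordered view of what happened to a key across all stores.
        return [(e["store"], e["action"])
                for e in self.entries if e["key"] == key]


class AuditedStore:
    """Wraps an in-memory dict and reports each write to the shared log."""

    def __init__(self, name, audit_log):
        self.name = name
        self._data = {}
        self._audit = audit_log

    def put(self, key, value):
        self._data[key] = value
        self._audit.record(self.name, "put", key)

    def delete(self, key):
        self._data.pop(key, None)
        self._audit.record(self.name, "delete", key)

    def get(self, key):
        return self._data.get(key)
```

Calling `for_key` with a user ID then returns that user's writes across every wrapped store in the order they happened, which is the kind of holistic view the compliance concern calls for.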
However, the benefits are persuasive.
Services stay loosely coupled because they are not tied to a single database. You can choose the best database for each microservice based on the use case, or even swap one out for a more reliable option. And each data workload gets the environment best suited to its processing and storage. Together, these benefits are making this trend the one to watch.