What to Expect from Open-Source Data Infrastructure in 2023

Read more about author Bassam Chahine.

Open-source technologies will become even more prominent within enterprises’ data architecture over the coming year, driven by the stark budgetary advantages combined with some of the newest enterprise-friendly capabilities added to several solutions.

Here are four predictions for the open-source data infrastructure space in 2023:

1. Economic headwinds will make open-source data technologies even more attractive to enterprises.

With shaky economic conditions expected throughout 2023, budget constraints will draw even more enterprise leaders toward fully open-source data-layer technologies. The rise of enterprise-grade, production-ready open-source software has accelerated as data technologies like Apache Cassandra, Apache Kafka, Redis, and PostgreSQL continue to prove their value in the most data-intensive enterprise environments. Enterprises late to the trend will explore their open-source options as a harbor from economic turmoil in 2023, capitalizing on the opportunity to harness a secure and scalable data architecture without adding unnecessary licensing fees to their budgets. 

This shift will likely continue to erode open-core alternatives, whereby proprietary add-ons (and costly licensing fees) are added to otherwise free open-source projects. With budget scrutiny in 2023, enterprises will take a close look at that open-source vs. open-core decision – and realize that pure open-source versions of key data technologies are more than up to the task.

2. Apache Cassandra 4.0 will have a breakthrough in 2023, both in adoption and as a leading solution in an expanding set of use cases.

Cassandra 4.0’s 2021 launch left many enterprises waiting for more robust ecosystem components before committing to an upgrade migration, given that key tools like Cassandra Reaper and others didn’t yet offer support. That essential tool support has now arrived, making Cassandra 4.0 an inviting option for enterprises looking for better performance (particularly when it comes to indexing speed). Cassandra 4.0 will further draw in users with its reliability, support for Java 11, stronger security features including audit logging, and virtual tables that improve visibility into Cassandra’s performance. Eliminating bugs was a top priority across Cassandra 4.0’s development, with the intention of winning over those still harboring doubts about the open-source solution. A year on, Cassandra 4.0 reliability is beyond question, making adoption a risk-free decision.

Along with widespread adoption, Cassandra 4.0 will increasingly become a go-to option in industries such as banking and utilities. Financial institutions and other enterprises prioritizing security now increasingly look to Cassandra 4.0 for its strict auditing capabilities, which support security policy enforcement and regulatory compliance. Utilities such as power distribution companies have also emerged as Cassandra 4.0 proponents, leveraging its high availability and strong write performance to enable high-scale data collection across vast smart meter infrastructure while avoiding expensive downtime or data bottlenecks. 2023 will see a rapid rise in Cassandra 4.0 deployments in these industries, as Cassandra itself improves to offer easier installation and operational management. (It’s also worth noting that Cassandra 4.1 is now available with even more feature improvements.)

3. Apache Kafka will take the next step forward in its evolution.

Apache Kafka is already a prominent and near-ubiquitous open-source data technology for many of the biggest enterprises – but the data streaming platform will take a significant leap forward in 2023, spurred by the separation of Kafka compute from data storage. The Kafka of the near future will feature hot and cold storage, with Kafka handling data collection from those sources. 

The key advance will free Kafka from all data replication and data consistency issues. Kafka hails from an era when a key goal was to utilize commodity hardware to split up data for high availability and disaster recovery, tasks that tools like Kubernetes and availability zones now readily accomplish. By now introducing a separate fast and accessible data layer for Kafka, Kafka brokers can simply ebb and flow with workloads, serving consumers and producers as a conduit to that data layer. Freeing Kafka from all issues around persisting data is an exciting and welcome evolution of the technology that enterprises will be eager to explore. 

4. Applications built on Apache Kafka, machine learning, and Cadence will pair revolutionary intelligence with Kafka’s speed and scalability.

Enterprises are now successfully demonstrating the strengths of Kafka-ML, with many more likely to follow in 2023. For example, TikTok leverages Kafka-ML’s real-time latency and hyperscale processing capacity to deliver streams of content that are customized to each specific user. In another example, Uber Eats is using Kafka-ML alongside Cadence (an open-source workflow orchestration tool for fault-tolerant long-running applications) to optimize deal offers in push notifications and effectively increase orders on the service. This clever use case uses machine learning to predict the best times to send notifications to each customer. The ML model sends the customer-specific notifications and optimal times to Kafka, then uses Cadence scheduling to activate the right Kafka topics at the right time. The sky is the limit for intelligent streaming applications utilizing these powerful technologies in the coming year.
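
To make the predict-then-schedule pattern above concrete, here is a minimal sketch in plain Python. It is an illustration under stated assumptions, not a description of how Uber Eats or any other company actually implements it: the "model" is a simple most-frequent-order-hour heuristic, `InMemoryTopic` is a hypothetical stand-in for a Kafka topic, and `release_due` is a hypothetical stand-in for a Cadence-style scheduler. No real Kafka or Cadence client calls are used.

```python
import heapq
from collections import Counter
from dataclasses import dataclass, field


@dataclass(order=True)
class Prediction:
    """A predicted send time for one customer (ordered by send_hour only)."""
    send_hour: int
    customer_id: str = field(compare=False)
    offer: str = field(compare=False)


class InMemoryTopic:
    """Hypothetical stand-in for a Kafka topic: an append-only message list."""

    def __init__(self, name):
        self.name = name
        self.messages = []

    def produce(self, message):
        self.messages.append(message)


def predict_best_hour(order_hours):
    """Stand-in for the ML model: the hour the customer orders most often.

    Ties are broken in favor of the earliest hour.
    """
    counts = Counter(order_hours)
    return min(counts, key=lambda hour: (-counts[hour], hour))


def publish_predictions(history, offers, predictions_topic):
    """Score every customer and publish the result to the predictions topic."""
    for customer_id, order_hours in history.items():
        predictions_topic.produce(
            Prediction(predict_best_hour(order_hours), customer_id, offers[customer_id])
        )


def build_schedule(predictions_topic):
    """Consume all predictions into a min-heap keyed on send_hour."""
    schedule = list(predictions_topic.messages)
    heapq.heapify(schedule)
    return schedule


def release_due(schedule, push_topic, current_hour):
    """Stand-in for the scheduler: emit every notification that is now due."""
    while schedule and schedule[0].send_hour <= current_hour:
        due = heapq.heappop(schedule)
        push_topic.produce(
            {"customer_id": due.customer_id, "offer": due.offer, "hour": due.send_hour}
        )


# Example data (entirely made up for illustration).
history = {"alice": [12, 12, 19], "bob": [19, 20, 19]}
offers = {"alice": "free delivery", "bob": "10% off"}

predictions = InMemoryTopic("predictions")
push = InMemoryTopic("push-notifications")

publish_predictions(history, offers, predictions)
schedule = build_schedule(predictions)

release_due(schedule, push, current_hour=12)  # only alice is due at noon
release_due(schedule, push, current_hour=19)  # bob becomes due in the evening
```

In a real deployment the scheduling step would need to be durable and fault-tolerant, which is the role the article assigns to Cadence; the in-memory heap here only shows the ordering logic.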

A Surer Bet in Uncertain Times

When it comes to modernized capabilities – and certainly when it comes to price – the right open-source deployment strategy offers enterprises a number of advantages that can’t be matched by proprietary alternatives. In 2023, look for enterprise teams facing tight budgets to increasingly explore their open-source data infrastructure options, and for current users to meaningfully expand how they are utilizing key technologies like Cassandra and Kafka.