Why 2021 Will Be a Big Year for Apache Cassandra (and Its Users)

By Ben Bromhead

The upcoming GA release of Apache Cassandra 4.0 is set to be the most stable “.0” release of the project (or any distributed database) ever. The effort across the entire community has been monumental, and everyone involved with this release will deserve not only a well-earned victory lap but also a couple of drinks to unwind with. The focus on stability, usability, and must-have enterprise features will be a huge boon to anyone currently running Cassandra in production, and makes the release worth a look for anyone who is evaluating Apache Cassandra now (or has in the past).

Here’s a quick rundown on what organizations (including ours, as we’ll be upgrading our own infrastructure to the first GA release) will be getting with the fully open-source Cassandra 4.0 release, now expected very soon:

Unprecedented Stability

Cassandra 4.0 has been developed with the stated ambition to achieve “the most stable major release to date.” This goal has now been realized, in part by improving Cassandra’s ability to record workloads as they occur on the cluster and replay them later, and in part by synthetically generating tests that target edge cases.

This core tenet of repeatability strengthens both testing and software development: hard-to-reproduce bugs, known bugs, and edge cases can all be reproduced and resolved quickly. Cassandra operators now have several new testing frameworks at their disposal, from fuzzing to property-based testing to fault injection. Cassandra 4.0 makes it easier than ever to test workloads, improvements, and configuration changes, and to resolve any issues that crop up.
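To make the idea of property-based testing concrete, here is a minimal hand-rolled sketch in Python using only the standard library. It is not taken from the Cassandra test suite: the (timestamp, value) cell model and the names `reconcile`, `random_cell`, and `check_properties` are hypothetical, and the merge rule is a simplified last-write-wins stand-in. The point is the shape of the technique: generate many random inputs, including awkward ties, and assert that general properties hold for all of them.

```python
import random


def reconcile(a, b):
    """Toy last-write-wins merge of two (timestamp, value) cells.

    The higher timestamp wins; on a tie the larger value wins, so the
    result never depends on argument order. Simplified for illustration.
    """
    return max(a, b)


def random_cell(rng):
    # Small ranges make timestamp ties likely, which is exactly the kind
    # of edge case hand-written tests tend to miss.
    return (rng.randint(0, 5), rng.choice(["a", "b", "c"]))


def check_properties(trials=1000, seed=42):
    rng = random.Random(seed)  # fixed seed, so any failure can be replayed
    for _ in range(trials):
        x, y, z = (random_cell(rng) for _ in range(3))
        # Commutative: merge order between two replicas must not matter.
        assert reconcile(x, y) == reconcile(y, x), (x, y)
        # Idempotent: merging a cell with itself changes nothing.
        assert reconcile(x, x) == x, x
        # Associative: grouping of merges must not matter.
        assert reconcile(reconcile(x, y), z) == reconcile(x, reconcile(y, z)), (x, y, z)
    return trials


check_properties()
```

The fixed seed mirrors the repeatability theme above: a failing case can be reproduced exactly by rerunning with the same seed.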

Enterprise-Grade Auditing

Cassandra 4.0 introduces new auditing capabilities that include audit logging, full query logging, and traffic replay. Together, these let operators comprehensively audit database user activity, with configurable control over what is captured. Every read, write, login attempt, schema change, and other action can be logged and made available for scrutiny.

These features are especially inviting to enterprise Cassandra operators: Audit logging and traffic replay tick crucial boxes when it comes to demonstrating compliance with SOX, PCI DSS, GDPR, and other regulatory requirements. The new release’s high-level interface simplifies enterprise-grade auditing practices, whether they’re required or simply prudent. Specifically, Cassandra’s auditlogviewer utility enables inspection of operator-configured audit logs, which can be tuned to specific users, keyspaces, or commands. The fqltool utility allows inspection of the logs produced by full query logging. Audit logs feature configurable log rollover and are saved securely on the node, outside the Cassandra database. These features give operators greater confidence in the compliance and security of their Cassandra deployments.
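As a rough illustration of what tuning audit logs to specific users, keyspaces, or commands means in practice, the following Python sketch models include/exclude filtering. It is not Cassandra's implementation: the `AuditFilter` class, its field names, and the example keyspace and user are hypothetical, and the precedence rule (an exclusion always wins, an empty include set means everything) is an assumption made for the sketch.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AuditFilter:
    """Hypothetical model of include/exclude audit filtering."""

    included_keyspaces: frozenset = frozenset()
    excluded_keyspaces: frozenset = frozenset()
    included_users: frozenset = frozenset()
    excluded_users: frozenset = frozenset()
    included_categories: frozenset = frozenset()
    excluded_categories: frozenset = frozenset()

    @staticmethod
    def _passes(value, included, excluded):
        # Assumption: an exclusion always wins, and an empty include
        # set means "everything is included".
        if value in excluded:
            return False
        return not included or value in included

    def should_log(self, user, keyspace, category):
        # An event is logged only if it passes all three filters.
        return (
            self._passes(user, self.included_users, self.excluded_users)
            and self._passes(keyspace, self.included_keyspaces, self.excluded_keyspaces)
            and self._passes(category, self.included_categories, self.excluded_categories)
        )


# Example: audit only the (hypothetical) "payments" keyspace, but skip a
# noisy monitoring account.
audit = AuditFilter(
    included_keyspaces=frozenset({"payments"}),
    excluded_users=frozenset({"healthcheck"}),
)
print(audit.should_log("alice", "payments", "DML"))
print(audit.should_log("healthcheck", "payments", "DML"))
```

Narrowing the audit scope this way keeps logs focused on the activity a compliance review actually needs, at the cost of having to justify every exclusion.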

Enhanced Performance

The high-performance Netty Transport Framework, previously used in a small set of areas of Cassandra, is now broadly adopted with Cassandra 4.0.

Netty provides asynchronous, event-driven networking code that enables better inter-node communication. Whereas past Cassandra releases required N threads to be maintained per peer, with a lot of performance-sapping context switching, Cassandra 4.0 now uses a single thread pool for all inter-node connections. Netty also brings the benefits of zero-copy streaming to SSTables, enabling up to five times faster streaming performance. This rebuild of Cassandra’s networking implementation provides further sizeable performance advantages, reducing P99 tail-end read latency by over 40 percent in some use cases, while facilitating faster and easier scalability for large clusters and dramatically accelerating node recovery.
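The difference between thread-per-peer and event-driven networking can be sketched with Python's asyncio. This is an analogy only, not Netty or Cassandra code: the function names are hypothetical, the "network wait" is simulated with a sleep, and a single event-loop thread stands in for the shared pool described above. The sketch shows 100 simulated peers being serviced without 100 threads.

```python
import asyncio
import threading


async def handle_peer(peer_id, thread_ids):
    # Simulate waiting on network I/O. While this coroutine is suspended,
    # the event loop is free to service other peers.
    await asyncio.sleep(0.01)
    thread_ids.add(threading.get_ident())
    return f"peer-{peer_id}: ok"


async def serve_all(peer_count):
    thread_ids = set()
    replies = await asyncio.gather(
        *(handle_peer(i, thread_ids) for i in range(peer_count))
    )
    return replies, thread_ids


def run(peer_count=100):
    replies, thread_ids = asyncio.run(serve_all(peer_count))
    # Returns (number of replies, number of distinct threads used).
    return len(replies), len(thread_ids)


replies, threads = run(100)
print(f"{replies} peers served on {threads} thread(s)")
```

Because no thread sits blocked per connection, there is far less context switching as the peer count grows, which is the effect the paragraph above attributes to the Netty-based rebuild.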

Virtual Tables

Virtual tables are part of the richer observability and monitoring toolset that Cassandra 4.0 offers out of the box. Previous releases required operators to establish JMX access to view key information such as metrics, running compactions, clients, and configuration details. Not so anymore. Virtual tables in Cassandra 4.0 are read-only system tables that expose this information and can be queried with CQL. This simplifies the monitoring of key metrics and makes it easier to build integrations and observability tools.
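As a sketch of what querying a virtual table with CQL might look like from application code, the Python below reads the connected-clients view. The helper name `connected_clients`, the `FakeSession` stand-in, and the sample row are hypothetical and exist only so the example runs without a cluster; the keyspace, table, and column names in the query should be checked against the documentation for the Cassandra version in use.

```python
from collections import namedtuple

# CQL against a virtual table; names here should be verified against the
# documentation for your Cassandra version.
CLIENTS_QUERY = (
    "SELECT address, port, username, driver_name, request_count "
    "FROM system_views.clients"
)


def connected_clients(session):
    """Return one dict per connected client.

    `session` is any object with an execute(cql) method that yields rows
    with attribute access, such as a driver session.
    """
    return [
        {
            "address": str(row.address),
            "port": row.port,
            "username": row.username,
            "driver": row.driver_name,
            "requests": row.request_count,
        }
        for row in session.execute(CLIENTS_QUERY)
    ]


# Stand-in for a real session so the sketch runs without a cluster.
ClientRow = namedtuple(
    "ClientRow", "address port username driver_name request_count"
)


class FakeSession:
    def execute(self, cql):
        assert "system_views.clients" in cql
        return [ClientRow("10.0.0.5", 51234, "alice", "example-driver", 42)]


for client in connected_clients(FakeSession()):
    print(client)
```

With the DataStax Python driver installed, a real session from `Cluster(["127.0.0.1"]).connect()` could be passed in place of `FakeSession()`; that driver and address are assumptions of this sketch, not something the release prescribes.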

A Foundation for the Future

Finally, Cassandra 4.0 offers support for Java 11, paving the way for newer JVM features, better-performing garbage collection, and other options down the road.

The Cassandra community has clearly demonstrated its deep dedication and vast capabilities with the achievement that is Cassandra 4.0. Any organization, from start-up to enterprise, that is considering Cassandra in its pure open-source form, and any current operator looking at making the upgrade, should evaluate Cassandra 4.0. Cassandra is backed by a community that will only offer more powerful and refined benefits going forward.
