Four New Apache Cassandra 5.0 Features to Be Excited About

By Bassam Chahine

With the recent beta release of Apache Cassandra 5.0, now is a great time for teams to give it a spin and discover 5.0’s most interesting and anticipated new capabilities. 

Having poked around the new beta, I’ve found four features introduced with open-source Cassandra 5.0 that developer teams should be excited about:

1. Vector Support: Introducing Vector Search, New Functions, and a New Vector Data Type

Cassandra 5.0 adds Vector Search, a powerful new feature for finding relevant content within large datasets, along with new CQL functions and a new vector data type for storing and retrieving embedding vectors. Importantly for many, these new features make Cassandra 5.0 a strong data-layer candidate for teams pursuing AI/ML projects – providing the specific functionality those projects require alongside Cassandra’s existing high availability, scalability, and open-source benefits.

For ML models, performing similarity comparisons is critical to understanding data and data connections in context. For example, AI applications from product recommendation engines to generative AI chatbots operate by recognizing patterns and extrapolating decisions based on the similarity of new data inputs and queries to existing training data. Being able to store embedding vectors – arrays of floating-point numbers whose distance from one another indicates how similar specific objects or entities are – is key to enabling those crucial similarity comparisons. With native support for storing and searching these vectors, Cassandra 5.0 becomes a practical choice for AI application development.
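To make "similarity comparison" concrete, here is a minimal sketch in plain Python of ranking items by cosine similarity between embedding vectors. It is an illustration only: the function names, the catalog, and the three-dimensional vectors are all made up, and it compares the query against every item, whereas a database-side vector search would normally rely on an index rather than a full scan.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (1.0 = same direction)."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


def most_similar(query, items, k=1):
    """Return the names of the k items whose embeddings are closest to the query."""
    ranked = sorted(
        items,
        key=lambda name: cosine_similarity(query, items[name]),
        reverse=True,
    )
    return ranked[:k]


# Hypothetical embeddings; real models emit hundreds or thousands of dimensions.
catalog = {
    "running shoes": [0.9, 0.1, 0.0],
    "trail sneakers": [0.8, 0.2, 0.1],
    "coffee maker": [0.0, 0.1, 0.9],
}

nearest = most_similar([1.0, 0.0, 0.0], catalog, k=2)
```

The two footwear items rank ahead of the coffee maker because their vectors point in nearly the same direction as the query, which is the property a recommendation engine or retrieval step relies on.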

2. Storage-Attached Indexing

Cassandra 5.0’s new Storage-Attached Indexing (SAI) streamlines the lifecycle of secondary indexes, while also making them more storage-efficient and easier to use. SAI allows Cassandra users to create one or more secondary indexes on a database table, with each index based on a single column of the user’s choice.

This highly scalable, globally distributed column-level indexing is designed for high I/O throughput for search – including Vector Search. SAI is also modular and extensible, with Vector Search serving as an initial demonstration of this capability. Combined with embedding vectors, SAI indexes can capture semantics by indexing both queries and content (including large inputs such as documents and images).
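The "one index per column" idea can be pictured with a toy model in Python. This is only a conceptual sketch: the ColumnIndex class, the sample rows, and the column names are hypothetical, and it says nothing about how SAI actually lays out its data on disk. It simply shows a per-column index mapping values to row keys, and two such indexes being combined to answer a query on both columns.

```python
from collections import defaultdict


class ColumnIndex:
    """Toy secondary index: maps one column's values to the row keys holding them."""

    def __init__(self, column):
        self.column = column
        self._postings = defaultdict(set)

    def add(self, row_key, row):
        # Rows that lack the indexed column are simply not indexed.
        if self.column in row:
            self._postings[row[self.column]].add(row_key)

    def lookup(self, value):
        # Return a copy so callers cannot mutate the index.
        return set(self._postings.get(value, ()))


# Hypothetical table contents, keyed by primary key.
rows = {
    1: {"country": "AU", "plan": "pro"},
    2: {"country": "AU", "plan": "free"},
    3: {"country": "US", "plan": "pro"},
}

by_country = ColumnIndex("country")
by_plan = ColumnIndex("plan")
for key, row in rows.items():
    by_country.add(key, row)
    by_plan.add(key, row)

# Combine two single-column indexes: country = 'AU' AND plan = 'pro'.
matches = by_country.lookup("AU") & by_plan.lookup("pro")
```

Each index stays small and independent because it covers exactly one column, and a multi-column filter becomes an intersection of the per-column results.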

3. Trie Memtables and Trie-Indexed SSTables

Cassandra 5.0 users can take advantage of the potentially significant performance improvements and memory optimizations that come with this version’s new trie (prefix tree)-based Memtables and SSTables. While Cassandra is best known for its distributed architecture, these storage formats use tries and byte-comparable representations of database keys to improve Cassandra’s performance for reads and modification operations, as well as to size structures correctly for the data they hold. Trie Memtables and Trie-Indexed SSTables also reduce memory management overhead and garbage collection pressure, making it simpler for high-scale organizations to manage their data.
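For readers unfamiliar with tries, the following is a minimal Python sketch of a prefix tree keyed by bytes. It is a teaching example under simplifying assumptions, not Cassandra's implementation: the ByteTrie and TrieNode names are hypothetical, and the keys are plain byte strings standing in for byte-comparable key encodings. It shows the two properties the paragraph above relies on: keys with a common prefix share nodes, and walking the trie yields keys in byte order without any separate sort.

```python
class TrieNode:
    __slots__ = ("children", "value", "has_value")

    def __init__(self):
        self.children = {}      # next byte (int 0-255) -> TrieNode
        self.value = None
        self.has_value = False  # distinguishes "no entry" from a stored None


class ByteTrie:
    """Toy prefix tree over bytes keys."""

    def __init__(self):
        self._root = TrieNode()

    def put(self, key, value):
        node = self._root
        for byte in key:
            node = node.children.setdefault(byte, TrieNode())
        node.value = value
        node.has_value = True

    def get(self, key, default=None):
        node = self._root
        for byte in key:
            node = node.children.get(byte)
            if node is None:
                return default
        return node.value if node.has_value else default

    def items(self):
        """Yield (key, value) pairs in ascending byte order."""
        stack = [(b"", self._root)]
        while stack:
            prefix, node = stack.pop()
            if node.has_value:
                yield prefix, node.value
            # Push the largest byte first so the smallest is popped first.
            for byte in sorted(node.children, reverse=True):
                stack.append((prefix + bytes([byte]), node.children[byte]))


trie = ByteTrie()
trie.put(b"user:2", "carol")
trie.put(b"user:10", "bob")
trie.put(b"user:1", "alice")

ordered_keys = [key for key, _ in trie.items()]
```

All three keys share the nodes for the prefix `user:`, so the prefix is stored once rather than three times, and iteration order falls out of the structure itself.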

The bottom line: these features for reducing storage overhead – while improving scalability and write and read performance – will earn Cassandra users’ attention and appreciation. 

4. New Aggregation and Math Functions

Cassandra 5.0 adds new native CQL functions, which complement the ability users already have to write their own user-defined functions. These additions expand the speed and flexibility with which users can accomplish their goals with Cassandra.

New native functions that aggregate the items of a collection column include:

  • collection_count – Find how many items are in a collection
  • collection_max and collection_min – Find the maximum or minimum item of a collection
  • collection_sum and collection_avg – Find the sum or average of the items in a numeric collection

New native functions for operating on collection columns include:

  • map_keys – Get the keys of a map
  • map_values – Get the values of a map
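As a rough guide to what the collection functions above compute for a single row, here is a plain-Python analogue. This is not Cassandra code: the helper names and the sample row are hypothetical stand-ins, and the result types in CQL may differ from what Python returns (for example, for averages of integer collections).

```python
def map_keys(m):
    """Keys of a map, as a set."""
    return set(m.keys())


def map_values(m):
    """Values of a map, as a list (duplicates are kept)."""
    return list(m.values())


def count_items(collection):
    """How many items the collection holds."""
    return len(collection)


def max_item(collection):
    return max(collection)


def min_item(collection):
    return min(collection)


def sum_items(collection):
    return sum(collection)


def avg_items(collection):
    """Arithmetic mean; Python returns a float here."""
    return sum(collection) / len(collection)


# Hypothetical row with a list column and a map column.
row = {
    "student": "ada",
    "scores": [82, 91, 77],
    "labs": {"lab1": 9, "lab2": 7},
}

summary = {
    "count": count_items(row["scores"]),
    "max": max_item(row["scores"]),
    "min": min_item(row["scores"]),
    "sum": sum_items(row["scores"]),
    "lab_names": map_keys(row["labs"]),
    "lab_marks": map_values(row["labs"]),
}
```

The point is that each function works on the collection stored in one row's column, rather than aggregating one value per row across many rows.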

New native math functions include:

  • abs – Returns the absolute value of the input
  • exp – Returns the value of e (the base of natural logarithms) to the power of the input
  • log – Returns the natural logarithm (base e) of the input
  • log10 – Returns the base 10 logarithm of the input
  • round – Returns the closest integer to the input

Give It a Go

Those interested in harnessing the advantages of Cassandra 5.0 highlighted here should try it out for themselves, and get ahead of the curve when it comes to utilizing and optimizing fully open-source Cassandra.