Scale-Out ZFS: Scalable Storage for Exponential Data Growth and Powerful Data Protection

By Jason Lohrey

Advanced data-intensive applications, the increased use of digitalization, and IoT devices are forcing organizations across various industries to reevaluate how they handle large amounts of data and exponential data growth effectively and efficiently. 

ZFS is a popular storage system because it is powerful and flexible, making it well-suited to handle large amounts of data while providing rich data integrity, protection, and management capabilities that are often difficult to find in other file systems. Many IT professionals consider it to be one of the best file systems available today for several reasons:

  • Data Integrity: ZFS is designed with data integrity as a top priority. It checksums every block so that silent corruption is detected when the data is read, and it combines copy-on-write with RAID-like redundancy so that damaged data can be repaired from a good copy.
  • Adaptive Replacement Cache (ARC): ZFS uses a sophisticated caching system called ARC. It is designed to cache frequently accessed data and metadata in memory, which helps to reduce disk I/O operations and improve performance.
  • Snapshots and Clones: ZFS includes powerful snapshot and cloning capabilities, allowing users to create point-in-time copies of their data quickly and easily. This ability is useful for backup, testing, and other purposes.
  • Compression and Deduplication: ZFS includes built-in compression and deduplication capabilities, which save space by reducing the amount of redundant data stored on the system. Compression can also improve performance because less data is read from and written to disk, while deduplication trades additional memory for the space it saves.
  • RAIDZ: ZFS offers its own implementation of RAID called RAIDZ. RAIDZ combines copy-on-write (COW) with variable-width stripes, which avoids the "write hole" of traditional RAID 5, where a crash in the middle of a write can leave data and parity inconsistent. It is designed to work with large disks and can provide better performance than traditional RAID implementations.
  • Open Source: ZFS is now an open-source project, which means anyone can use and contribute to its software development. This approach has led to a vibrant community of users and developers constantly working to improve the system.
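The data integrity behavior described above (checksum on write, verify on read, repair from a redundant copy) can be sketched in a few lines of Python. This is only an illustrative model, not ZFS code: the `MirroredBlock` class and its methods are hypothetical names, and SHA-256 from the standard library stands in for whichever checksum algorithm a real pool is configured to use.

```python
import hashlib


def checksum(data: bytes) -> str:
    """Return a SHA-256 digest; a stand-in for a per-block checksum."""
    return hashlib.sha256(data).hexdigest()


class MirroredBlock:
    """Hypothetical model: one logical block kept as several copies,
    plus the checksum recorded when the block was written."""

    def __init__(self, data: bytes, copies: int = 2):
        self.expected = checksum(data)
        self.copies = [bytearray(data) for _ in range(copies)]

    def read(self) -> bytes:
        """Return the first copy that verifies, repairing any bad copies."""
        good = None
        for copy in self.copies:
            if checksum(bytes(copy)) == self.expected:
                good = bytes(copy)
                break
        if good is None:
            raise IOError("all copies failed checksum verification")
        # Self-heal: overwrite every copy that no longer matches the checksum.
        for i, copy in enumerate(self.copies):
            if checksum(bytes(copy)) != self.expected:
                self.copies[i] = bytearray(good)
        return good


block = MirroredBlock(b"genomics sample 42")
block.copies[0][0] ^= 0xFF      # simulate silent corruption of one copy
recovered = block.read()        # verified read served from the intact copy
```

The point of the sketch is that corruption is caught by comparing against a checksum stored separately from the data, so a bad copy is never returned to the application as long as one good copy remains.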

However, ZFS has a significant handicap: it is “scale-up” only and can’t scale out. As good as ZFS is, today’s ever-growing storage requirements have relegated it to the smaller end of the storage capacity spectrum, with scale-out NAS becoming the Darwinian dominant storage species for large file storage environments. What’s needed is scale-out ZFS, and companies today have new technology options to accomplish just that.

Let’s examine the benefits of combining ZFS with other tools to achieve scale-out capabilities:

  • Significant cost savings and efficiency: Combining ZFS with solutions that enable scale-out capabilities on commodity hardware and open-source software can cost substantially less than traditional enterprise-scale storage solutions. Solutions with efficient data placement and migration capabilities can also help minimize storage costs by ensuring data is stored on the most appropriate storage technology based on usage patterns.
  • Enhanced performance and productivity: Integrating ZFS with solutions that utilize multiple servers in a distributed architecture can improve performance and reduce the risk of bottlenecks. The increased performance enables faster data access and processing, leading to improved productivity across various business operations. It accelerates workflows, enables seamless collaboration, and empowers data-intensive applications to deliver results more quickly, driving overall business efficiency.
  • Robust data protection and compliance: Data integrity, protection, and compliance are paramount concerns for organizations. Combining ZFS’s inherent data protection features with replication capabilities can ensure redundancy and availability, reducing the risk of data loss and downtime. Adding metadata management capabilities can support compliance with data privacy and security regulations and allow organizations to meet regulatory requirements and mitigate associated risks.
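Placing data "on the most appropriate storage technology based on usage patterns" can be illustrated with a simple age-based policy. The sketch below is an assumption-laden example rather than any product's behavior: the tier names, the 30-day and 365-day thresholds, and the `FileRecord`, `choose_tier`, and `plan_migrations` names are all hypothetical.

```python
from dataclasses import dataclass

SECONDS_PER_DAY = 86_400


@dataclass
class FileRecord:
    path: str
    size_bytes: int
    last_access: float  # seconds since the epoch


def choose_tier(record: FileRecord, now: float,
                warm_after_days: int = 30,
                cold_after_days: int = 365) -> str:
    """Pick a tier from how long the file has been idle (hypothetical policy)."""
    idle_days = (now - record.last_access) / SECONDS_PER_DAY
    if idle_days >= cold_after_days:
        return "archive"        # e.g. low-cost cloud archive storage
    if idle_days >= warm_after_days:
        return "capacity"       # e.g. dense disk-based servers
    return "performance"        # e.g. fast storage for active projects


def plan_migrations(records, now: float) -> dict:
    """Group file paths by the tier the policy assigns to them."""
    plan = {"performance": [], "capacity": [], "archive": []}
    for record in records:
        plan[choose_tier(record, now)].append(record.path)
    return plan


now = 1_700_000_000.0
records = [
    FileRecord("/labs/a/hot.bam", 10, now - 1 * SECONDS_PER_DAY),
    FileRecord("/labs/a/warm.bam", 10, now - 90 * SECONDS_PER_DAY),
    FileRecord("/labs/a/cold.bam", 10, now - 400 * SECONDS_PER_DAY),
]
plan = plan_migrations(records, now)
```

A real policy would likely weigh more than idle time (file size, project status, retrieval cost), but the structure stays the same: metadata about each file drives where it lives.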

Scale-out ZFS is ideal for a wide range of industries and organizations that demand scalable, cost-effective, and reliable storage solutions to address their data management challenges, including research institutions and laboratories, the media and entertainment industry, and the government and public sector.

A Cancer Center’s Scale-Out ZFS System

As an example, consider a world-renowned cancer treatment and research center known for its innovative research and compassionate patient care. The center is playing a significant role in advancing cancer treatment and improving patient outcomes, and it has built out its research computing storage management system using ZFS in combination with a feature-rich solution that unifies data management processes into a single platform to simplify the administration of big data.

The center was storing its research data from more than 200 labs – consisting of large amounts of genomics, CryoEM image, and scientific data – totaling 6 PB and more than 2 billion files on over 30 ZFS Network Attached Storage (NAS) servers. This data supplies the center’s various analyses and AI pipelines. When a server reached capacity, research data (and the researcher) had to be moved to another server with available space in a “Tetris-like” manner – a painful process for the IT team and researchers that became untenable.

The center sought a way to eliminate managing the different logins and capacities on its ever-growing number of servers. Moving to traditional enterprise scale-out NAS was deemed too expensive and did not offer the flexibility the center required.

The center decided to front-end the ZFS storage servers with an advanced cluster controller solution that provides a scalable, load-balanced global namespace and a single mount point for researchers and instruments, with easy management for IT. Research data can be easily accessed, regardless of where it is stored. All the ZFS storage servers can be managed as a single entity, simplifying the storage environment’s administration. If one copy of the data is lost, corrupted, or unavailable, remote mirrored copies can be used, reducing downtime and sustaining research activity. Colder data can automatically be archived to low-cost AWS S3 deep archive storage.
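The idea of a global namespace in front of many storage servers can be sketched as a small in-memory model. This is not the center's actual controller, whose internals the article does not describe: `StorageServer`, `GlobalNamespace`, the "most free space first" placement rule, and the choice of two copies are all assumptions made for illustration, and the mirrors here are simply other servers in the same list rather than remote sites.

```python
from dataclasses import dataclass, field


@dataclass
class StorageServer:
    """Hypothetical stand-in for one ZFS NAS server."""
    name: str
    capacity_bytes: int
    online: bool = True
    files: dict = field(default_factory=dict)  # path -> bytes

    @property
    def free_bytes(self) -> int:
        return self.capacity_bytes - sum(len(d) for d in self.files.values())


class GlobalNamespace:
    """Single entry point that hides which server holds which path."""

    def __init__(self, servers, copies: int = 2):
        self.servers = {s.name: s for s in servers}
        self.copies = copies
        self.locations = {}  # path -> list of server names holding a copy

    def write(self, path: str, data: bytes) -> None:
        # Drop any previous copies so an overwrite does not leave stale data.
        for name in self.locations.pop(path, []):
            self.servers[name].files.pop(path, None)
        # Assumed placement rule: prefer the servers with the most free space.
        candidates = sorted(
            (s for s in self.servers.values()
             if s.online and s.free_bytes >= len(data)),
            key=lambda s: s.free_bytes,
            reverse=True,
        )
        if len(candidates) < self.copies:
            raise OSError("not enough servers with free space")
        chosen = candidates[: self.copies]
        for server in chosen:
            server.files[path] = data
        self.locations[path] = [s.name for s in chosen]

    def read(self, path: str) -> bytes:
        # Fall back to a mirrored copy if the first server is unavailable.
        for name in self.locations[path]:
            server = self.servers[name]
            if server.online and path in server.files:
                return server.files[path]
        raise OSError(f"no available copy of {path}")


servers = [
    StorageServer("zfs-a", 100),
    StorageServer("zfs-b", 50),
    StorageServer("zfs-c", 80),
]
ns = GlobalNamespace(servers)
ns.write("/labs/cryoem/run1.dat", b"x" * 40)
servers[0].online = False                  # simulate losing one server
data = ns.read("/labs/cryoem/run1.dat")    # served from the mirror copy
```

Because callers only ever see the path, data can be placed on whichever server has room, which removes the manual "Tetris-like" moves described earlier, and a read still succeeds when one server holding the data is down.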

The result is a powerful alternative to expensive enterprise scale-out NAS. In addition, the solution adopted by the center acts as a cache for data in the ZFS servers, accelerating performance and delivering exceptional scalability, security, and efficiency that enables data to be processed quickly by any application.

While ZFS is a powerful file system with many exceptional features, its inability to scale out has been a significant limitation. However, by combining ZFS with new state-of-the-art technologies, organizations can build an affordable, scalable alternative to expensive enterprise scale-out NAS that can easily handle large and fast-growing data volumes and protect them from loss.