According to a recent press release, “The Apache Software Foundation (ASF), the all-volunteer developers, stewards, and incubators of more than 350 Open Source projects and initiatives, today announced Apache Hadoop v3.2.0, the latest version of the Open Source software framework for reliable, scalable, distributed computing. Now in its 11th year, Apache Hadoop is the foundation of the US$166B Big Data ecosystem (source: IDC) by enabling data applications to run and be managed on large hardware clusters in a distributed computing environment. ‘Apache Hadoop has been at the center of this big data transformation, providing an ecosystem with tools for businesses to store and process data on a scale that was unheard of several years ago,’ according to Accenture Technology Labs. ‘This latest release unlocks the powerful feature set the Apache Hadoop community has been working on for more than nine months,’ said Vinod Kumar Vavilapalli, Vice President of Apache Hadoop. ‘It further diversifies the platform by building on the cloud connector enhancements from Apache Hadoop 3.0.0 and opening it up for deep learning use-cases and long-running apps.’”
The release goes on, “Apache Hadoop 3.2.0 highlights include:
- ABFS Filesystem connector: supports the latest Azure Datalake Gen2 Storage;
- Enhanced S3A connector: including better resilience to throttled AWS S3 and DynamoDB IO;
- Node Attributes Support in YARN: helps to tag multiple labels on the nodes based on its attributes and supports placing the containers based on expression of these labels;
- Storage Policy Satisfier: supports HDFS (Hadoop Distributed File System) applications to move the blocks between storage types as they set the storage policies on files/directories;
- Hadoop Submarine: enables data engineers to easily develop, train and deploy deep learning models (in TensorFlow) on [the] very same Hadoop YARN cluster;
- C++ HDFS client: helps to do async IO to HDFS, which helps downstream projects such as Apache ORC;
- Upgrades for long-running services: supports in-place seamless upgrades of long-running containers via YARN Native Service API (application programming interface) and CLI (command-line interface).”