by Angela Guess
A recent press release states, “The Apache Software Foundation (ASF), the all-volunteer developers, stewards, and incubators of more than 350 Open Source projects and initiatives, today announced Apache® Hadoop® v3.0.0, the latest version of the Open Source software framework for reliable, scalable, distributed computing. Over the past decade, Apache Hadoop has become ubiquitous within the greater Big Data ecosystem by enabling firms to run and manage data applications on large hardware clusters in a distributed computing environment. ‘This latest release unlocks several years of development from the Apache community,’ said Chris Douglas, Vice President of Apache Hadoop. ‘The platform continues to evolve with hardware trends and to accommodate new workloads beyond batch analytics, particularly real-time queries and long-running services. At the same time, our Open Source contributors have adapted Apache Hadoop to a wide range of deployment environments, including the Cloud’.”
The release goes on, "Apache Hadoop 3.0.0 highlights include:

- HDFS erasure coding: halves the storage cost of HDFS while also improving data durability;
- YARN Timeline Service v.2 (preview): improves the scalability, reliability, and usability of the Timeline Service;
- YARN resource types: enables scheduling of additional resources, such as disks and GPUs, for better integration with machine learning and container workloads;
- Federation of YARN and HDFS subclusters: transparently scales Hadoop to tens of thousands of machines;
- Opportunistic container execution: improves resource utilization and increases task throughput for short-lived containers. In addition to its traditional, central scheduler, YARN also supports distributed scheduling of opportunistic containers; and
- Improved capabilities and performance for cloud storage systems such as Amazon S3 (S3Guard), Microsoft Azure Data Lake, and Aliyun Object Storage System."
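The "halves the storage cost" claim for erasure coding follows from simple arithmetic, sketched below under the assumption that the comparison is between classic 3-way replication and a Reed-Solomon scheme with 6 data units and 3 parity units (the RS-6-3 layout Hadoop 3 ships as a built-in policy):

```python
def replication_overhead(replicas: int) -> float:
    """Raw bytes stored per logical byte under n-way replication."""
    return float(replicas)

def erasure_overhead(data_units: int, parity_units: int) -> float:
    """Raw bytes stored per logical byte under RS(data, parity) striping."""
    return (data_units + parity_units) / data_units

# Classic HDFS default: every block is copied three times -> 3.0x raw storage.
rep = replication_overhead(3)
# RS-6-3 erasure coding: 6 data + 3 parity stripes -> 9/6 = 1.5x raw storage.
ec = erasure_overhead(6, 3)

print(rep, ec, rep / ec)  # → 3.0 1.5 2.0
```

The 2.0 ratio is the "halving," and durability improves at the same time because RS-6-3 tolerates the loss of any 3 stripes, whereas 3-way replication tolerates only 2 lost copies.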
Read more at Globe Newswire.
Photo credit: Apache Software Foundation