In 2002, internet researchers wanted a better search engine, preferably an open-source one. Doug Cutting and Mike Cafarella set out to build it and called their project “Nutch.” Hadoop was originally designed as part of the Nutch infrastructure and was presented in 2005. The […]
Benchmarking Hadoop Performance: On-Premises S3-Compatible Storage Keeps Pace with HDFS
By Gary Ogasawara and Tatsuya Kawano. When deploying Hadoop, scaling storage can be difficult and costly because storage and compute are co-located on the same hardware nodes. By implementing the storage layer using S3-compatible storage software and using an S3 connector instead of HDFS, it’s possible to separate storage […]
Predictions for Big Data Analytics in 2019
By James Kobielus. Big Data Analytics has been one of the dominant tech trends of this decade, and it has also been one of the most dynamic and innovative segments of the IT market. Today’s Big Data Analytics market is quite different from the industry of even a few years ago, and […]
A Year of Blink at Alibaba: Apache Flink in Large Scale Production
By Xiaowei Jiang. It has been a great year for Blink, our fork of Apache Flink®, at Alibaba. We went into production with Blink about a year ago, and since then we have used it to make real-time updates to listings in various search products such as Taobao, Tmall, and AliExpress. […]
Hadoop Overview: A Big Data Toolkit
Big Data isn’t new. Forbes traces the origins back to the “information explosion” concept first identified in 1941. The challenge has been to develop practical methods for dealing with the 3Vs: Volume, Variety, and Velocity. Without tools to support and simplify the manipulation and analysis of large data sets, the ability to use that data […]