MapReduce Archives - DATAVERSITY

A Brief History of the Hadoop Ecosystem

Keith D. Foote | May 27, 2021 | May 25, 2021

In 2002, internet researchers just wanted a better search engine, preferably an open-source one. That was when Doug Cutting and Mike Cafarella decided to give them what they wanted, and they called their project “Nutch.” Hadoop was originally designed as part of the Nutch infrastructure and was presented in 2005. The […]

Data Lakes: What They are and How to Use Them

Jaya Byrraju | August 11, 2020 | August 6, 2020

For most companies, having data means having access to wealth. And the key to fully leveraging the wealth that data represents lies in how effectively companies harness, manage, parse, and interpret it. But first, the data must exist somewhere. Enter data lakes. These are central repositories […]

Data Orchestration Brings Your Data Closer and Makes Access Faster

Jennifer Zaino | October 22, 2019 | October 18, 2019

Data orchestration means trying to bring order and speed to a complex Big Data ecosystem, a conglomeration of storage systems like Amazon S3, Apache HDFS, or OpenStack Swift and computation frameworks and applications such as Apache Spark and Hadoop MapReduce. The data stack is fragmented and performance-challenged by a proliferation of data silos. The technology […]

Unifying Big Data Workloads

Michelle Knight | May 23, 2019 | May 10, 2019

Try querying Big Data sets and computing results through high volumes and variety across multiple independent storage systems – you’ll find a tangled web in the Tower of Babel, where platforms communicate in different languages. Then ask for speedy manipulations with that data set and it seems almost impossible. This describes the challenge faced by […]

Benchmarking Hadoop Performance: On-Premises S3-Compatible Storage Keeps Pace with HDFS

Gary Ogasawara and Tatsuya Kawano | April 5, 2019 | March 29, 2019

When deploying Hadoop, scaling storage can be difficult and costly because the storage and compute are co-located on the same hardware nodes. By implementing the storage layer using S3-compatible storage software and using an S3 connector instead of HDFS, it’s possible to separate storage […]

The Power of Crunching Big Data Effectively

Lex Boost | February 8, 2019 | February 1, 2019

Not embracing the Big Data trend can cost your company. According to an Accenture study, 79 percent of enterprise executives agree that companies not embracing Big Data will lose their competitive edge. With data creation on track to grow tenfold by 2025, it is extremely important for […]

Predictions for Big Data Analytics in 2019

James Kobielus | January 7, 2019 | January 3, 2019

Big Data Analytics has been one of the dominant tech trends of this decade, and it’s also been one of the most dynamic and innovative segments of the IT market. Today’s Big Data Analytics market is quite different from the industry of even a few years ago, and […]

A Year of Blink at Alibaba: Apache Flink in Large Scale Production

Xiaowei Jiang | May 19, 2017 | May 8, 2017

It has been a great year for Blink, our fork of Apache Flink®, at Alibaba. We went into production with Blink about a year ago, and since then, we have used it to make real-time updates to listings in various search products such as Taobao, Tmall, AliExpress, etc. […]

Hadoop Overview: A Big Data Toolkit

Elissa Gilbert | June 14, 2016 | June 12, 2016

Big Data isn’t new. Forbes traces the origins back to the “information explosion” concept first identified in 1941. The challenge has been to develop practical methods for dealing with the 3Vs: Volume, Variety, and Velocity. Without tools to support and simplify the manipulation and analysis of large data sets, the ability to use that data […]

8 Big Data Trends to Watch For

A.R. Guess | February 29, 2016

By Angela Guess. Tom Phelan, Chief Architect of BlueData, recently wrote in InsideBigData, “Over the next year, a growing number of customers will realize the vast business benefits of Big Data and will deploy Big Data solutions across their organization. Technical innovations, the rise of BDaaS, a shifting approach to data locality, platform convergence and […]