This presentation was given on Tuesday, August 23, 2011, at the NoSQL Now! 2011 Conference in San Jose, California.
About the Presentation
- History of the Map-Reduce technology
- Typical use cases: when it should be used and when it is not appropriate
- Map-Reduce abstraction
- High-level components
- HDFS – Hadoop Distributed File System
- Main features of the NameNode and DataNodes
- Performing calculations on top of the Hadoop Distributed File System
- Main features of the JobTracker and TaskTrackers
- Yahoo case study
- general setup
- data processing
- reliability statistics
- LiveOps case study (near-real-time data processing)
- data collection
- rapid calculation on a Hadoop cluster
- sharing the cluster between different uses through pool scheduling
- Key issues in starting a pilot project:
- data preparation
- initial setup of the Hadoop cluster
- ongoing maintenance of the Hadoop cluster
- Monitoring and maintenance
- Summary and take-away points
About the Speaker
Serge Blazhievsky is an experienced developer and architect with a rich background in C++/Java and distributed systems. His latest venture, LiveOps, Inc., uses Hadoop infrastructure for all reporting needs. The LiveOps Hadoop framework was designed entirely by him and satisfies very strict performance and availability requirements. Serge's prior ventures include Attributor, Inc., where he designed the Hadoop infrastructure used for Internet crawling and web-page analysis. He holds a Master's degree in Computer Engineering from Santa Clara University, CA, located in the heart of Silicon Valley. Serge is a regular attendee of and contributor to various Hadoop conferences, including the Hadoop User Group at Yahoo, the creator of Hadoop.