Slides: Data Quality Challenges & Solution Approaches in Yahoo!’s Massive Data

September 30, 2011

This presentation was given in a live webinar on September 29, 2011

To view the recording of this webinar, click HERE.

About the Webinar

Data is Yahoo!’s most strategic asset, from user engagement and insights data to revenue and billing data. Three years ago, Yahoo! invested in a Data Quality program.

By applying industry principles and techniques, the Data Quality program has delivered proactive and reactive system solutions to Audience data issues and their root causes. It does so by addressing the technical challenges of data quality at scale and by engaging the rest of the organization in the solution: from product teams, through the data stack (data sourcing, ETL, aggregations, and analytics), to the analyst and science teams who consume the data. This methodology is now being scaled to all data across Yahoo!, including Search and Display Advertising.

This presentation covers:

  • The solution methodology, which builds proactive and reactive DQ capabilities into Cloud-based products up front and takes an end-to-end view of the data, resulting in system improvements and fast issue resolution
  • Solutions for technical challenges in the internet domain in Yahoo!’s massive data environment, including end-to-end data monitoring and alerting, abuse and robot traffic detection, and latency vs. accuracy
  • The DQ program's approach to scaling across all Yahoo! data, which uses a central and embedded-in-the-businesses model with a strong focus on customer engagement

 

About the Speakers

Dan Defend & Aparna Vani

Dan Defend earned his Master's in Computer Science from the University of Illinois. He has experience at Motorola as an Engineering Manager, first in Unix OS and then in embedded cell phone software, where he also led analysis and data-driven improvement using the Digital Six Sigma methodology. He currently leads the Data Quality program at Yahoo!, where significant improvements are underway involving centralized monitoring and data cleansing across Yahoo!’s audience data pipeline, relying heavily on organizational leverage and distributed ownership and accountability.

 

Aparna Vani earned her Master's in ECE from the University of Houston. She worked as a hardware design engineer at Compaq and as a development lead at TVGuide, and has varied experience as a test designer and architect at Motorola. She is currently Chief DQ Architect at Yahoo!, responsible for data quality strategy and design, and works on global, organization-wide projects such as robot filtering and bcookie churn.
