This presentation was given in a live webinar on September 29, 2011.
About the Webinar
Data is Yahoo!’s most strategic asset, from user engagement and insights data to revenue and billing data. Three years ago, Yahoo! invested in a Data Quality program.
By applying industry principles and techniques, the Data Quality program has provided proactive and reactive system solutions to Audience data issues and their root causes. It does so by addressing the technical challenges of data quality at scale and by engaging the rest of the organization in the solution: from product teams, through every layer of the data stack (data sourcing, ETL, aggregations, and analytics), to the analyst and science teams who consume the data. This methodology is now being scaled to all data across Yahoo!, including Search and Display Advertising.
This presentation covers:
- The solution methodology, which builds proactive and reactive DQ capabilities into Cloud-based products up front and keeps an end-to-end data focus, resulting in system improvements and fast issue resolution
- Solutions for technical challenges in the internet domain in Yahoo!’s massive data environment, including end-to-end data monitoring and alerting, abuse and robot traffic detection, and the trade-off between latency and accuracy
- The DQ program’s approach to scaling across all Yahoo! data, which uses a central and embedded-in-the-businesses model with a strong focus on customer engagement
About the Speakers
Dan Defend & Aparna Vani
Dan Defend earned his master’s degree in Computer Science from the University of Illinois. At Motorola he worked as an Engineering Manager, first in Unix OS and then in embedded cell phone software, where he also led analysis and data-driven improvement using the Digital Six Sigma methodology. He currently leads the Data Quality program at Yahoo!, where significant improvements are underway involving centralized monitoring and data cleansing across Yahoo!’s audience data pipeline, relying heavily on organizational leverage and distributed ownership and accountability.
Aparna Vani earned her master’s degree in ECE from the University of Houston. She worked as a hardware design engineer at Compaq and as a development lead at TVGuide, and has versatile experience as a test designer and architect at Motorola. She is currently Chief DQ Architect at Yahoo!, responsible for data quality strategy and design, and works on global, organization-wide projects such as robot filtering and bcookie churn.


















