Last week, we covered the story of how Chris Testa, Director of Engineering at Ad.ly, Inc., brought the Semantic Web to Hollywood. Today, in Part II, Chris shares his recommended five-step process for Linked Data integration.
1. Understand what your “things” are
- Look for the high-value entities in your system: the ones that bring in revenue and give you a business intelligence edge over competitors (Examples: Advertisers, Brands, Celebrities)
- Look for models that are growing quickly in your system (For us, it was Celebrities)
- Look for well-annotated things that are popular in culture and technology
2. Choose a Linked Dataset:
- DBpedia and Freebase are cornerstones of the Linked Data movement
- There are tons of specialized datasets in many fields (biomedical, events, news, government, and many more!)
- Once you link up, linking to more becomes much easier!
3. Reconcile your things:
- Reconciling is matching the entities in your database with remote linked data sources
- Freebase’s matchmaker is a really useful tool for reconciling
- Make it a game, and put experts on it to ensure high-quality datasets
- Heuristic methods exist for tackling reconciliation queues of 100,000+ entities
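
To make the heuristic idea concrete, here is a minimal sketch of one possible approach, not the method Chris used: score each local name against candidate labels from a linked dataset with a simple string-similarity measure, accept confident matches automatically, and send borderline ones to a human review queue. The function names, the remote IDs (`ex:...`), and both thresholds are assumptions made for illustration.

```python
from difflib import SequenceMatcher


def normalize(name):
    """Lowercase and collapse whitespace so trivial differences don't lower the score."""
    return " ".join(name.lower().split())


def similarity(a, b):
    """Return a similarity ratio between 0.0 and 1.0 for two names."""
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()


def reconcile(local_names, candidates, accept=0.9, review=0.6):
    """Split local names into auto-matched, needs-review, and unmatched.

    candidates maps a remote ID to its label. The accept and review
    thresholds are illustrative assumptions, not tuned values.
    """
    matched, needs_review, unmatched = {}, {}, []
    for name in local_names:
        best_id, best_score = None, 0.0
        for remote_id, label in candidates.items():
            score = similarity(name, label)
            if score > best_score:
                best_id, best_score = remote_id, score
        if best_score >= accept:
            matched[name] = best_id        # confident enough to link automatically
        elif best_score >= review:
            needs_review[name] = best_id   # plausible, but a person should confirm
        else:
            unmatched.append(name)         # no reasonable candidate found
    return matched, needs_review, unmatched


# Hypothetical data: the IDs below are placeholders, not real dataset identifiers.
candidates = {"ex:tom_hanks": "Tom Hanks", "ex:meryl_streep": "Meryl Streep"}
matched, needs_review, unmatched = reconcile(
    ["tom  hanks", "M. Streep", "Unknown Person"], candidates
)
```

Only the middle bucket needs an expert's attention, which is what keeps a large queue manageable.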
4. Build business intelligence:
- Tip: There are really simple things you can do with linked data that are cool!
- For example, show users context about the reconciled entities in your project. Context makes things easier for users.
- Index and search on reconciled properties like full name, gender, genre, profession, etc.
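
As a rough illustration of indexing and searching on reconciled properties, the sketch below builds a small in-memory inverted index. It assumes each reconciled entity is a dictionary of properties pulled from the linked dataset; the entity IDs, property names, and values are made up for the example, and a production system would more likely use a real search engine.

```python
from collections import defaultdict


def build_index(entities, properties):
    """Map (property, lowercased value) to the set of entity IDs that have it."""
    index = defaultdict(set)
    for entity_id, props in entities.items():
        for prop in properties:
            values = props.get(prop, [])
            if isinstance(values, str):
                values = [values]  # treat a single value like a one-item list
            for value in values:
                index[(prop, value.lower())].add(entity_id)
    return index


def search(index, **criteria):
    """Return the IDs of entities matching every property=value criterion."""
    result = None
    for prop, value in criteria.items():
        ids = index.get((prop, value.lower()), set())
        result = set(ids) if result is None else result & ids
    return result if result is not None else set()


# Hypothetical reconciled records; none of this is real data.
entities = {
    "celebrity:1": {"gender": "Female", "profession": ["Actor", "Producer"]},
    "celebrity:2": {"gender": "Male", "profession": ["Singer", "Actor"], "genre": "Pop"},
}
index = build_index(entities, ["gender", "profession", "genre"])
actors = search(index, profession="actor")
female_actors = search(index, profession="actor", gender="female")
```

Combining criteria is a set intersection, so a query like "female actors" costs little once the properties have been reconciled.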
5. Feedback & maintenance
- Users won’t trust the data unless it is manicured.
- Add lots of negative feedback loops (Unlike buttons!) to make sure that users are heard.
- A few minutes a day of cleanup does wonders!
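
A negative feedback loop can be as simple as counting "unlike" clicks per link and surfacing the worst offenders for the daily cleanup. The sketch below is one hypothetical way to do that; the class name, the threshold, and the IDs are assumptions for illustration only.

```python
from collections import Counter


class LinkFeedback:
    """Collect 'unlike' votes on reconciled links and flag the worst for cleanup."""

    def __init__(self, flag_threshold=3):
        # Number of unlikes before a link lands in the cleanup queue (assumed value).
        self.flag_threshold = flag_threshold
        self.downvotes = Counter()

    def unlike(self, local_id, remote_id):
        """Record one user's negative vote on a local-to-remote link."""
        self.downvotes[(local_id, remote_id)] += 1

    def cleanup_queue(self):
        """Return flagged (link, votes) pairs, most-disliked first."""
        flagged = [
            (link, votes)
            for link, votes in self.downvotes.items()
            if votes >= self.flag_threshold
        ]
        return sorted(flagged, key=lambda item: item[1], reverse=True)


# Hypothetical usage with placeholder IDs.
feedback = LinkFeedback(flag_threshold=2)
feedback.unlike("celebrity:1", "ex:wrong_person")
feedback.unlike("celebrity:1", "ex:wrong_person")
feedback.unlike("celebrity:2", "ex:probably_fine")
queue = feedback.cleanup_queue()
```

Links that never reach the threshold stay out of the queue, so the few minutes of daily cleanup go to the links users complain about most.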
See Chris’ SemTech 2011 presentation on SlideShare: How Hollywood Learned to Love the Semantic Web
Additional Reporting by Jennifer Zaino with contributions from Chris Testa, Director, Engineering, Adly, Inc.