
On Tap for FindtheBest: More Soft Joins and More Crowd-Sourcing

By Jennifer Zaino / December 7, 2011

FindtheBest (the site where users can compare some 700 topics side by side, discussed in an earlier post) has some new capabilities on tap. These include what the company calls soft joins, which relate diverse data sets to one another.

“[A join] is a very hard database connection between two tables. Soft joins are a little more semantic,” says FindtheBest CEO Kevin O’Connor, a co-founder of DoubleClick. “We’re linking a lot of data sets together on a loose basis.” The goal, he says, is to “cross-relate information a lot more, trying to discern all the data we have and give it some semantic meaning in ways people can understand. It turns out there’s a huge, huge number of these semantic relationships between our data.”

Consider someone in the market for a pedigree pooch. The site has begun taking in classified-ad data, so from a ‘dogs for sale’ search that person might home in on a Yorkshire Terrier pup from a particular breeder. From there, they can link to information about the breed in FindtheBest’s Dog Breed comparison data set, which itself bundles information, including prices, about Yorkshire Terriers listed for sale on the web. The searcher can then judge whether the asking price for that specific animal is in the ballpark for the breed. “That’s a soft join to easily access the related data,” he says. “It’s all about the discovery aspect.”
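The article doesn't say how FindtheBest implements soft joins. Purely as an illustration, the Python sketch below assumes a soft join can be approximated by fuzzy-matching the free-text breed string from a classified ad against canonical breed names (using the standard library's `difflib.get_close_matches`), then checking the listing's price against prices collected for that breed. The data, the function names `soft_join_breed` and `price_check`, and the 0.6 cutoff are all hypothetical.

```python
import difflib
from statistics import median

# Hypothetical breed-level data: canonical breed name -> asking prices seen online.
BREED_PRICES = {
    "Yorkshire Terrier": [800.0, 950.0, 1200.0, 1500.0],
    "Labrador Retriever": [600.0, 700.0, 900.0],
}


def soft_join_breed(listing_breed, breeds, cutoff=0.6):
    """Loosely match a free-text breed string to a canonical breed name.

    An exact join would require identical keys; here small differences in
    case, spacing, or spelling still match. Returns None if nothing is close.
    """
    by_lower = {b.lower(): b for b in breeds}
    matches = difflib.get_close_matches(
        listing_breed.strip().lower(), list(by_lower), n=1, cutoff=cutoff
    )
    return by_lower[matches[0]] if matches else None


def price_check(listing_breed, price):
    """Follow the soft join and report whether a listing's price is typical."""
    breed = soft_join_breed(listing_breed, BREED_PRICES)
    if breed is None:
        return None
    prices = BREED_PRICES[breed]
    return {
        "breed": breed,
        "median_price": median(prices),
        "in_range": min(prices) <= price <= max(prices),
    }


print(price_check("yorkshire terier", 1100.0))
```

A misspelled "yorkshire terier" in an ad still links to the Yorkshire Terrier record, which an exact-key join would miss; an unrelated breed returns `None` rather than a forced match.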

O’Connor says the site began doing more of these joins over the last month or so, ramping up further over the past week ahead of today’s launch of a new UI. The new UI changes the Smart Rating system, replacing the classic star rating with a metascore from 0 to 100. The metascore draws on two rating sources: a weighted compilation of scores from the most trusted expert sources, and FindtheBest’s own Quantitative Rating algorithm, which analyzes product features and assigns numerical values based on the quality of each feature.
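FindtheBest hasn't published the metascore formula or its weights, so the following is only a hypothetical sketch of how two such sources could be combined into a 0 to 100 score. It assumes each expert score is first rescaled to 0 to 100, expert scores are averaged using per-source trust weights, and that average is blended with a 0 to 100 quantitative rating via an invented `expert_share` parameter.

```python
def to_100(value, scale_max):
    """Rescale a score on a 0..scale_max scale to 0..100."""
    return value * 100 / scale_max


def expert_component(scored_sources):
    """Trust-weighted average of expert scores already on a 0..100 scale.

    scored_sources: list of (score, trust_weight) pairs (hypothetical format).
    """
    total_weight = sum(w for _, w in scored_sources)
    if total_weight <= 0:
        raise ValueError("need at least one source with positive weight")
    return sum(s * w for s, w in scored_sources) / total_weight


def metascore(scored_sources, quantitative_rating, expert_share=0.5):
    """Blend the expert component with a 0..100 quantitative rating.

    expert_share is a hypothetical parameter; the real weighting is not public.
    The result is clamped to 0..100 and rounded to a whole number.
    """
    blended = (expert_share * expert_component(scored_sources)
               + (1 - expert_share) * quantitative_rating)
    return round(max(0.0, min(100.0, blended)))


# Two hypothetical reviews: 4.5 of 5 stars (trusted more) and 7 of 10.
experts = [(to_100(4.5, 5), 3.0), (to_100(7, 10), 1.0)]
print(metascore(experts, quantitative_rating=75.0))
```

Rescaling before averaging lets star ratings and 10-point scores share one scale, which is the main practical hurdle in moving from stars to a single 0 to 100 number.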

O’Connor says the site has also ditched an elegant but complicated algorithm it had been using to suggest related products or services based on what someone was viewing. In some places, “that failed miserably,” he says. “What we realized is that consumers are telling us how things are related by doing side-by-side comparisons. So we actually implemented an algorithm that looks at what they’re comparing – for instance, if they’re looking among colleges at Harvard, we see what they compare it to most. And it’s a pretty accurate list.”
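O’Connor doesn't detail the replacement algorithm beyond "we see what they compare it to most." A minimal sketch of that idea, assuming a log of which items each user put side by side, is to count how often every pair of items appears in the same comparison and then rank an item's partners by that count. The session data and the function names below are hypothetical.

```python
from collections import Counter, defaultdict
from itertools import combinations

# Hypothetical log: each entry is the set of items a user compared side by side.
SESSIONS = [
    {"Harvard", "Yale", "Princeton"},
    {"Harvard", "Yale"},
    {"Harvard", "Stanford"},
    {"Harvard", "Princeton", "Yale"},
]


def build_co_comparisons(sessions):
    """Count how often each pair of items appears in the same comparison."""
    counts = defaultdict(Counter)
    for items in sessions:
        for a, b in combinations(sorted(items), 2):
            # Count the pair in both directions so lookups work from either item.
            counts[a][b] += 1
            counts[b][a] += 1
    return counts


def related(counts, item, k=3):
    """Return the k items most often compared with `item`."""
    return [other for other, _ in counts.get(item, Counter()).most_common(k)]


counts = build_co_comparisons(SESSIONS)
print(related(counts, "Harvard", k=2))
```

Because the ranking comes only from what users actually compared, Harvard's list fills up with peer schools without any rule about tuition or location, which matches the behavior O’Connor describes next.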

For instance, if you are looking at an Ivy League school, it probably isn’t very relevant to you to see other schools related by tuition or location – you really want to see other Ivy League schools. But if you’re looking at a beauty school, on the other hand, price and location probably matter a lot more. “Sometimes, especially in the semantic web, we come up with all these algorithms of how things are related and we overanalyze stuff, and the crowd sourcing is really the way to do it,” O’Connor says. “It’s like AI. You get sort of 70 percent of the way there, and then the noise overwhelms everything and takes over.”

This experience of throwing away months of work on the algorithm and letting consumers’ own side-by-side comparisons inform suggestions, he says, has confirmed one thing for him. “It teaches me there is still a role for humans,” O’Connor says. “Computers can’t take over everything.”


About the author

Jennifer Zaino is a New York-based freelance writer specializing in business and technology journalism. She has been an executive editor at leading technology publications, including InformationWeek, where she spearheaded an award-winning news section, and Network Computing, where she helped develop online content strategies including review exclusives and analyst reports. Her freelance credentials include being a regular contributor of original content to The Semantic Web Blog; acting as a contributing writer to RFID Journal; and serving as executive editor at the Smart Architect Smart Enterprise Exchange group. Her work also has appeared in publications and on web sites including EdTech (K-12 and Higher Ed), Ingram Micro Channel Advisor, The CMO Site, and Federal Computer Week.
