Happy Birthday Wikipedia — But DBpedia Has Reason To Celebrate Too

By Jennifer Zaino  /  January 17, 2011

There was a lot of hoopla last week about Wikipedia’s 10-year anniversary. But we here at The Semantic Web also want to note an event that may get more modest coverage from the traditional media outlets: today’s release of DBpedia 3.6, based on Wikipedia dumps dating from October and November 2010.

That brings the queryable Wikipedia information set up to more than 3.5 million things described, with 1.67 million of them classified in a consistent ontology that includes 364,000 persons, 462,000 places, 99,000 music albums, 54,000 films, 16,500 video games, 148,000 organizations, 148,000 species and 5,200 diseases. Its dataset of 672 million RDF triples comprises 286 million extracted from the English Wikipedia and 386 million from other language editions and links to external datasets.

The latest release also includes Wikipedia infobox mappings in languages other than English, with 13.8 million RDF statements based on mappings in the /ontology/ namespace. And it features the initial release of the DBpedia MappingTool graphical user interface for creating and editing mappings as well as the ontology.

A virtue of the impressively sized, cross-domain knowledge base that DBpedia has become, by extracting and combining structured information of various sorts from Wikipedia, is that it automatically evolves as Wikipedia changes. That raises a question: is Wikipedia itself changing enough, and fast enough, for people to keep asking ever cooler questions in DBpedia that leverage Wikipedia data? One of the sophisticated example queries that DBpedia notes as being possible today, for instance, is discovering “soccer players, who played as goalkeeper for a club that has a stadium with more than 40,000 seats and who are born in a country with more than 10 million inhabitants.” (See the screenshot for results.)
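For readers curious what such a question looks like in practice, here is a minimal sketch in Python that assembles a SPARQL query along those lines. It is not DBpedia’s own published example query. The property names (dbo:position, dbo:team, dbo:ground, dbo:capacity, dbo:birthPlace, dbo:populationTotal), the goalkeeper resource IRI, the endpoint URL and the function names are all assumptions made for illustration, and the actual DBpedia vocabulary may differ from release to release.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen

# Assumed location of the public DBpedia SPARQL endpoint.
ENDPOINT = "https://dbpedia.org/sparql"

# Query template. Every property and resource name below is an assumption
# chosen for illustration, not a confirmed part of the DBpedia 3.6 ontology.
QUERY_TEMPLATE = """
PREFIX dbo: <http://dbpedia.org/ontology/>

SELECT DISTINCT ?player ?club ?country WHERE {
  ?player  dbo:position   <http://dbpedia.org/resource/Goalkeeper_(association_football)> ;
           dbo:team       ?club ;
           dbo:birthPlace ?country .
  ?club    dbo:ground     ?stadium .
  ?stadium dbo:capacity   ?seats .
  ?country a dbo:Country ;
           dbo:populationTotal ?population .
  FILTER (?seats > %(min_seats)d && ?population > %(min_population)d)
}
LIMIT %(limit)d
"""


def build_goalkeeper_query(min_seats=40000, min_population=10000000, limit=50):
    """Return the SPARQL text for the goalkeeper question with the given thresholds."""
    return QUERY_TEMPLATE % {
        "min_seats": min_seats,
        "min_population": min_population,
        "limit": limit,
    }


def run_query(query, endpoint=ENDPOINT, timeout=30):
    """Send the query over HTTP and return the list of result bindings.

    Assumes the endpoint accepts a `query` parameter and can answer in the
    standard SPARQL JSON results format.
    """
    params = urlencode({"query": query, "format": "application/sparql-results+json"})
    request = Request(
        endpoint + "?" + params,
        headers={"Accept": "application/sparql-results+json"},
    )
    with urlopen(request, timeout=timeout) as response:
        payload = json.load(response)
    return payload["results"]["bindings"]
```

Calling run_query(build_goalkeeper_query()) would send the question to the endpoint. Whether it returns anything depends on whether the assumed property names match the live data, and on how completely Wikipedia’s infoboxes record stadium capacities and birthplaces, which is exactly the coverage issue discussed next.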

But the recent news coverage of Wikipedia’s anniversary points to some gaps that could influence how much people can, or can’t, get out of DBpedia. In an interview that aired on New York Public Radio on Saturday, Wikimedia Foundation executive director Sue Gardner noted that Wikipedia’s quintessential editors tend to be male, young (about 25 years old), and graduate students, typically over-represented in the areas of science, technology, engineering and mathematics. “Eighty-seven percent of Wikipedia editors are male, and so topics that would associate or correlate with being female are certainly less well-covered than topics that correlate as being interesting to men,” she noted in the interview.

Wikipedia has said it is trying to get more women and older people on board, as well as to increase commentary in areas such as the humanities and public policy issues. It has been working with 16 universities to have professors assign students to write articles for it in that last area, for instance. Bringing aboard more individuals to craft material could not only broaden the diversity of subject coverage, but also address the fact that the number of active editors of Wikipedia has been flat since 2007.

Assuming Wikipedia is able to fill these holes, DBpedia’s users could experience even more benefits. Today, for example, a search for women’s voting rights advocate Anne Clay Crenshaw won’t turn her up on Wikipedia. But tomorrow, perhaps, it will. And from there, it’s just a short step to DBpedia’s being able to add her (and some of her other neglected colleagues) to its knowledge base, opening the door for people to run queries that will bring back all the 19th-century women suffragettes in their results (see current entities in the women’s rights activism concept above).

After all, that information can be just as important to some searchers as finding the names of every Tom Cruise movie of a certain vintage. So, happy birthday Wikipedia and congrats DBpedia!

About the author

Jennifer Zaino is a New York-based freelance writer specializing in business and technology journalism. She has been an executive editor at leading technology publications, including InformationWeek, where she spearheaded an award-winning news section, and Network Computing, where she helped develop online content strategies including review exclusives and analyst reports. Her freelance credentials include being a regular contributor of original content to The Semantic Web Blog; acting as a contributing writer to RFID Journal; and serving as executive editor at the Smart Architect Smart Enterprise Exchange group. Her work also has appeared in publications and on web sites including EdTech (K-12 and Higher Ed), Ingram Micro Channel Advisor, The CMO Site, and Federal Computer Week.
