There was a lot of hoopla over the past week about Wikipedia’s 10-year anniversary. But we here at The Semantic Web also want to note an event that may get more modest coverage from the traditional media outlets: today’s release of DBpedia 3.6, based on Wikipedia dumps dating from October and November 2010.
That brings the queryable Wikipedia information set up to more than 3.5 million things described, with 1.67 million of them classified in a consistent ontology that includes 364,000 persons, 462,000 places, 99,000 music albums, 54,000 films, 16,500 video games, 148,000 organizations, 148,000 species and 5,200 diseases. Its dataset of 672 million RDF triples comprises 286 million extracted from the English Wikipedia and 386 million from other language editions and links to external datasets.
The latest release also includes Wikipedia infobox mappings in languages other than English, with 13.8 million RDF statements based on mappings in the /ontology/ namespace. And it features the initial release of the DBpedia MappingTool graphical user interface for creating and editing mappings as well as the ontology.
A virtue of the impressively sized, cross-domain knowledge base that DBpedia has become, by extracting and combining structured information of various sorts from Wikipedia, is that it automatically evolves as Wikipedia changes. That raises a question: is Wikipedia itself changing enough, and fast enough, for users to keep asking ever more interesting questions of DBpedia using Wikipedia data? One of the sophisticated example queries that DBpedia notes as being possible today, for instance, is discovering “soccer players, who played as goalkeeper for a club that has a stadium with more than 40,000 seats and who are born in a country with more than 10 million inhabitants.” (See the screenshot for results.)
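To make concrete what a query like that involves, here is a minimal sketch in Python that runs the same kind of join over a handful of toy subject-predicate-object triples. It assumes hypothetical entity names and simplified predicates ("position", "team", "stadium", "seats", "birthCountry", "population") rather than DBpedia’s actual ontology identifiers, and it is not how DBpedia itself answers queries (real users would send SPARQL to DBpedia’s endpoint). It only shows how the question chains from a player to a club to a stadium, and from the player to a country.

```python
# Toy triples (subject, predicate, object). All entity names and predicate
# names are hypothetical stand-ins, not DBpedia's real identifiers or data.
Triple = tuple[str, str, object]

TRIPLES: list[Triple] = [
    ("PlayerA", "type", "SoccerPlayer"),
    ("PlayerA", "position", "Goalkeeper"),
    ("PlayerA", "team", "ClubX"),
    ("PlayerA", "birthCountry", "CountryBig"),
    ("PlayerB", "type", "SoccerPlayer"),
    ("PlayerB", "position", "Goalkeeper"),
    ("PlayerB", "team", "ClubY"),
    ("PlayerB", "birthCountry", "CountryBig"),
    ("PlayerC", "type", "SoccerPlayer"),
    ("PlayerC", "position", "Defender"),
    ("PlayerC", "team", "ClubX"),
    ("PlayerC", "birthCountry", "CountryBig"),
    ("PlayerD", "type", "SoccerPlayer"),
    ("PlayerD", "position", "Goalkeeper"),
    ("PlayerD", "team", "ClubX"),
    ("PlayerD", "birthCountry", "CountrySmall"),
    ("ClubX", "stadium", "StadiumX"),
    ("StadiumX", "seats", 45_000),
    ("ClubY", "stadium", "StadiumY"),
    ("StadiumY", "seats", 20_000),
    ("CountryBig", "population", 50_000_000),
    ("CountrySmall", "population", 2_000_000),
]


def objects(triples: list[Triple], subject: str, predicate: str) -> list[object]:
    """Return every object linked to `subject` via `predicate`."""
    return [o for s, p, o in triples if s == subject and p == predicate]


def goalkeepers_query(
    triples: list[Triple],
    min_seats: int = 40_000,
    min_population: int = 10_000_000,
) -> list[str]:
    """Goalkeepers whose club's stadium has more than `min_seats` seats and
    whose birth country has more than `min_population` inhabitants."""
    players = {s for s, p, o in triples if p == "type" and o == "SoccerPlayer"}
    results = set()
    for player in players:
        if "Goalkeeper" not in objects(triples, player, "position"):
            continue
        # Join player -> club -> stadium -> seat count.
        big_stadium = any(
            seats > min_seats
            for club in objects(triples, player, "team")
            for stadium in objects(triples, club, "stadium")
            for seats in objects(triples, stadium, "seats")
        )
        # Join player -> birth country -> population.
        big_country = any(
            pop > min_population
            for country in objects(triples, player, "birthCountry")
            for pop in objects(triples, country, "population")
        )
        if big_stadium and big_country:
            results.add(player)
    return sorted(results)


print(goalkeepers_query(TRIPLES))  # ['PlayerA']
```

Only PlayerA passes every condition here: PlayerB’s stadium is too small, PlayerC is not a goalkeeper and PlayerD was born in a country that is too small. The point of the sketch is that each added condition is just another hop through linked facts, which is why richer Wikipedia coverage translates directly into richer answers.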
But the recent news coverage of Wikipedia’s anniversary points to some gaps that could influence how much people can, or can’t, get out of DBpedia. In an interview that aired on New York Public Radio on Saturday, Wikimedia Foundation executive director Sue Gardner noted that Wikipedia’s quintessential editors tend to be male, young (about 25 years old) and graduate students, with the areas of science, technology, engineering and mathematics typically over-represented. “Eighty-seven percent of Wikipedia editors are male, and so topics that would associate or correlate with being female are certainly less well-covered than topics that correlate as being interesting to men,” she noted in the interview.
Wikipedia has said it is trying to get more women and older people on board, as well as to increase coverage of areas such as the humanities and public policy. It’s been working with 16 universities to have professors assign students to write articles for it in that last area, for instance. Bringing aboard more individuals to craft material could not only broaden the diversity of subject coverage but also help address the fact that the number of active Wikipedia editors has been flat since 2007.
Assuming Wikipedia is able to fill these holes, DBpedia’s users could see even more benefits. Today, for example, a search for women’s voting rights advocate Anne Clay Crenshaw won’t turn her up on Wikipedia. But tomorrow, perhaps, it will. And from there, it’s just a short step to DBpedia’s being able to add her (and some of her other neglected colleagues) to its knowledge base, opening the door for people to run queries that bring back all the 19th-century suffragettes in their results (see current entities in the women's rights activism concept above).
After all, that information can be just as important to some searchers as finding the names of every Tom Cruise movie of a certain vintage. So, happy birthday, Wikipedia, and congrats, DBpedia!