Mark Graham recently raised some concerns regarding the Wikidata project in The Atlantic. Graham writes, "Wikidata will create a collaborative database that is both machine readable and human editable and which will underpin a lot of knowledge that is presented in all 284 language versions of Wikipedia. In other words, the encyclopaedia plans to become part of the movement from a mostly human-readable Web to a Web in which computers and software can better make sense of information… The reason that Wikidata marks such a significant moment in Wikipedia's history is the fact that it eliminates some of the scope for culturally contingent representations of places, processes, people, and events. However, even more concerning is the fact that this sort of congealed and structured knowledge is unlikely to reflect the opinions and beliefs of traditionally marginalized groups."
Graham continues, "It is important that different communities are able to create and reproduce different truths and worldviews. And while certain truths are universal (Tokyo is described as a capital city in every language version that includes an article about Japan), others are more messy and unclear (e.g. should the population of Israel include occupied and contested territories?)."
Denny Vrandečić, project director of Wikidata, posted a thoughtful response to Graham's article. I have re-posted Vrandečić's response in its entirety:
Thank you for your well-thought criticism. When we were thinking first of adding structured data to Wikipedia, we were indeed thinking of giving every language edition its own data space. This way the Arab and the Hebrew Wikipedia community would not interfere with each other, nor would the Estonian and the Russian communities interfere with each other. Actually, they wouldn't even interact with each other. They could happily build their niches and purport their own points of view of the world, and then they would come together in the English Wikipedia, where they would be forced either to abstain from the conversation or to find a common ground and compromise. This would not necessarily translate back in the language editions - they could remain in their carefully crafted filter bubbles. Readers not able or willing to read different languages on an article where they are not even aware of the controversies would return from Wikipedia with the satisfying feeling that they learned something about the world, and would shake their heads about the ignorant inhabitants of the neighbouring country who believe some obvious misconception about the issue.
We still opted for having one common data space for all language editions. Does this mean we expect the whole world to agree on one common set of true facts, saved and redistributed in Wikidata, the perfect form of Wikiality, and everything else will be considered falsehood and lies? Not in the least.
First, Wikidata will not be about The Truth. I expect the Wikidata community to follow the spirit of the Wikipedia community, and require citations and references for the data. We do not expect the editors to agree on the population of Israel, but we do expect them to agree on what specific sources claim about the population of Israel. They will be able to gather several sources with their sometimes contradicting data. So we might have the population according to the Israeli statistics office, according to the Egyptian statistics office, according to the CIA World Factbook, and according to even more sources. Instead of hiding these differences in their respective language editions, we can have one space to gather them all and display them side by side, making the disagreement explicit and visible.
Second, Wikidata will not force anything into the Wikipedias. For every step of the different possible ways the data can flow from Wikidata to the Wikipedias, there will be ways to opt out for every language edition. The language editions can choose to give preference to certain sources. The language editions can opt out to use Wikidata for a specific value, and replace it with a locally agreed fact. The language editions can even ignore Wikidata entirely and just continue as they had the last decade. Wikidata is an offer, and not a mandate.
Third, Wikidata will have a different coverage than Wikipedia. A lot of issues that you mentioned are far too nuanced to be expressed in Wikidata. Let us take the example of the Bronze Soldier of Tallinn that you mentioned: whereas a text featuring an interpretation of the symbolism of the statue can lead to controversy and discussion, what points of data about it would be? The material? The height? The date of erection? Its current geolocation? None of these statements are disputed, and they could be used in the Estonian, Russian, and English version alike. What about your second example, the population of Israel? Does it include Gaza or not? Well, this kind of information can be made explicit in Wikidata. Our knowledge model will enable the editors to state "The population of Israel in 2012, excluding Gaza, was X, according to the following sources". I think that once you consider the limits of what can be stated in Wikidata, and the importance I expect to be given to properly referencing the sources, the number of expected controversies will be much smaller than many expect now.
Fourth, you rightfully point out that the Wikipedias today are mostly written by a specific contributor demographics. This is true, but it glances over the fact that it used to be even more specific. With the growth of Wikipedia the contributor demographics have expanded and diversified - not yet as much as one might hope, but it is getting better. One of your points raised was that Wikipedia has not many contributors in Africa. We actually hope that Wikidata will improve this situation: since all languages will work on the same data space, contributions from Africa and from Europe will live side by side, and the motivation for contributing to a common space that everyone will benefit from - and not just the much smaller language community one belongs to - might increase the number of contributions coming from regions underrepresented today (compare this to the situation in countries like Uzbekistan, where a language like Russian binds a lot of the attention and possible contributions to the bigger and more successful Wikipedia language edition).
Fifth, in your criticism you implicate the idea that languages are good and valid borders for keeping knowledge diversity alive. If this was true, how come the English language articles, where communities otherwise separated by language often come together, are of higher quality and reflect a richer diversity than the individual language articles? My own experiences are rooted in the Croatian, Serbian, Bosnian, etc. Wikipedias, all language editions of their own. The richness of diversity that the English Wikipedia articles show on topics of the Yugoslav wars is not matched by any of the native language editions.
What is particularly interesting about your criticism is that Wikidata was developed with support from the EU research project RENDER, which has its main concern about knowledge diversity. We had discussions about some of our research results in the past, especially the Wikipedia map, not so unsimilar to some of your own results. In RENDER we developed the requirements for a data model that is centred on the ideas of being a possibly inconsistent, secondary data source, not being about The Truth.
Whereas I understand your concern from an abstract view on the issue, I challenge you to point to the actual articles that you fear will get poorer in their diversity once Wikidata will be operational. You cite your own and your colleagues research on this issue, so I assume your concerns are based on real use cases.
I am sorry for this long answer, but since I consider your concerns would be very valid if Wikidata was done in a more naive way, and since I understand that many people will think that Wikidata is being developed in such a naive way, I took the liberty to expand more on our current thinking of how Wikidata could work, and some of the design decisions in building Wikidata.
Thank you for this opportunity!
Denny Vrandečić, project director Wikidata
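To make the first three points of the response more concrete, here is a minimal Python sketch of how sourced, qualified claims and per-edition opt-outs could fit together. It is an illustration only, not Wikidata's actual data model or API: the names `Claim`, `EditionConfig`, and `resolve` are hypothetical, the "local consensus" label is my own, and the values "X", "Y", and "Z" are placeholders rather than real figures.

```python
from __future__ import annotations
from dataclasses import dataclass, field


@dataclass
class Claim:
    """One statement: what a specific source says about a subject."""
    subject: str
    prop: str
    value: str
    source: str
    qualifiers: dict = field(default_factory=dict)  # e.g. year, excluded regions


@dataclass
class EditionConfig:
    """Hypothetical per-language-edition settings."""
    preferred_sources: list = field(default_factory=list)
    local_overrides: dict = field(default_factory=dict)  # (subject, prop) -> value
    use_shared_data: bool = True  # False = ignore the shared data space entirely


def resolve(store: list, config: EditionConfig, subject: str, prop: str) -> list:
    """Return the claims one language edition would display."""
    key = (subject, prop)
    # A locally agreed fact replaces the shared data for this one value.
    if key in config.local_overrides:
        return [Claim(subject, prop, config.local_overrides[key], "local consensus")]
    # The edition may opt out of the shared data space altogether.
    if not config.use_shared_data:
        return []
    candidates = [c for c in store if c.subject == subject and c.prop == prop]
    # If the edition prefers certain sources, show the first preference available.
    for source in config.preferred_sources:
        preferred = [c for c in candidates if c.source == source]
        if preferred:
            return preferred
    # Otherwise show every sourced claim side by side; disagreement stays visible.
    return candidates


# Placeholder values only; these are not real population figures.
store = [
    Claim("Israel", "population", "X", "Israeli statistics office",
          {"year": 2012, "excluding": "Gaza"}),
    Claim("Israel", "population", "Y", "Egyptian statistics office", {"year": 2012}),
    Claim("Israel", "population", "Z", "CIA World Factbook", {"year": 2012}),
]

side_by_side = resolve(store, EditionConfig(), "Israel", "population")
preferred = resolve(store, EditionConfig(preferred_sources=["CIA World Factbook"]),
                    "Israel", "population")
overridden = resolve(store, EditionConfig(local_overrides={("Israel", "population"): "L"}),
                     "Israel", "population")
opted_out = resolve(store, EditionConfig(use_shared_data=False), "Israel", "population")
```

In this sketch nothing in the shared store is ever marked as "true". Every value carries its source and qualifiers, and each language edition decides how much of it to show.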
To learn more about Wikidata, join Denny Vrandečić and Mark Greaves as they present Wikipedia's Next Big Thing: The Wikidata Project at SemTechBiz San Francisco. Register by this Thursday, April 12 to save $400 off a full access pass.
Image: Courtesy Wikidata