Wikidata, and a clash of world views

Remember the days before Wikipedia had all the answers? We looked things up in libraries, referring to shelf-filling encyclopaedias. We bought CD-ROMs (remember them?) full of facts and pictures and video clips. We asked people. Sometimes, school homework actually required something more strenuous than a cut and paste. We went about our business without remembering that New Coke briefly entered our lives on this day in 1985.

Wikipedia is far from perfect, and some of the concern around its role in a wider dumbing down of thought and argument may be justified. But, despite that, it’s a remarkable achievement and a wonderful resource. Those who argued that it would never work have clearly been proven wrong. Carefully maintained processes and the core principle of the neutral point of view mostly serve contributors well.

With Wikimedia Deutschland’s recent announcement of Wikidata, many of the early concerns about Wikipedia itself have resurfaced. This time, though, the concerns are even more focused. Whilst Wikipedia was about prose, which has always been open to interpretation, Wikidata is seeking to mess with facts. Depending upon where you sit, that is either the most wonderful (and obvious) piece of harmonisation activity you could wish for… or a fundamental undermining of the very basis upon which contradictory opinions are built. The truth, as ever, lies somewhere in the middle.

As Lydia Pintscher described the project on the Open Knowledge Foundation blog,

The ambitious goal of the project is to create an open data repository for the world’s knowledge that can be accessed and edited by everyone, humans and machines alike. Wikidata will be a place where Wikipedia’s editors and others will be able to collect statements about the world we live in, and references for them. Wikidata will become an enormous open collection of knowledge.

Wikidata first came to my attention when I saw that Denny Vrandečić was going to speak about it at the recent Semantic Technology and Business Conference in Berlin. Denny then joined us on February’s episode of the Semantic Link podcast, and we learned more about the project and its ambitions. The formal announcement waited until the end of March, and was quickly picked up by sites such as TechCrunch, CNet and others. The response was pretty positive, and everyone seemed to recognise the value of recording basic facts (the date the Berlin Wall fell, the names of the Presidents of France and the USA, etc.). Recorded once, these facts are available for use and reuse across pages about countries, positions, people, and more. They’re also available for use and reuse on pages in other languages. Should any of the facts change, they can be updated once and those changes can be made available to every page that needs them.

That basic concept is pretty useful. It becomes rather more fraught when politics becomes involved. The size of an ethnic group in a disputed territory is often counted in different ways for different ends. Borders are endlessly disputed. Place names can be politically loaded, and we skirt (with varying degrees of tact) around the question of whether or not some countries exist at all.
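To make the record-once, reuse-everywhere idea a little more concrete, and to show how a contested figure might sit alongside an uncontested one, here is a minimal sketch in Python. It is emphatically not Wikidata's actual data model or API: `FactStore`, `Statement` and `render` are hypothetical names invented for this illustration, and "Exampleland", "Disputed Region" and all of their values and sources are made-up placeholders. The only assumption it makes is that a fact can be stored centrally as one or more sourced statements, which pages then read rather than copy.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Statement:
    """One sourced claim (hypothetical structure, not Wikidata's)."""
    value: str
    source: str


@dataclass
class FactStore:
    """Toy central store: each (subject, property) maps to a list of statements."""
    facts: dict = field(default_factory=dict)

    def add(self, subject, prop, value, source):
        # Append rather than overwrite, so competing claims can coexist.
        self.facts.setdefault((subject, prop), []).append(Statement(value, source))

    def replace(self, subject, prop, value, source):
        # Update once, centrally; every page reading the store sees the change.
        self.facts[(subject, prop)] = [Statement(value, source)]

    def get(self, subject, prop):
        return list(self.facts.get((subject, prop), []))


def render(store, subject, prop, template):
    """Fill a page template from the store, keeping every claim and its source visible."""
    claims = store.get(subject, prop)
    text = " / ".join(f"{c.value} [{c.source}]" for c in claims)
    return template.format(value=text)


store = FactStore()

# An uncontested fact, recorded once and reused by pages in two languages.
store.add("Exampleland", "head of state", "A. Example", "official gazette")
print(render(store, "Exampleland", "head of state", "Head of state: {value}"))
print(render(store, "Exampleland", "head of state", "Staatsoberhaupt: {value}"))

# A contested figure: both counts are kept, each with its own source.
store.add("Disputed Region", "population", "1.2 million", "census office A")
store.add("Disputed Region", "population", "0.9 million", "survey B")
print(render(store, "Disputed Region", "population", "Population: {value}"))
```

The design choice worth noticing is that `add` never throws away an earlier statement. Whether a page shows one figure, both, or neither is a decision left to whoever writes the template, which is roughly where the political questions above come back in.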

A week after the Wikidata announcement, The Atlantic’s blog ran an opinion piece by Mark Graham of the Oxford Internet Institute; a piece which began with

Fundamental changes are afoot at Wikipedia. Changes that have worrying connotations for the diversity of knowledge in the world’s sixth most popular website.

and ended

We just need to ensure that we aren’t seduced into codifying, categorizing, and structuring in cases when we should be describing the inherent messiness of a situation. Tokyo will always be the capital of Japan, but it will probably be a long time until we can all agree on the true population of Israel.

Much of the reaction on Twitter and elsewhere was typically knee-jerk. The Semantic Web community was offended. It was cross. Mark Graham was an idiot, who had completely missed the point. Only, he hadn’t. He raised some really important issues.

Denny Vrandečić responded at length, seeking to address Graham’s concerns, and this was covered here on SemanticWeb.com at the time. It is unfortunate that Mark Graham does not appear to have replied to Denny (at least in public), as there is scope for a quite fascinating back-and-forth to identify the misunderstandings, the areas ripe for compromise, and the differences of opinion that simply must be left unresolved.

Wikidata, like Wikipedia before it, has the opportunity to become a powerful and liberating tool, capable of capturing, reflecting and shedding light upon myriad views of ‘truth.’ Done right, Wikidata can and will give Wikipedia contributors, wherever they are, easy access to authoritative facts, enriching their contributions and improving accuracy and consistency throughout Wikipedia’s many linguistic editions.

But Wikidata’s designers, and those who do the work of connecting Wikidata’s stream of facts to the tools editors use in building Wikipedia itself, have a responsibility to ensure that it remains as easy to disagree with the consensus view as it is to go along with it.

