Openness, transparency, and agility are where the world is headed. However, these trends are problematic for those of us who hold intellectual property (software, data, and other products) and have legitimate reasons to control access to it, such as livelihoods that depend on being paid for it.
Open access to free data is happening everywhere, regardless of whether it's convenient for those of us who own copyrights. Open data is an important trend and, whatever the cynics may say, it's not an ideological cover for intellectual property pirates. In fact, it's a core principle of the emerging world economy.
Open data may be what economists term a "public good." In other words, the public benefits when open data from diverse information sources is made freely available, so that anyone can use it without paying royalties, license fees, or other charges. The broad economic benefits of open data may far outweigh the benefits of keeping it all closed and proprietary behind one or another "paywall."
This is the thesis behind a recent McKinsey Global Institute (MGI) study called "Open data: unlocking innovation and performance with liquid information" (the full report is available for download). Here is an excerpt that nicely summarizes the report's perspective, which focuses on the economic multiplier effects:
"Open data—from both public and private sources—are adding a new dimension to big data analytics and giving rise to novel, data-driven innovations. Businesses are finding new ways of segmenting markets by blending open data with proprietary data and discovering new ways to raise productivity by using open and proprietary data to benchmark operations. Consumers are benefiting from open data by gaining more insights into what they buy, where they go to school, and how they get around (for example, with mobile apps that use open data to show the flow of traffic and public transit)."
Public good and commercial benefits need not be mutually exclusive. MGI's study offers ample quantitative, industry-specific depth and is organized around seven industries that can, and perhaps should, lead the way on open data: education, transportation, consumer products, electric power, oil and gas, health care, and consumer finance. As the report puts it:
"Our research suggests that [these] sectors alone could generate more than $3 trillion a year in additional value as a result of open data, which is already giving rise to hundreds of entrepreneurial businesses and helping established companies to segment markets, define new products and services, and improve the efficiency and effectiveness of operations."
To realize the economic potential of open data, while protecting commercial interests from piracy and individuals from intrusion into their private lives, MGI says that the world's nations need to step up several ongoing efforts. These include accelerating the adoption of information-sharing standards, putting in place regulatory environments that protect intellectual property, and defining comprehensive privacy safeguards that are adopted universally.
The report's notion of "liquid information" suggested the following thought to me. If open data is indeed a type of common currency, then we all benefit from expanding its liquidity, much the same way that central banks produce macroeconomic multiplier effects by expanding the liquidity and circulation of money.
Should the governments of the world establish central open data overseers equal in status to their central bankers? If both data and money are common currencies with macroeconomic multiplier effects, and if both are managed as liquid public goods, wouldn't that make great sense?
And if open data is a valuable currency, perhaps open reference graphs of public data, generated by analytical models of interconnections, can further boost the global economy. That's a related thought I had upon seeing the large downloadable hyperlink graph of the world's web pages that data scientist Vincent Granville presents in a recent blog post. The graph, generated in 2012, covers 3.5 billion web pages and the 128 billion hyperlinks between them. It's the product of a non-profit, Common Crawl, which describes its mission as "providing an open repository of web crawl data that can be accessed and analyzed by everyone."
Reference graphs such as this are more than idle curiosities for data scientists. I recently blogged on how open reference data can serve the goals of most branches of science. Granville's article connected the same dots for me, focusing on the important role that open reference graphs can play in serving a wide range of public-interest ends. He says:
"We hope that the graph will be useful for researchers who develop search algorithms that rank results based on the hyperlinks between pages; SPAM detection methods which identify networks of web pages that are published in order to trick search engines; graph analysis algorithms [researchers who] can use the hyperlink graph for testing the scalability and performance of their tools; and Web Science researchers who want to analyze the linking patterns within specific topical domains in order to identify the social mechanisms that govern these domains."
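To make the first use case in that quote a bit more concrete, here is a minimal sketch of ranking pages by the hyperlinks between them, using a simplified PageRank-style power iteration. The toy graph, the page names, the function name, and the damping factor of 0.85 are all illustrative assumptions of mine; none of them comes from the hyperlink graph itself or from Granville's post.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Simplified PageRank by power iteration.

    `links` maps each page to the list of pages it links to.
    All names and parameter values here are hypothetical.
    """
    # Collect every page that appears as a source or as a link target.
    pages = sorted(set(links) | {t for targets in links.values() for t in targets})
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}

    for _ in range(iterations):
        # Every page receives a baseline share from the "random jump" term.
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page splits its damped rank evenly among its outbound links.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # A page with no outbound links spreads its rank over all pages.
                share = damping * rank[page] / n
                for p in pages:
                    new_rank[p] += share
        rank = new_rank
    return rank


# A hypothetical four-page "web": nothing links to "orphan".
toy_web = {
    "home": ["about", "blog"],
    "about": ["home"],
    "blog": ["home", "about"],
    "orphan": ["home"],
}

ranks = pagerank(toy_web)
for page, score in sorted(ranks.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{page}: {score:.3f}")
```

In this toy example the page with the most inbound links comes out on top and the page nobody links to comes out last. A graph of billions of pages would obviously not fit in an in-memory dictionary like this one; the sketch only illustrates the kind of calculation an open hyperlink graph makes possible.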
When you realize that heavy-duty graph analysis is the foundation for behavioral sciences of all sorts, the significance of this initiative hits home. Open reference graphs are as fundamental to the promise of big data as open data, open platforms, and the other dimensions that I discussed in a recent blog post. For example, public policy makers and the general public should be guided by shared data-driven graphs of social dynamics, economic trends, environmental interdependencies, and emerging threats. With these open data graphs as our collective reference, we can intelligently debate alternative policies in the light of what has actually worked in the past and what is likely to bear fruit in the future.
Granville notes that "to the best of our knowledge, the graph is the largest hyperlink graph that is available to the public outside companies such as Google, Yahoo, and Microsoft."
From a public policy standpoint, why should reference graphs, such a powerful resource, be available only to private institutions? This is no slap against the companies that have adopted graph analysis, since they all did so for very good reasons: an operational need plus the resources to develop the enabling technologies.
I'm simply calling for more companies to consider open-sourcing reference graphs and data, just as many now open-source their software intellectual property when it serves a larger public good.