What got Talis going on Kasabi, its Linked Data marketplace that launched in public beta in June? The recognition that there had to be a business model in making RDF data as easy to use as possible – from publishing it to querying it, all in a well-supported and sustainable fashion for providers and consumers alike.
“We felt we had a good understanding of the challenges people faced taking their first steps with publishing Linked Data, and also with development communities trying to use that data in their applications,” says Leigh Dodds, the product lead for Kasabi and program manager for the Talis Platform, a role that led him to become quite familiar with what organizations face in hosting and publishing Semantic Web data. “We wanted to make it simpler and easier for both sides, so we began looking at a marketplace environment.”
The data-as-a-service marketplace – semantic or not – suddenly does seem to be on fire. Microsoft has its Windows Azure Marketplace that’s tied closely into its cloud hosting platform; InfoChimps has its downloadable and API-accessible structured and unstructured data sets, with a strong focus on Twitter; xignite has its financial market data on-demand; SimpleGeo goes the location data route, and so on. And Dodds says he’s had plenty of discussions with a number of the vendors in the space as Talis was developing Kasabi. “Some are just bulk data downloads, like a CSV download. Then there are those doing more online transactional access to data,” he says. “That is the area that we fit but our difference is the fact that we are using Semantic Web technology. That gives us some unique features.”
With RDF Linked Data, he explains, it is very easy to mix data from lots of different sources to create custom data set applications. And Kasabi is contributing to that with added curation features. Right now, Dodds says, the service is more about helping people sign up and create their self-contained triple stores: “A milestone for us to get to public beta was to have a complete end-to-end workflow so that you can sign up, create a data set, start to populate it and have people use APIs against that data.” That’s in place and the next stage is to build on that in lots of ways, including giving developers tools to pinpoint the multiple data sets they want to use and blend them together in a way that’s relevant to them. “That’s a curation task that isn’t easy for people to do. That process adds value, to apply expertise to understand which are good data sets to draw on,” Dodds says. “The data aggregation will be the key one to build a lot of value.”
Where Kasabi Caters
Kasabi is aimed at a broad community. The service supports the open data crowd and already hosts open data sets such as the one from the MusicBrainz project, which contains RDF representations of albums, artists, tracks, labels and their relationships and comes with Lookup, Augmentation, Reconciliation and other APIs. It also wants to be a resource for public-sector organizations that want to make data available for free but need to do it on a low-cost platform (currently you can find, for instance, Ordnance Survey data sets from the British Government). And it wants to support those offering their data on a commercial basis, too. For consumers, each data set is accessible via both core APIs and APIs contributed by users themselves, and users can subscribe to either.
Both sides of this data ping-pong ball need support: publishers in providing good documentation, presentation and licensing terms for their Linked Data sets, and users in exploring the data, with summarizations of the data sets and how they inter-relate so they are easier to use. “Even if you look at some people really deeply embedded in the Linked Data community, the data they put up is not as well-documented, as well-presented as it could be, so it’s a challenge for anyone outside that community to really engage with it,” says Dodds. “One of the guiding principles to adopt with Kasabi is, if we can make sure all the best practice data publishing features are there, the data owner doesn’t have to worry about it. It’s less of a burden on them, and so developers then are off to a good start in terms of what they can look at to explore that data.”
Developers, for instance, often find it hard to get answers from Linked Data sets to even the simplest questions, such as who published the set, when it was last updated, or what license it is available under – answers they must have to be sure the data is trustworthy and up-to-date, and so they can use it legitimately. “We’ve built those in with Kasabi so that all of that is immediately available from the home page,” Dodds says. Any data set that goes into Kasabi has to have a license of some sort associated with it. “At the moment a lot of Web APIs have custom terms and conditions, so one thing I hope we can do with Kasabi as a marketplace is to bring some clarity around how data is being licensed and get some common licensing terms to make it easier for the developer to understand how to use it,” he says. “Look at what Creative Commons has done. We need to get to that same kind of state with data licensing.”
He imagines, for example, giving commercial developers a channel to communicate with publishers of open data who make it available for non-commercial use, so the two can arrive at some sort of ‘click-to-agree’ transactional price based on usage. “Then the workflow becomes so much easier over time,” he says. Quite often, he says, data owners themselves don’t know what models might be available to them or how much effort it will be to service that business. “With Kasabi there will be an off-the-shelf model you can adopt and just start to use that,” he says. Data hosting is free, so it’s a low-risk way for data owners to understand if there is demand for their information. “One role a marketplace has is demand aggregation. If there’s a place you can go to say, ‘I want data on this topic or about this area,’ then it becomes a way for companies to understand whether there are people interested in what they have to offer,” Dodds says.
As the service matures it can potentially realize revenue via commercial models based on transactional use of the data – revenue shares with the commercial data sources it’s hosting, or charges for high-volume usage of SPARQL endpoints. There’s also the potential for charging for custom data hosting – some companies, for instance, might want to have a completely private space for their own Linked Data. (Right now, some controls can be put up around data that is hosted on the site, but even if visitors can’t use it, they’ll know it’s there because it’s part of the public marketplace.) Organizations could realize value, he says, from using Kasabi to host their completely private Linked Data and mix it with other publicly available data sets in the system. “They may not want to give away or sell all their data but blend it with other sources – there’s some definite value there,” he says.
That said, it’s not as if a lot of organizations out there are sitting on boatloads of their own Linked Data. “I don’t think we’ve solved the challenge completely yet of data that is not in Semantic Web formats. That’s still the main point of friction for anyone to adopt the technology to begin with,” Dodds acknowledges. There are professional services to help organizations get over the learning curve, of course, such as those offered by Talis. Another way to help get people past that barrier is for Kasabi to partner with different organizations that have technology it could integrate into the platform to make certain types of data easier to ingest or accumulate from different sources. That way, data owners don’t have to get into the details of how it’s modeled or can be converted, he says. That’s an idea being mulled over.
The bottom line is that Kasabi wants to make it clear to the world that it can create and generate value around Linked Data and the Semantic Web, “so it becomes more of a draw and organizations go up the learning curve,” he says. On the roadmap over the next few months: engaging with various development communities to show them the potential of the platform and to help them address the challenges they say they face; bringing in more data from different sources; and starting to expose more structure and relationships between data sets.
Dodds has, he says, a notion of progressive disclosure around what information is needed and at what level of detail it is required to solve a particular problem, whether you’re a business user or a developer. “My goal is that a business analyst or user or stakeholder could come here, find a dataset that is useful for their app and point developers at it, so both parties have all the information they need and the platform provides the tools and support they need to get that with a minimum of effort from the data publisher,” he says.