The semantics-as-a-service play Extractiv emerges from beta today, with its hopes set on serving as the go-to place for helping create applications built on rich, structured data.
The company, the result of a joint venture between the 80legs web crawling service and NLP vendor Language Computer Corp., has a few new things on the way to add to its semantic web crawls, which convert any unstructured data found on web pages (to the tune of 100,000 documents per hour) to semantic data.
It’s expanded to include more entities and relationships; it’s now offering an on-demand service for processing individual URLs or local documents; and it’s added entity linking and RDF outputs.
“With on-demand and web crawling together we are basically positioning Extractiv as a starting point for the Semantic Web,” says 80legs CEO Shion Deysarkar. “With RDF and linking, too, it will really help Extractiv position itself as the beginning of the Semantic Web, the point between unstructured and structured data. Whatever unstructured data you have and structured data you want to get out, Extractiv is the bridge between those two.”
The company says it now has 156 entity types, which it claims lets it identify more entities, concepts, facts, people and places than other semantic services out there. On the relations front, it enables extraction of both typed and generic relations on both the web crawling and on-demand platforms. Typed relations capture a specific kind of semantic information, such as ‘Person Age’ or ‘Subsidiary Of,’ while generic relations cover a broad range of semantics, similar to Subject-Verb-Object triples. In comparison to the Open Calais web service, Extractiv says, it processes more entity types (156 to 39), and supports entity linking across all types of entities vs. a handful.
RDF outputs are joining JSON, XML and HTML, which, says Extractiv President John Lehmann, “should make a lot of semantic web folks really happy.” As should the automatic linking of all types of entities, he points out, initially to the public data set DBpedia – “we can officially say we are providing services to the Semantic Web since Semantic Web people generally say that if it’s not linked, it’s not useful.”
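To make the combination of Subject-Verb-Object relations, RDF output and DBpedia linking a little more concrete, here is a minimal Python sketch of how one generic relation might be written out as N-Triples with `owl:sameAs` links. It is an illustration only, not Extractiv's actual output format or API: the `http://example.org/` namespace, the function names and the sample entities are all hypothetical, and the DBpedia URIs are naive guesses built from the labels rather than the result of real entity linking.

```python
# Hypothetical sketch: this is NOT Extractiv's output format or API.
OWL_SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"
EX = "http://example.org/"  # placeholder namespace (an assumption)


def dbpedia_uri(label: str) -> str:
    """Guess a DBpedia resource URI from a label (spaces become underscores).

    A real linker would disambiguate the entity; this only builds a URI
    in DBpedia's usual shape and does not check that the resource exists.
    """
    return "http://dbpedia.org/resource/" + label.strip().replace(" ", "_")


def triple(subject: str, predicate: str, obj: str) -> str:
    """Format one N-Triples line in which all three terms are IRIs."""
    return f"<{subject}> <{predicate}> <{obj}> ."


def relation_to_ntriples(subj_label: str, verb: str, obj_label: str) -> list[str]:
    """Turn one Subject-Verb-Object relation into three N-Triples lines:
    the relation itself, plus a sameAs link for the subject and the object."""
    s = EX + "entity/" + subj_label.strip().replace(" ", "_")
    o = EX + "entity/" + obj_label.strip().replace(" ", "_")
    p = EX + "relation/" + verb
    return [
        triple(s, p, o),
        triple(s, OWL_SAME_AS, dbpedia_uri(subj_label)),
        triple(o, OWL_SAME_AS, dbpedia_uri(obj_label)),
    ]


if __name__ == "__main__":
    # Made-up entities, used purely for illustration.
    for line in relation_to_ntriples("Acme Corp", "subsidiaryOf", "Globex"):
        print(line)
```

The `owl:sameAs` lines are what make the output "linked" in the sense Lehmann describes: they tie a locally extracted entity to an identifier in a public data set.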
The on-demand service will be available in a few different versions, including a free one that handles up to 1,000 documents per day and up to 1,000 URLs per web crawl. Companies also can subscribe to a version priced at $99 per month/$50 per 100,000 documents that supports up to 1 million URLs per web crawl and up to three simultaneous web crawls, or a premium version for $299 per month/$50 per 100,000 documents whose unlimited access includes up to 10 million URLs per web crawl and up to five simultaneous web crawls.
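For a rough sense of what the paid tiers might cost in practice, the short Python sketch below works through the arithmetic. It rests on assumptions the announcement does not confirm: that the monthly fee and the $50 per 100,000 documents charge are added together, and that any partly used block of 100,000 documents is billed in full. The plan names `standard` and `premium` are labels chosen here for illustration.

```python
import math

# Figures from the announcement; how they combine is an assumption.
PLANS = {
    "standard": {"monthly_fee": 99, "per_block": 50},   # hypothetical plan name
    "premium": {"monthly_fee": 299, "per_block": 50},
}
BLOCK_SIZE = 100_000  # documents covered by each $50 charge


def monthly_cost(plan: str, documents: int) -> int:
    """Estimate one month's bill in dollars: flat fee plus per-block charges.

    Assumes every started block of 100,000 documents is charged in full.
    """
    blocks = math.ceil(documents / BLOCK_SIZE)
    p = PLANS[plan]
    return p["monthly_fee"] + blocks * p["per_block"]


if __name__ == "__main__":
    print(monthly_cost("standard", 250_000))   # 99 + 3 * 50 = 249
    print(monthly_cost("premium", 1_000_000))  # 299 + 10 * 50 = 799
```

Under these assumptions, the document volume quickly outweighs the flat fee, so the choice between tiers would mostly come down to crawl size and the number of simultaneous crawls.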
On the roadmap are sentiment (for powering aggregation services and listening platforms) and topic classification data (to identify what a page is about) outputs. “We want people to look at Extractiv as, ‘I want this kind of semantic data and this is the feature set I use from Extractiv, and this will get it out for me,’” says Deysarkar.
The future also may include partnerships with services like Infochimps. “There are two things possibly in the long term: Running our own feeds into data marketplaces for semantic data, which we could easily do,” he says. “Another thing is actually letting Extractiv users publish the data they get from Extractiv to data marketplaces.”