Last week the New York City Council approved legislation that would require city agencies to publish public data sets in a common format on an online portal for the public’s use. Mayor Bloomberg just signed it, and the Open Data Bill is to be phased in over six years.
But semantic tech startup Ontodia hopes to speed up the development of the Big Apple as the Digital City of the Future with NYCFacets, a just-released Smart Open Data Exchange for the developer community. It catalogs all the NYC-related data sources already present in the New York City Open Data Catalogue.
“There are about 900 data sets in the New York City Open Data Catalogue,” says Ontodia co-founder Joel Natividad. Last year, while at TCG Software Services, he was part of a team that won the Large Organization Recognition Award at BigApps 2.0 – the city-sponsored contest for developers to use NYC Open Data – for helping create NYC Data Web, which integrates the NYC.gov data sets into a single web of data for developers. The team also included Revelytix and Spry. “Now that the Open Data Bill just passed, there will be a tsunami of data,” he says.
NYCFacets stores about 1.5 million metadata facts across all those data sets, providing more than conventional metadata in the process, so that developers can make sense of what’s out there in the catalogue. It was something Natividad says he and co-founder Sami Baig needed to do for their own efforts as developers, given that the catalogue data wasn’t exposed in such a way that promoted quick navigation, discovery and exploration. NYCFacets says it provides that to developers with a robust discovery, query federation, and semantic integration mechanism. Among its features to navigate the data are faceted search, Google Instant-like searching, search auto-completes, semantic browsing, visualizations, inline queries, drilldowns, and multi-way data explorers.
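For readers unfamiliar with faceted search, the idea can be sketched in a few lines of Python. This is only an illustration of the general technique, not NYCFacets' implementation: the records, the field names (`agency`, `category`, `format`) and the helper functions are all invented for the example.

```python
from collections import Counter

# Hypothetical metadata records; names and values are invented for illustration
# and do not come from the NYC Open Data Catalogue.
DATASETS = [
    {"name": "Example A", "agency": "Agency X", "category": "Health", "format": "CSV"},
    {"name": "Example B", "agency": "Agency Y", "category": "Environment", "format": "CSV"},
    {"name": "Example C", "agency": "Agency Z", "category": "Transportation", "format": "KML"},
    {"name": "Example D", "agency": "Agency Y", "category": "Environment", "format": "KML"},
]

def filter_datasets(datasets, selections):
    """Keep only the records that match every selected facet value."""
    return [
        d for d in datasets
        if all(d.get(facet) == value for facet, value in selections.items())
    ]

def facet_counts(datasets, facet):
    """Count how many records carry each value of a single facet."""
    return Counter(d[facet] for d in datasets)

# Narrow the catalogue to one agency, then see which formats remain to drill into.
matches = filter_datasets(DATASETS, {"agency": "Agency Y"})
remaining_formats = facet_counts(matches, "format")
```

Each selection narrows the result set, and the counts for the remaining facets tell the user where a further drilldown would lead.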
Today developers can get a score for the catalogue’s data sets through the service’s Pediacities Rank, a compilation of metadata derived using semantics, statistics, algorithms and the crowd that considers issues such as freshness, download and view count (that’s the crowd-knowing aspect). “So, users can have a signal to tell them if it’s good quality,” Natividad says.
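The article does not describe how Pediacities Rank is calculated, so the following is only a rough sketch of how a score might combine the signals mentioned: freshness, downloads and views. The function name, the 180-day half-life and the weighting are assumptions made for illustration, not Ontodia's algorithm.

```python
import math
from datetime import date

def rank_score(last_updated, downloads, views, today, half_life_days=180.0):
    """Illustrative quality score: freshness decay scaled by crowd usage.

    Hypothetical formula. The half-life and the log damping are assumptions,
    not the actual Pediacities Rank.
    """
    age_days = max((today - last_updated).days, 0)
    # Freshness halves every `half_life_days` since the last update.
    freshness = 0.5 ** (age_days / half_life_days)
    # log1p dampens very large download and view counts.
    popularity = math.log1p(downloads) + math.log1p(views)
    return freshness * (1.0 + popularity)

today = date(2012, 3, 1)
fresh = rank_score(date(2012, 2, 1), downloads=500, views=4000, today=today)
stale = rank_score(date(2010, 2, 1), downloads=500, views=4000, today=today)
```

With identical usage numbers, the data set updated a month ago scores higher than the one left untouched for two years, which is the kind of signal Natividad describes.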
At the same time, the Rank can help inform the publishers themselves about how good a job they’ve done with their data. “Going through the catalogue we found a lot of quality issues and conflicting information,” he says. He says the pattern is that when the next BigApps event is announced, there’s typically a big push of data, which then doesn’t get updated. “So the ranking algorithm can tell the publisher, ‘Your data is stale.’”
One thing he says is important to the effort is to try to help developers use NYCFacets without requiring them to have an encyclopedic knowledge of how semantic web technology works. “So often in the semantic tech space a lot of people talk triples, triple stores, RDF, and we didn’t want to alienate some potential users that are developers who are mostly still in the Web 2.0 camp, where they’re familiar with APIs and JSON,” he says. “We’re trying to help developers consume the feed without telling them they need to connect to a SPARQL endpoint…We’re going to expose feeds in a format that is familiar to Web 2.0 types, to make the learning curve more shallow.”
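To give a sense of what that might look like for a Web 2.0 developer, here is a minimal sketch of consuming a JSON feed using nothing but Python's standard library. The feed layout (`datasets`, `id`, `title`, `score`) is invented for the example; the article does not document the actual NYCFacets feed format.

```python
import json

# Invented sample payload. The real NYCFacets feed format is not described
# in the article, so every field name here is an assumption.
SAMPLE_FEED = """
{"datasets": [
  {"id": "a1", "title": "Example dataset A", "score": 0.82},
  {"id": "b2", "title": "Example dataset B", "score": 0.35}
]}
"""

def top_datasets(feed_text, min_score):
    """Return the titles of data sets whose score meets the threshold."""
    feed = json.loads(feed_text)
    return [d["title"] for d in feed["datasets"] if d["score"] >= min_score]

good = top_datasets(SAMPLE_FEED, 0.5)
```

No triple store or query language is involved: the developer parses the response and filters ordinary dictionaries.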
The plan is to extend NYCFacets so that it not only exposes all the metadata in the New York City Open Data Catalogue and derives additional metadata to further developers’ understanding of each data set, but also does the same with other potentially collaborative sources, such as DBpedia and U.S. Census data, and catalogues web sites that use these data sources.
“Now everyone talks Big Data and this is Big Data for cities. We think semantics is the only way to really overcome this data tsunami,” says Natividad. “Machines are good to produce data so let’s use them for scoring, preprocessing, and inferencing.” One of his hopes is that by the next BigApps contest, a framework will be in place that developers can use to quickly build mashups rather than spend their time massaging data.
The big goal? “To do the same thing for New York City data that Bloomberg did for finance data,” says Natividad.
Meanwhile, NYCFacets is entered for BigApps 3.0. The current voting period for contest submissions ends Thursday.