Every day New York City is getting closer to being the Digital City of the Future. It’s a long journey, though, and one that the semantic web community can lend a hand with.
At this week’s Semantic Technology & Business Conference in NYC, Andrew Nicklin of the Office of Strategic Technology and Development, NYC Department of Information Technology & Telecommunications (DoITT) provided a look at what has been accomplished so far, and what’s on the roadmap. Recent months have seen accomplishments including the passage of Local Law 11 of 2012 – the “most progressive legislation in the U.S. as far as cities being mandated to open data,” Nicklin said in an interview with The Semantic Web Blog before his keynote address at SemTech. “It ensures permanency for our program past any administrative changes…. The whole notion of open data doesn’t go away because it is written into law.”
The state of NYC open data at this point encompasses some 900 data sets at the NYC OpenData portal, representing a good cross-section of the city’s agencies. The data sets have been used in projects including the city’s Reinvent Green hackathon, the Reinvent NYC.Gov hackathon, and the NYC BigApps 3.0 challenge, where Ontodia’s NYCFacets took the grand prize as best overall app. (Joel Natividad, CEO and co-founder of Ontodia, also spoke at Tuesday’s SemTech event.) The focus immediately going forward won’t be so much on expansion of the number of data sets as it will be on keeping those data sets as close to real-time as possible with the city’s back-end systems, says Nicklin, who calls it “a big effort.” That said, in the next five years the city’s open data will grow in order to execute on the city’s local law, which requires that all internal city data that can be made public has to be made public. “So you will see a large increase,” he says. “We will have to figure out how much there is useful to people and how to make it more useful and relevant as we expand into the thousands.”
The city’s open data has to be published in machine-readable format. “We have a platform that’s really good at making that information available,” says Nicklin. “We have a fully interactive portal where we upload structured data to it and it makes it available in multiple formats,” from CSV to JSON to PDF to RDF. (The data sources are available through downloadable formats as well as interactive APIs for the benefit of the developer community.) “But,” Nicklin says, “we’re not sure we have all the metadata and semantics around RDF necessary to be really effective at it.”
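To make the format question concrete, here is a minimal sketch of one record rendered as CSV, JSON, and RDF (N-Triples). It is not how the NYC portal works: the record, its field names, and the example.org URIs are all invented for illustration. It also hints at the gap Nicklin describes, since the RDF output is only as meaningful as the vocabulary behind its property URIs.

```python
import csv
import io
import json

# Hypothetical record; field names and values are illustrative only.
record = {"id": "route-001", "name": "Example Greenway", "length_miles": 3.2}

# Hypothetical base URIs, not real NYC identifiers or a real vocabulary.
BASE = "http://example.org/dataset/bike-routes/"
VOCAB = "http://example.org/vocab#"


def to_csv(rec):
    """Render one record as CSV with a header row."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=list(rec), lineterminator="\n")
    writer.writeheader()
    writer.writerow(rec)
    return buf.getvalue()


def to_json(rec):
    """Render one record as a JSON object."""
    return json.dumps(rec, sort_keys=True)


def to_ntriples(rec):
    """Render one record as N-Triples, one triple per non-id field.

    Every value is emitted as a plain string literal; json.dumps is used
    only to get a double-quoted, escaped string, which is adequate for
    simple values like the ones above.
    """
    subject = f"<{BASE}{rec['id']}>"
    lines = []
    for key, value in rec.items():
        if key == "id":
            continue
        literal = json.dumps(str(value))
        lines.append(f"{subject} <{VOCAB}{key}> {literal} .")
    return "\n".join(lines)


if __name__ == "__main__":
    print(to_csv(record))
    print(to_json(record))
    print(to_ntriples(record))
```

The CSV and JSON versions carry the same bare field names. Only the RDF version forces a decision about what "name" or "length_miles" actually refers to, and that is where shared metadata and semantics come in.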
That’s one place where semantic pros can offer some help, he thinks. “I don’t know that we have a strong presence as far as Linked Open Data is concerned now, but we are definitely headed in that direction,” he says. Strong interest by consumers of the data will help, and at his talk he planned to discuss the city’s desire to have the input of the semantic web community to support and drive LOD efforts forward.
Open Data Within And Beyond The City
The city created a wiki for citizens to collaborate with it on its open data policies, technical standards and guidelines. In September it published a technical standards manual as required by the legislation, and there’s work ahead in assisting city agencies with opening up their data in a way that is practical and sensible and meets the requirements of Local Law 11. As for the wiki, it is considered to be an evolving document. Content is being updated there to align with the published document, and comments are still being accepted. Moreover, says Nicklin, “as we head to Linked Open Data, for instance, we will want to implement new standards around that, so will have to update the document accordingly, and the wiki is there with that notion in mind.”
New York City is going beyond its borders, too, partnering with the federal government on Cities.Data.Gov, a unification of data from multiple cities across the country. “The whole idea there is building a common platform by which municipalities can share data specification,” he says. “The notion is that if we all share the same type of information about the same topic, we should share it in the same way so that it is easily consumable by anyone who wants it, regardless of what city they want it from. The data should be in the same structure and format so people don’t have to write 12 different parsers to intake that data from all the cities involved.”
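The "12 different parsers" problem can be sketched in a few lines. The example below is an illustration only: the shared schema, the city keys, and the local field names are all hypothetical and are not taken from Cities.Data.Gov. Each publisher maps its local fields into the shared structure once, and a consumer then needs a single routine for every city.

```python
# Hypothetical shared schema for a bike-route record.
COMMON_FIELDS = ("city", "route_name", "length_km")

# Hypothetical per-city mappings: common field name -> local field name.
FIELD_MAPS = {
    "nyc": {"route_name": "street", "length_km": "len_km"},
    "boston": {"route_name": "RouteName", "length_km": "KM"},
}


def to_common(city, raw):
    """Publisher side: translate one local record into the shared schema."""
    mapping = FIELD_MAPS[city]
    out = {"city": city}
    for common_name, local_name in mapping.items():
        out[common_name] = raw[local_name]
    return out


def total_length_km(records):
    """Consumer side: one routine works for every city's records."""
    return sum(float(r["length_km"]) for r in records)


if __name__ == "__main__":
    shared = [
        to_common("nyc", {"street": "Example Ave", "len_km": 2.0}),
        to_common("boston", {"RouteName": "Sample Path", "KM": 1.5}),
    ]
    print(shared)
    print(total_length_km(shared))
```

If the cities published in the shared structure to begin with, the mapping table would disappear and only the consumer routine would remain, which is the outcome Nicklin describes.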
So far, it’s more about schema than the semantic web per se. Take the concept of bike routes in different cities – NYC, Boston, Chicago. “It’s more important at the moment to get into a common schema,” he says. That said, at the conference on Tuesday he commented that, “I really do think semantic technologies will help us connect that information effectively so that you can do comparisons between [cities]. But we do have a long way to go. You as a community can tell us what are the best ways to drive down that path.”
One issue across city governments, he notes, is that the semantics behind their businesses can be very different. A burglary in NYC may mean something legally different from a burglary in Chicago, as an example. “There is a whole question about how to normalize” such things across many different areas, he says. “So it is a big effort to create good ontologies that are consistent on a national basis and then applying and mapping our own data into ontologies. It’s a challenge but we are very interested. It’s baby steps to that for the moment.”
Looking down the road at where NYC goes, Nicklin at the SemTech event on Tuesday also explained how important it is to make information available to the populace in the way they’re likely to look for it. “Semantic technologies really need to come along for the ride for this. Part of the reason is to get information to Google, Bing and all the search engines. People don’t go to nyc.gov first. They go to other search engines. So we need a way for information to be easily consumable not just by humans, but by search engines.”
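One way to picture the mapping step is a table that links each city's local term to a shared concept and records how strong the link is. The sketch below is an assumption-laden illustration: the cities, labels, concept URIs, and match strengths are invented and say nothing about any real legal code. The match labels borrow the exactMatch and closeMatch names from the SKOS vocabulary.

```python
# Hypothetical shared concept namespace.
SHARED = "http://example.org/concepts#"

# Hypothetical mappings: (city, local label) -> (shared concept, match type).
MAPPINGS = {
    ("city_a", "BURGLARY"): (SHARED + "Burglary", "exactMatch"),
    ("city_b", "BURGLARY"): (SHARED + "Burglary", "closeMatch"),
    ("city_b", "BREAKING AND ENTERING"): (SHARED + "Burglary", "exactMatch"),
}


def shared_concept(city, label):
    """Return (concept URI, match type) for a local term, or None if unmapped."""
    return MAPPINGS.get((city, label))


def directly_comparable(term_a, term_b):
    """True only when both local terms map exactly to the same shared concept."""
    a = shared_concept(*term_a)
    b = shared_concept(*term_b)
    if a is None or b is None:
        return False
    return a[0] == b[0] and a[1] == "exactMatch" and b[1] == "exactMatch"


if __name__ == "__main__":
    # Same word in both cities, but only a close match on one side.
    print(directly_comparable(("city_a", "BURGLARY"), ("city_b", "BURGLARY")))
    # Different words, but both map exactly to the shared concept.
    print(directly_comparable(("city_a", "BURGLARY"),
                              ("city_b", "BREAKING AND ENTERING")))
```

The point of recording the match type is that identical labels do not guarantee identical meaning, so a comparison across cities should rest on the mapping rather than on the local word.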