Big Data and the Semantic Web are on track to intersect, and businesses that want to profit from the explosion in data should start looking more closely at that intersection, and soon.
“We’ve got more data now than ever before coming at us, and it is coming faster and faster,” says Frank Coyle, director of the Software Engineering program in the Lyle School of Engineering at Southern Methodist University, whose research is in the area of web services and semantic web technologies. “So the semantic angle is how can you organize this data to take advantage of it, to do queries over it.” Those in the semantic web community say RDF is the way to go, he says, adding that people now use the term linked data as another way of describing semantic data. “If you take Big Data and link it, then you have semantics – you have meaning now introduced into the equation.”
Coyle – who will be presenting a talk entitled, “Relationships Matter – Leveraging Semantic Technology to Extend Your Business Horizons,” at the upcoming Semantic Technology and Business Conference in the U.K. next week – says that most companies today are dealing with Big Data in all its forms: structured, semi-structured and unstructured. And RDF, the language of the semantic web, offers a simple sentence structure, where triples consist of a subject, predicate and object, to help put all those data elements in relationship to each other.
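To make the subject-predicate-object idea concrete, here is a minimal Python sketch that models a triple and prints it in N-Triples form, one of the standard plain-text RDF serializations. The `http://example.org/` namespace, the `Triple` class and the sample order and customer data are assumptions made up for illustration; they do not come from Coyle's talk. Literal escaping is simplified to backslashes and double quotes.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    subject: str                  # IRI of the thing being described
    predicate: str                # IRI naming the relationship
    obj: str                      # IRI of another thing, or a plain literal value
    obj_is_literal: bool = False  # True when obj is text rather than an IRI


def to_ntriples(t: Triple) -> str:
    """Render one triple as a single line of N-Triples text (simplified escaping)."""
    if t.obj_is_literal:
        escaped = t.obj.replace("\\", "\\\\").replace('"', '\\"')
        obj = f'"{escaped}"'
    else:
        obj = f"<{t.obj}>"
    return f"<{t.subject}> <{t.predicate}> {obj} ."


EX = "http://example.org/"  # hypothetical namespace, for illustration only

triples = [
    # "Order 1001 was placed by customer 42."
    Triple(EX + "order/1001", EX + "placedBy", EX + "customer/42"),
    # "Customer 42 has the name Acme Ltd."
    Triple(EX + "customer/42", EX + "name", "Acme Ltd", obj_is_literal=True),
]

lines = [to_ntriples(t) for t in triples]
for line in lines:
    print(line)
```

Because the object of the first triple is the subject of the second, the two sentences are already linked: following `customer/42` from the order leads to the customer's name.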
So, if your business is grappling with trying to get value out of all three kinds of data, “if you take it to the simple sentence structures of RDF, if you convert it to RDF, then you are in a position to use semantic web tools such as SPARQL to do queries over this data, and integrate it in a cohesive way rather than separately dealing with each of those categories,” he says.
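The integration Coyle describes might look roughly like the sketch below, which turns one structured row, one semi-structured JSON record and one unstructured document into triples in a single store and then asks one question across all three. Everything here is an assumption for illustration: the `http://example.org/` namespace, the helper names, the sample data, and the premise that a text-extraction tool has already reported which customer the document mentions. The `match` function is a plain-Python stand-in for a single SPARQL triple pattern, not a SPARQL engine; a real deployment would load the triples into a triple store and query it with SPARQL.

```python
import json

EX = "http://example.org/"  # hypothetical namespace, for illustration only


def triples_from_row(row):
    """Structured source: each non-key column of a row becomes one triple."""
    subject = EX + "customer/" + str(row["id"])
    return [(subject, EX + key, value) for key, value in row.items() if key != "id"]


def triples_from_json(text):
    """Semi-structured source: a JSON order record (field names are assumed)."""
    rec = json.loads(text)
    subject = EX + "order/" + str(rec["order"])
    return [
        (subject, EX + "placedBy", EX + "customer/" + str(rec["customer"])),
        (subject, EX + "total", rec["total"]),
    ]


def triples_from_document(url, mentioned_customer_id):
    """Unstructured source: the document's URL serves as the subject.

    The mentioned customer is assumed to come from an extraction tool.
    """
    return [(url, EX + "mentions", EX + "customer/" + str(mentioned_customer_id))]


def match(store, s=None, p=None, o=None):
    """Return triples matching a pattern; None acts like a SPARQL variable."""
    return [
        t for t in store
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    ]


store = []
store += triples_from_row({"id": 42, "name": "Acme Ltd"})
store += triples_from_json('{"order": 1001, "customer": 42, "total": 250.0}')
store += triples_from_document("http://example.org/docs/complaint-7.txt", 42)

# Roughly equivalent SPARQL:
#   SELECT ?s ?p WHERE { ?s ?p <http://example.org/customer/42> }
customer = EX + "customer/42"
linked = [(s, p) for s, p, _ in match(store, o=customer)]
print(linked)
```

The single query returns both the order and the document, even though they started out in different formats, which is the cohesive view the conversion to RDF is meant to provide.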
The hardest of the three data types to structure in RDF triple form is unstructured data, he says. But because your unstructured data generally is on the web on some server, it has a URL associated with it. “When you get into the details of RDF you have to have a URL to find your subject,” Coyle says. “So immediately you can at least begin to talk about some of this unstructured data. Once you access it, if it’s text there are tools you can use to run over it to extract subjects, predicates and objects from that, like Open Calais,” for one.
Most companies aren’t close to players like Google or Facebook when it comes to deriving value from Big Data, but many at least have experience mining structured and semi-structured data for hidden jewels, which will stand them in good stead when they move on to unstructured data. And, Coyle adds, there’s no reason not to tap into existing staff talent to begin generating triples from all the data the company has, and then use SPARQL to navigate and query those triple stores. “You can take a conventional database person who knows SQL and you can easily get them up to speed on SPARQL,” he says. “An advantage of this approach is that you can presumably use the expertise you have in the company to help you with this vs. going outside and hiring specialized people.”
And, not only does the semantic approach present new opportunities for companies to make use of data for their own internal ends, but you could imagine a situation where a company might be able to devise linked semantic data products to offer to other companies as a paid service, he says. “We are at the beginning of things but the thing is, the payoff is going to be surprising.”