Thanks everyone for participating in the #SemWebRox Community Challenge!
Looking at the results (which have been pasted at the end of this article for convenience), I’m struck once again by the diversity of points of view in the Semantic Web community on what the key value of its technology really is. Over at Semantic University we summarized what we believe to be the two dominant camps (summary: AI-centric and flexible data management-centric) in the Semantic Web world, and the results of this exercise illustrate clearly that there are many nuances within those camps.
I’ll go into some highlights, but I think the why is still missing in many cases. It’s the classic features-not-value predicament that plagues technologists and frustrates technology marketers. We’re doing better, but we can and must do better still.
Data Flexibility: Data Integration
In terms of data flexibility, a number of themes kept popping up. Aaron Bradley first called out “cheaper enterprise data integration,” and Lee Feigenbaum concurred by stating, “The Semantic Web is the only scalable approach for integrating diverse data.” Another one I liked about data integration was from Abir: “Semantic Web technologies can make it possible to have true bottom-up web-scale automatic information integration.”
However, the one I loved the most was, “flexible data integration. Forget eternal requirements meetings.” Oh man, does that nail it or what? If you don’t have to plan every last detail up front, you can have “collaboration without coordination,” shaving months off the calendar.
The theme of “emergent integration” is interesting, but “automatic” is the key value part of this. With Semantics, data integration is cheaper and easier, and sometimes it even comes for free.
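To make “collaboration without coordination” a bit more concrete, here is a minimal sketch in plain Python (no RDF library assumed) that models each source as a set of (subject, predicate, object) triples. Because both hypothetical sources name the same company with the same identifier, integrating them reduces to a set union, with no shared schema or mapping meeting required. All identifiers and property names are made up for illustration.

```python
# Minimal sketch: RDF-style data as sets of (subject, predicate, object) triples.
# All identifiers below are hypothetical examples, not real vocabularies.

Triple = tuple[str, str, str]

crm: set[Triple] = {
    ("ex:acme", "ex:name", "Acme Corp"),
    ("ex:acme", "ex:accountManager", "ex:alice"),
}

billing: set[Triple] = {
    ("ex:acme", "ex:outstandingBalance", "1200"),
    ("ex:acme", "ex:name", "Acme Corp"),  # duplicate fact, harmless after the merge
}

def merge(*graphs: set[Triple]) -> set[Triple]:
    """Integrate graphs by union: no shared schema or mapping step,
    only agreement on the identifiers used for shared entities."""
    merged: set[Triple] = set()
    for g in graphs:
        merged |= g
    return merged

def describe(graph: set[Triple], subject: str) -> dict[str, list[str]]:
    """Collect every property of a subject across all merged sources."""
    props: dict[str, list[str]] = {}
    for s, p, o in sorted(graph):
        if s == subject:
            props.setdefault(p, []).append(o)
    return props

integrated = merge(crm, billing)
print(describe(integrated, "ex:acme"))
```

The union only works this smoothly because both sources already agree on `ex:acme`; in practice, reconciling identifiers is where much of the remaining effort goes.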
So I think we nailed the value for data integration. And, by the way, the industry is already paying attention to this. The biggest pure-play Semantic ETL vendor I know of, Expressor Software, was recently acquired by BI giant QlikTech.
Data Flexibility: Evolvable Solutions
Also under the data flexibility theme, I highlighted applications that can evolve more quickly and easily, which is a different take since it focuses on visualization and consumption. This tweet captures it: “#SemanticWeb technologies are extremely flexible, saving time and money in any situation facing unsure or changing requirements.” Another that came relatively close to this value proposition but sticks closer to its technical foundation was: “takes the Model out of the Data Model, allowing more flexibility and reuse – lower cost & better, faster decisions.”
Although there weren’t many people explicitly calling out the flexibility of semantic-powered apps, I know there are many in the community thinking about it. For example, Mike Bergman frequently writes about the open world model of the Semantic Web on his very erudite blog, and what makes the open world model important for data modeling and application development is that it expects change.
With Semantics, adapting an application is cheap because it is expected and embraced by the technology rather than an expensive inconvenience to avoid.
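A rough sketch of why change can be cheap: when data is stored as triples, adding a new kind of fact is just adding triples, and existing queries that never mention it keep working. This is plain Python with hypothetical property names, not an RDF engine, and it illustrates schema-free extension rather than the full open-world semantics of OWL.

```python
# Sketch: evolving an application's data without a schema migration.
# Property names are hypothetical; this is not an RDF library.

graph = {
    ("ex:report1", "ex:title", "Q3 Sales"),
    ("ex:report1", "ex:author", "ex:bob"),
}

def values(graph, subject, predicate):
    """Return all objects for (subject, predicate). An empty list means
    'not known', not 'false', in the spirit of the open world view."""
    return sorted(o for s, p, o in graph if s == subject and p == predicate)

def titles(graph):
    """Existing application code: list all report titles."""
    return sorted(o for s, p, o in graph if p == "ex:title")

before = titles(graph)

# Requirements change: reports now carry a review status.
# No ALTER TABLE and no migration script; just assert new triples.
graph.add(("ex:report1", "ex:reviewStatus", "approved"))

after = titles(graph)
print(before == after)                                   # old query unaffected
print(values(graph, "ex:report1", "ex:reviewStatus"))    # the new fact
print(values(graph, "ex:report2", "ex:reviewStatus"))    # unknown, not false
```

The design choice being illustrated is that the data carries its own structure, so the cost of a new requirement is roughly the cost of writing the new facts.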
Data Flexibility: Linked Data
Linked Data posts were also very common. I see Linked Data as part of the data flexibility story, as it relates closely to flexible data integration. However, I think that the Linked Data champions are still struggling somewhat with a real value proposition.
I loved some of the sentiment (“sharing is caring!” was a great one, and, in fact, is probably the sentiment that has driven the explosive growth of the Linked Open Data Cloud), but most tweets still focused on features, not value. That is, I didn’t see much why in there.
Yes, you can link data between sources (a few people called this fact out). But why does that matter? Does it make apps cheaper to maintain? If all it does is make it a little easier to create mashups since primary keys are built into the structure, I’m not sure that in and of itself is revolutionary (and, more troubling, it gives no incentive to the actual publishers of datasets to strive towards consistent naming).
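For readers who haven’t seen the mechanics, here is a hand-rolled sketch of what “linking” buys a mashup: two hypothetical publishers use different identifiers for the same company, and a single `owl:sameAs` triple lets a consumer merge their records. The merge here is a simple union-find in Python, not a real RDF store or reasoner, and all identifiers are made up. It also reflects the caveat above: nothing merges unless someone actually publishes the link.

```python
# Sketch: following owl:sameAs links to merge records about one entity
# published under different identifiers. All identifiers are hypothetical.

SAME_AS = "owl:sameAs"

dataset_a = {("a:acme", "a:headquarters", "Springfield")}
dataset_b = {("b:company/117", "b:ticker", "ACME")}
links = {("a:acme", SAME_AS, "b:company/117")}

def canonical_map(triples):
    """Union-find over sameAs links: map each identifier to one representative."""
    parent: dict[str, str] = {}

    def find(x: str) -> str:
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for s, p, o in triples:
        if p == SAME_AS:
            rs, ro = find(s), find(o)
            if rs != ro:
                # Pick the lexicographically smaller ID so the result is deterministic.
                parent[max(rs, ro)] = min(rs, ro)
    return find

def smush(triples):
    """Rewrite subjects and objects to canonical IDs and drop the link triples."""
    find = canonical_map(triples)
    return {(find(s), p, find(o)) for s, p, o in triples if p != SAME_AS}

merged = smush(dataset_a | dataset_b | links)
for t in sorted(merged):
    print(t)
```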
There were a couple of folks who argued for transparency, and I think where government is concerned, transparency is noble; but do you need Linked Data to do this? Wouldn’t publishing CSVs be enough? The Sunlight Foundation has done a tremendous job with data that is largely not in RDF, so the format doesn’t seem to have stopped them. I know I’m playing devil’s advocate, but I think this is vitally important for community growth, since there are many skeptics out there. The why is still missing here, and it’s the why that I think we as a community need to do a better job of nailing down.
I think Kingsley came second closest to capturing the value of Linked Data with a classic, “SOA data sources that can be recombined with ODBC, RDBMS, LOD Cloud Data, via links!” It’s a bit techy, but I can summarize it as “SOA done right,” and in general if you buy into the vision of SOA you should buy into Semantics as the best way to get it done.
So, frankly, I think the sentimental value of “sharing for sharing’s sake” is actually the best one so far. Linked Data: because sharing is caring. My worry is that in the long term this only gets you so far. Can we do better?
Artificial Intelligence: Reasoning
This is another case where there is still focus on “you can do X” rather than “you should care about X because Y.” For example, “The Semantic Web makes machines smarter just as the Web has enhanced human knowledge and intelligence,” or “Adding structure & meaning to web data, enabling humans & algorithms to reason by induction, deduction or abduction.” These are descriptive for sure, but don’t tell you why in the same way that the data integration quotes do.
I must confess that I too have struggled with this and was very much hoping for enlightenment!
So I’ll take a shot here, but I really hope others reply in the comments below to continue the conversation.
Back at university I did AI research, focusing on machine learning algorithms for text classification (at the time support vector machines were all the rage). Automatic classification of data is the one area where I clearly see the value of using Semantic reasoning and inference.
With Semantics you can do automatic structural classification, which makes creating alerts on complex information much, much easier than using traditional tools. With a relational database, for example, you’d have to write stored procedures or reports and run them regularly, and the complexity of data relationships makes certain kinds of inference hard.
For example, in immigration I can imagine a use case something like “if a guy 18-26 enters the United States after having been to a country that is known to harbor members of a terrorist group currently in the top 10 in the CIA watch list and is traveling alone and speaks with a heavy accent then label as interesting subject.” Imagine building that rule in SQL. Nightmare, right? With OWL and reasoning, expressing it is really easy and can be done by non-experts.
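To show the shape of such a rule, here is a toy forward-chaining classifier in Python. It is not OWL and not a real reasoner; the class and property names are hypothetical, and it keeps only part of the example above, mainly the multi-hop chain (traveler → visited country → harbored group → watch-list rank) that would turn into a multi-table join in SQL. In OWL, the same definition would typically be written as an equivalent-class axiom built from property restrictions, and a reasoner would infer class membership.

```python
# Toy rule-based classifier over triples (NOT an OWL reasoner).
# All identifiers and thresholds are hypothetical, for illustration only.

graph = {
    ("ex:t1", "ex:age", 22),
    ("ex:t1", "ex:visited", "ex:countryX"),
    ("ex:t1", "ex:travelingAlone", True),
    ("ex:t2", "ex:age", 40),
    ("ex:t2", "ex:visited", "ex:countryX"),
    ("ex:t2", "ex:travelingAlone", True),
    ("ex:countryX", "ex:harbors", "ex:groupG"),
    ("ex:groupG", "ex:watchListRank", 3),
}

def objs(g, s, p):
    """All objects o such that (s, p, o) is in the graph."""
    return [o for (s2, p2, o) in g if s2 == s and p2 == p]

def matches_rule(g, person) -> bool:
    """Hypothetical class definition: age 18-26, traveling alone, and visited
    a country that harbors a group ranked in the top 10 of a watch list."""
    in_age_range = any(18 <= a <= 26 for a in objs(g, person, "ex:age"))
    alone = True in objs(g, person, "ex:travelingAlone")
    risky_visit = any(
        rank <= 10
        for country in objs(g, person, "ex:visited")
        for group in objs(g, country, "ex:harbors")
        for rank in objs(g, group, "ex:watchListRank")
    )
    return in_age_range and alone and risky_visit

def classify(g):
    """One forward-chaining pass: add an rdf:type triple for every match."""
    people = {s for (s, p, _) in g if p == "ex:age"}
    inferred = {(x, "rdf:type", "ex:SubjectOfInterest")
                for x in people if matches_rule(g, x)}
    return g | inferred

result = classify(graph)
print(sorted(s for (s, p, o) in result if p == "rdf:type"))
```

The point of the comparison is that the rule is stated once as a definition over relationships, and classification follows from the data, rather than living in a scheduled report or stored procedure.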
So, absent any tweets, here is my attempt.
Semantic reasoning makes automatic complex classification of any data much cheaper and implementable by subject-matter-experts instead of serious DBAs.
For the fans of inference in the audience: what do you think? Am I far off? How do you see it?
I enjoyed this exercise. I think as a community there are a couple of applications of Semantic Web technologies for which we have no trouble describing the value proposition. However, there are a couple of others (Linked Data and reasoning in general) for which the why isn’t so clear, and this shows in how often we fall back on feature bullets.
P.S. – I’ll throw a special shout out to Kingsley Idehen, who posted as many attempts as everyone else combined, single-handedly illustrating the diversity of opinion in one man’s Twitter stream!
The Full List of Tweets Submitted for #SemWebRox (in Chronological Order, as of this posting)
Cheaper enterprise data integration #SemWebRox
– (@aaranged on behalf of @kendall)
#SemWebRox because sharing is caring 🙂
#SemWebRox RT @kristiholmes: The semantic web is the only scalable approach for integrating diverse data @giustini @pfanderson #medlibs
– (@kristiholmes via @LeeFeigenbaum)
While, bandwidth is v important, volume (content) is v important, semantics that make sense of it is ultimate! #SemWebRox
– (@GregLBean, in response to @ScottRhodie)