The recent International Semantic Web Conference produced a number of excellent sessions, including a very popular Big Graph Data Panel, captured on video by the folks at VideoLectures. The panel was moderated by Frank van Harmelen (Department of Computer Science, Faculty of Sciences, VU University Amsterdam), with panelists Tim Berners-Lee (W3C), John Giannandrea (Google), Mike Stonebraker (Massachusetts Institute of Technology), and Bryan Thompson (SYSTAP).
According to the description of the video, “The Semantic Web / Linked Data has grown immensely over the past years. When the Semantic Web community started working over a decade ago the main question was where to get the data from. By now the question of how to process ever increasing amount of semantic/linked data has come to people’s utmost attention. The goal of this panel is to shed light on the various approaches/options for Big Graph Data processing.”
As might be expected, the group had a lively discussion as they tackled Big Data, graph databases, SPARQL, RDBMS, columnar stores, and several other topics.
As Juan Sequeda reported for us in an earlier piece, “This panel was a highlight of the conference… Stonebraker stated that the Semantic Web is another application of a graph database. Berners-Lee responded that he enjoyed staying at the application layer. The issue of ‘what is big data?’ arose and Stonebraker defined it as either Big Volume or Big Velocity or Big Variety. Additionally, he stated that big data is only a problem if your data need grows faster than memory gets cheaper. Giannandrea stated that they control dataset size by reconciling. However, the hard problem is deciding when two things are the same. Berners-Lee commented that you should reconcile the vocabularies you are planning to share. Stonebraker added that the biggest problem is trying to put stuff together after the fact that was not designed to be put together. Additionally, Stonebraker stated that relational technology is old, obsolete technology and has been beaten in every vertical by custom solutions. New database architectures could be 100 times faster. However, benchmarks are lacking. Parallelization was another topic. Stonebraker commented that everything that is done at scale must be parallelized, otherwise it would run forever.”
Image: Courtesy ISWC