by Charles Roe
DATAVERSITY™ recently interviewed Bob DuCharme, the Director of Digital Media Solutions at TopQuadrant. Bob will be giving a session at the NoSQL Now! Conference in San Jose, CA from August 19-21, 2014. The session is titled “Semantic Web Standards and the Variety “V” of Big Data.”
The Speaker Spotlight Column (and its parallel venture the Sponsor Spotlight Column) is an ongoing project that focuses on highlighting several of the central issues represented at the many Data Management conferences produced by DATAVERSITY.
The interview focused on Bob DuCharme's work and history within the industry, with particular attention to his presentation at the upcoming conference:
DATAVERSITY (DV): What are you going to discuss during your session at NoSQL Now! 2014, and what will the audience gain from attending your talk?
Bob DuCharme (BD): People typically define Big Data applications as those that can handle the increasing Volume, Velocity, and Variety of data available today, and a recent Gartner poll showed that people have the most trouble with the Variety aspect. While the choice between relational and NoSQL databases often means an all-or-nothing commitment to the use of database schemas, I’m going to show how W3C standards let you use partial schemas that make it easier to flexibly take subsets of different data sets and use them together, so that variety is no longer such an impediment to Big Data applications.
DV: What is really important about such a topic in terms of the current state of Data Management and/or how the industry is going to transform moving into the future?
BD: More and more people see the potential value of combining different data sets into a whole that is greater than the sum of its parts, revealing patterns that were not apparent before. The classic difficulties of data integration, which have been around for decades, have made this so much work that quick integrations to address what-if scenarios were just not a realistic option. Now that they are, people can get more value from their own data, from public data, and from any combinations that they want to assemble.
DV: Please tell us a little about yourself and your history in the industry, your past work experience, and how you got started in the data profession.
BD: I’ve always enjoyed working with data that doesn’t fit neatly into tables. As a tech writer at a software company who helped to automate the conversion of documentation for delivery as online help under multiple operating systems, I became interested in SGML, and then when some key SGML figures created the simplified version known as XML, I did a lot of work (and wrote a few books) with that technology. The appeal of the flexibility of RDF and the SPARQL query language (both W3C standards, like XML) led me to TopQuadrant, a leader in the use of these technologies to address customer data integration issues. Since then, I’ve written the book “Learning SPARQL” for O’Reilly, complete with its own animal on the cover: the deep-sea anglerfish.
DV: What is the biggest challenge happening in your particular area of Data Management at this time?
BD: Some people forget that SQL was considered theoretically nice but practically inefficient when it was first proposed. Since then, many people and organizations have contributed to making relational databases much more efficient. RDF-based technology has had to address similar challenges of scalability and efficiency, and while it’s great to see the progress being made on both the commercial and academic sides, I look forward to seeing what the further progress they’re headed toward will make possible.
DV: How is such a change influencing your job?
BD: It’s hard to believe that the original Billion Triples Challenge was six years ago. As it becomes possible to store, query, and build applications around larger numbers of triples, new possibilities open up for the kinds of customer needs TopQuadrant can address.
DV: How have your job and/or the work you are doing at your organization changed in the past 12 months? How do you expect it will change in the next 1-2 years?
BD: The key change over time has been the new industries that realize the value that RDF technology can bring to their challenges. It happened with pharma a while ago, then oil and gas, and now we’re seeing financial and insurance firms realize this as well, so this opens up the work we’re doing into interesting new areas.
DV: Are there any other emerging technologies you predict will affect your job function in the future?
BD: The Internet of Things means more devices are generating more kinds of data, so there are more data sets to integrate to find interesting new patterns. I look forward to getting involved with that.
DV: What’s your favorite “Data” or “Data Management” quote?
BD: I’ve always liked this quote attributed to the pre-Socratic Greek philosopher Heraclitus of Ephesus: “A wonderful harmony is created when we join together the seemingly unconnected.”
DV: How do you explain what you do for work at a cocktail party, or to your grandparents?
BD: I work for a software company that makes it easier to take advantage of large amounts of data that don’t fit neatly into tables, which is what most database management programs expect. The ways that we do it are all based on standards from the same organization that gave us the Web, making it easier to connect up lots of data from around the world.
If you are interested in attending Bob DuCharme’s session at NoSQL Now!, please see the conference schedule at: http://nosql2014.dataversity.net/agenda.cfm?confid=81&scheduleDay=PRINT
His session is on Wednesday, August 20th at 10:15 a.m.
About NoSQL Now!:
NoSQL Now! is an educational conference and exhibit focused on the emerging field of NoSQL technologies. NoSQL (Not Only SQL) refers to the new breed of databases that are not based on the traditional relational database model, including document stores, key-value stores, columnar databases, XML databases, and graph databases. The NoSQL Now! Conference is designed to educate developers, data managers, and architects on how these new technologies work, the applications they are best suited for, and how to deploy them. Additional details are available at http://nosql2014.dataversity.net/index.cfm.