Clark & Parsia’s lightweight RDF database, Stardog, is moving into release candidate 1.0 mode just in time for next week’s Semantic Technology & Business Conference in San Francisco. The product has been stable and usable for a while now, but a 1.0 designation still carries weight with a good number of IT buyers.
The focus for the product, says cofounder and managing principal Kendall Clark, is to be optimized for what he says is the fat part of the market – and that’s not the part that is dealing with a trillion RDF triples. “Most people and organizations don’t need to scale to trillions of anything,” though scaling up, and up, and up, is where most of Clark & Parsia’s competitors have focused their attention, he says. “We’ve seen a significant percentage of what people are doing with semantic technology and most applications are not at a billion triples today.” Take as an example Clark & Parsia’s customer, NASA, which built an expertise location system based on semantic technology that today is still not more than 20 million triples. “You might say that’s a little toy but not if you are at NASA and need defined experts, it is a real, valuable thing and we see this all the time,” he says.
The database vendors pursuing the smaller slice of the potential customer pie, where scaling up big-time does matter, aren’t concentrating on the scaling-down features that matter to customers with more circumscribed yet strategic needs. “It will answer your queries but not in an optimal fashion for your data, because they made engineering choices to make it possible to load a billion triples,” he says. And maybe it won’t run on your iPad, either.
How is Stardog equipped to serve the fatter part of the market, where Clark says lots of interesting use cases and value lie? “We focused kind of obsessively not about big scale but about raw query execution speed,” he says. Long-running OLAP queries are the sweet spot for semantic technologies, he says, and Stardog homed in on that, turning in results on the SP2B benchmark that Clark says are still the best of any RDF database. OLTP was the next focus: Clark & Parsia isn’t formally reporting numbers on the BSBM benchmark, but Clark says the results there look good in terms of how much query volume Stardog can handle.
“We just have a different goal than trying to load a trillion triples or address the most data, and that informs every choice you make in how you design a system,” he says. “We did everything we did because we wanted to be fast.”
Among the other capabilities of the feature-rich 1.0 release, the pure-Java, enterprise-grade product is very simple to install, administer and manage, he says. And there is no reasoning it won’t do, from RDFS all the way to the highest level of OWL 2. Additionally, it includes “some unique features for data integrity and data quality,” he notes. Users can use any semantic web language to describe constraints for data, and if they choose to enable this capability, Stardog will not let them write invalid data into the database.
“So you can’t make data errors because the system won’t let data be written that’s not valid according to the rules you set up,” Clark says. As one example: Users can take a published ontology like FOAF, give it to Stardog, and tell it to use that ontology for the people data part of the database, and not to let any data into the database that is not valid against that schema.
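To make the idea concrete, here is a minimal sketch in plain Python of what rejecting invalid writes can look like. It is an illustration only, not Stardog’s API or implementation: the `GuardedStore` class and the `DOMAIN_CONSTRAINTS` table are hypothetical names, and the single rule shown assumes that FOAF’s declared domain for `foaf:knows` (`foaf:Person`) is read as a constraint that data must satisfy rather than as something to infer.

```python
# Illustrative sketch only; this is not Stardog's API or implementation.
FOAF = "http://xmlns.com/foaf/0.1/"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"

# Hypothetical constraint table: predicate -> class its subject must belong to.
# FOAF declares foaf:Person as the domain of foaf:knows; here that is treated
# as an integrity rule instead of an inference rule (an assumption).
DOMAIN_CONSTRAINTS = {FOAF + "knows": FOAF + "Person"}


class ConstraintViolation(Exception):
    """Raised when a write would leave the store in an invalid state."""


class GuardedStore:
    """A tiny in-memory triple store that validates every write."""

    def __init__(self, constraints):
        self.constraints = constraints
        self.triples = set()

    def _violations(self, candidate):
        # Collect every triple whose subject lacks the required type.
        problems = []
        for s, p, o in candidate:
            required = self.constraints.get(p)
            if required is not None and (s, RDF_TYPE, required) not in candidate:
                problems.append(f"{s} uses {p} but is not typed as {required}")
        return problems

    def add(self, new_triples):
        # Validate the would-be state first; commit only if it is clean.
        candidate = self.triples | set(new_triples)
        problems = self._violations(candidate)
        if problems:
            raise ConstraintViolation("; ".join(problems))
        self.triples = candidate


store = GuardedStore(DOMAIN_CONSTRAINTS)
alice = "http://example.org/alice"
bob = "http://example.org/bob"

# Valid: alice is declared a foaf:Person in the same write.
store.add([(alice, RDF_TYPE, FOAF + "Person"), (alice, FOAF + "knows", bob)])

# Invalid: bob was never declared a foaf:Person, so the write is refused.
rejected = False
try:
    store.add([(bob, FOAF + "knows", alice)])
except ConstraintViolation:
    rejected = True
```

The whole write is checked against the would-be state of the store before anything is committed, so a rejected batch leaves the existing data untouched.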
For the next development cycle, big data is in play, with Hadoop integration on the roadmap. “That’s exactly because some problems need to be solved with something like Hadoop, but in either case, what has to happen first is data integration,” Clark says, and data integration is where semantic technology is a perfect fit. “You can’t analyze what happens in an organization or your customer base until you get your hands around all the relevant data in a cheap enough way.”