Breaking into the Semantic Web, Part II

July 23, 2008

This interview with Eric Miller, President of Zepheira, was conducted by Golda Velez.

SR: Eric, let me ask your advice. The Semantic Web is interesting, exciting, promising. So say I’m a developer, how do I get involved with it? Or suppose I have a tech company, how do I get work in this field?

Miller: Open source projects. You mentioned a set of different kinds of users who might take different approaches for breaking into the semantic web. From a developer standpoint, a hotbed of activity, specifically open source activity, is an extraordinarily valuable way of breaking in. You get instant familiarity, warts and all, with the issues associated with the tools and toolkits, but you also get, in essence, access to a very large network of people, and more importantly you build up a reputation that becomes a very strategic asset for positioning yourself in a variety of new open opportunities that are coming online.

SR: Let’s get specific about which open source project you’d consider the hottest…

Miller: Gosh, certainly from the interface side of this, the MIT SIMILE project is just an extraordinarily attractive one. And I say this not because of my involvement in it, though I should disclose that I am biased, but because it really is extraordinarily applicable to so many different domains, and people ‘get it’. They see it, see the value of it instantly, and we as humans by our very nature resonate with things we can see.

SR: Is SIMILE open to the community? I thought they were working on it primarily at MIT?

Miller: It has transitioned from the internal team that had a specific focus, but we’ve now moved the various components to Google Code and have opened it up to more and more developers. So it’s a wonderful opportunity for developers to come online and contribute to that very excellent bit of code that has tremendous applicability to lots and lots of applications.

So from a front-end user interface standpoint, for developers who are interested in the front-end UI stuff, that would be my first choice.

If you’re looking for the back-end side of this, if your focus is on scalable architecture, back-end infrastructure, federated architectures, I would say Mulgara is the place to look. So again, I’m a bit biased in this, but Mulgara is an open source offshoot of a company called Tucana, which was one of the first commercial companies in the semantic web space. They were purchased by Northrop Grumman, but Mulgara is the open source infrastructure which drove Tucana. And what’s happened recently at this conference, which I’m pleased to announce, is that Aduna and Zepheira have supported the development of Sesame and Mulgara integration. So Sesame, which has a very rich API, now has a back-end interface to the Mulgara storage system. There’s lots of work to be done still.

SR: I thought Sesame was a storage system?

Miller: It has its own storage system, absolutely, but it was very limited.

SR: Is Sesame the same sort of thing as Jena?

Miller: It’s the same kind of thing as Jena. We’ve found they have different sorts of design philosophies, different sorts of modularities and capabilities.

SR: Very briefly, can you tell me what those differences are between Sesame and Jena?

Miller: Well, different applications always have different philosophies and designs. One of the things that made it quite attractive when considering an integration, is that Sesame’s client is a nice clean interface to drop in other back-end systems. So they actually did a very good job separating the client from the storage system. There’s other kinds of JDBC interfaces, but the SAIL abstraction has been a very attractive one for providing a very clean plug point for Mulgara integration. So whether you’re interested in the API, the back-end storage system, from a developer’s standpoint, those two communities are just a little bit more receptive to outside people digging their hands in the code, adding patches, modifying things, everything else like that.

Not to belittle by any means HP’s excellent work in this case, but breaking into an open source community is easier when the barriers to that kind of participation are reduced. Jena’s are just slightly higher. Both are valuable, both are extremely popular toolkits, but if I’m a developer, a low entry barrier and a high value proposition are an important combination to consider.
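
To illustrate the separation Miller describes between a client API and a swappable storage layer, here is a minimal Python sketch. It is an analogy only and not Sesame's SAIL interface or Mulgara's code: the names `StorageBackend`, `InMemoryBackend`, `Repository` and the `ex:` identifiers are all hypothetical.

```python
from typing import Iterable, Iterator, List, Optional, Protocol, Set, Tuple

Triple = Tuple[str, str, str]


class StorageBackend(Protocol):
    """Hypothetical storage-layer contract (an assumption, not the SAIL API)."""

    def add(self, triple: Triple) -> None: ...

    def match(
        self, s: Optional[str], p: Optional[str], o: Optional[str]
    ) -> Iterator[Triple]: ...


class InMemoryBackend:
    """Simplest possible backend: a set of triples held in memory."""

    def __init__(self) -> None:
        self._triples: Set[Triple] = set()

    def add(self, triple: Triple) -> None:
        self._triples.add(triple)

    def match(
        self, s: Optional[str], p: Optional[str], o: Optional[str]
    ) -> Iterator[Triple]:
        # None acts as a wildcard for that position of the triple.
        for triple in sorted(self._triples):
            if all(want is None or want == got
                   for want, got in zip((s, p, o), triple)):
                yield triple


class Repository:
    """Client-facing API; it only ever talks to the StorageBackend contract."""

    def __init__(self, backend: StorageBackend) -> None:
        self._backend = backend

    def load(self, triples: Iterable[Triple]) -> None:
        for triple in triples:
            self._backend.add(triple)

    def objects(self, subject: str, predicate: str) -> List[str]:
        return [o for _, _, o in self._backend.match(subject, predicate, None)]


# Swapping storage means passing a different backend; Repository is unchanged.
repo = Repository(InMemoryBackend())
repo.load([
    ("ex:Mulgara", "ex:license", "open source"),
    ("ex:Sesame", "ex:license", "open source"),
])
print(repo.objects("ex:Mulgara", "ex:license"))
```

Because `Repository` depends only on the two-method contract, a disk-backed or remote store could be dropped in without touching client code, which is the kind of clean plug point the answer above credits for making the integration attractive.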

SR: Now, what if I want to make vocabularies? Is there a development community for that?

Miller: I actually think if you are interested in making vocabularies, perhaps developers are the last people you want to talk to. (laughter, agreement) You know, so talk to the business people, talk to people. That’s sort of the funny thing about vocabularies: when you don’t talk to people, it’s not always useful.

There’s a variety of different forums where those vocabulary discussions are taking place now. The Dublin Core Metadata Initiative is a fine place where people are constantly talking about creating vocabularies for x, y or z.

Dublin Core is always talking about it. They’ve been talking about it for more than 12 years. They are the consistent place where people have been talking about vocabularies for the web. W3C – if you’re focused on specific domains, like healthcare life sciences, the W3C life sciences group, or the e-government group, are useful.

SR: They tend to get a little bit more ‘heavy’ sometimes…

Miller: Yes, and this would be a plea to bring more people into this group.

SR: So if I’m a developer, I could serve as a bridge between the non-technical people and the developers in the vocabulary group.

Miller: Exactly. And that’s the key aspect from the developer standpoint for the semantic web, becoming a conduit between the business and the code. Not even the business, but the people who need the terms, the vocabularies. Being a conduit is a very important role to play. You become a very important bridge between those two communities. And both are needed (the developer and user communities), but what’s needed even more are bridges.

SR: Talking about that, let’s talk from the technical company point of view.

I’ve got a company, and I talked to a fairly large client of mine about organizing their data using semantic technology. The client said no, we don’t need to do that, we have other problems. So that’s the question now: suppose I’ve got my programmers up to speed, we’re excited about the technology, but how do we get someone to pay for it?

Miller: I think that’s one of the things we need to always remember: it’s important to ground all these ideas in the value proposition. We can talk about how wonderful open data is, semantic technologies are, but if we don’t solve a very specific business problem or meet a very specific business need, that business isn’t going to buy in. You know, why should I be doing that instead of something else? And those are generally very easy questions to answer, but it’s very important to answer them. You can’t expect the business to see the vision in your head and get the value proposition. So this is just a reminder to folks to get excited about this, get excited about all these different standards and development. But if you want to get into it from the business end, it’s important to ground it in things that really resonate with them.

Part of this is focusing on the simple things first and the complex things later. The big wins have to be preceded with small ones. You don’t need to boil the ocean. I can keep going on with more and more cliches, but I think you get the message.

At a certain level small wins happen, there’s a tremendous amount of data that companies already have. This notion that for the semantic web to happen everyone is going to have to tag things, add metadata to things, is incredibly – well, just wrong. I mean, there is data inside of databases, there is data inside of Excel spreadsheets, there’s data inside of LDAP directories. There’s lots of legacy data. Part of where the small wins can occur is to just empower people to stitch those things together.

The keynote that I gave this morning on Remix describes how you can combine best-of-breed components, some of which I just described earlier, to be really really cost effective and easy for folks to just stitch this together. And not buy proprietary solutions, but rely on open source and open standards.
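
As a rough illustration of stitching legacy data together, the Python sketch below turns a spreadsheet export and a set of directory records into one collection of subject-predicate-object statements and then answers a question neither source answers alone. It is a toy under stated assumptions, not Remix or any tool named in the interview: the sample data, the `ex:` identifiers and every function name are hypothetical, and a plain list of dictionaries stands in for a real LDAP lookup.

```python
import csv
import io

# Hypothetical spreadsheet export (for example, an Excel sheet saved as CSV).
SPREADSHEET = """employee_id,project
e1,Website redesign
e2,Data warehouse
"""

# Hypothetical directory records standing in for the result of an LDAP query.
DIRECTORY = [
    {"uid": "e1", "cn": "Ada Example", "mail": "ada@example.org"},
    {"uid": "e2", "cn": "Bob Example", "mail": "bob@example.org"},
]


def spreadsheet_triples(text):
    """Yield one (subject, predicate, object) statement per spreadsheet row."""
    for row in csv.DictReader(io.StringIO(text)):
        yield (f"ex:{row['employee_id']}", "ex:worksOn", row["project"])


def directory_triples(records):
    """Yield name and email statements for each directory record."""
    for rec in records:
        subject = f"ex:{rec['uid']}"
        yield (subject, "ex:name", rec["cn"])
        yield (subject, "ex:email", rec["mail"])


# Both sources use the same subject identifier, so a set union merges them.
graph = set(spreadsheet_triples(SPREADSHEET)) | set(directory_triples(DIRECTORY))


def value(subject, predicate):
    """Return the first object recorded for a subject and predicate."""
    return next(o for s, p, o in graph if s == subject and p == predicate)


def contacts_for(project):
    """Find name and email of everyone working on the given project."""
    people = sorted(s for s, p, o in graph if p == "ex:worksOn" and o == project)
    return [(value(s, "ex:name"), value(s, "ex:email")) for s in people]


print(contacts_for("Data warehouse"))
```

Nothing in either source had to be re-tagged by hand; the only agreement needed was a shared identifier for each employee, which is where the integration work actually lies.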

SR: I’ve seen a lot of people doing proprietary applications in the semantic space.

Miller: I think we have to recognize that the value proposition here is number one, the ability to integrate that data, and number two, integrate it in such a way that it’s not locked up in yet another proprietary application. It’s your data. Take it back. Stop outsourcing your data integration to somebody else’s proprietary solution. And that’s the reason I think, in part, the semantic web is becoming so successful. People are starting to realize that.

It’s not necessarily that you’re getting 10 times more performance from RDF than some sort of proprietary solution. It’s rather that you can stitch a best-of-breed suite together, and you own that data. You have data portability and can take it across different systems. You can stitch together best-of-breed components and solve your particular problem.

SR: So you don’t necessarily have to be totally object oriented…

Miller: No, actually it’s a little bit better if you’re not! But what you do have to be is relationship oriented. It’s important to recognize the value proposition for a lot of this is in the relationships between the objects in your business. The more connected data you have inside a company, the more successful you are in terms of new markets, understanding your current patterns, and adapting to change.

SR: Now in terms of industries, we have financial, pharmaceutical, health care, government. I’ve seen Brand Niemann’s name at the federal government level.

Miller: E-government, intel certainly. Brand Niemann is one example, but I’d suggest a more global scope in terms of this; a tremendous amount of the e-government work is in Europe. That’s where we’re seeing a tremendous amount of semantic web penetration right now.

SR: Are they going to contract out over here..?

Miller: Sure they will – the euro is very strong right now (laughter). These are World Wide Web standards, these are worldwide problems. You mentioned health care, pharmaceutical, also biotech, but also there’s a tremendous amount of work we’re seeing just inside of the enterprise. In content delivery, publishing, media. How do you deliver the right content to the right person? Not the classic person that might be K-12. I can assure you that the information needs of a kindergartener and the information needs of a 12-year-old are very very different.

We don’t have to design any more systems that sort people into large logical buckets, but we can deliver the content that people need based on their individual preferences.

SR: Is that what Remix is about?

Miller: Remix is basically about lowering the cost to you as an individual to integrate the data that makes sense to you in a way that makes sense to you.

What it is is a best of breed integration and combination of a variety of different open source tools and infrastructure that we have been involved in, that we’ve brought together to solve a particular pattern of problems that we see over and over.

SR: Thanks, Eric. That is a huge amount of valuable information for folks out there who want to get involved in the Semantic Web community.
