If you would like your company to be considered for an interview, please email editor[ at ]semanticweb[ dot ]com.
In this segment of our “Innovation Spotlight” we spoke with Andreas Blumauer, the CEO of Semantic Web Company. Semantic Web Company is headquartered in Vienna, Austria, and its software extracts meaning from big data using linked data technologies. In this interview Andreas describes some of their core products in more detail.
Sean: Hi Andreas. Can you give us a little background on your company? When did you get started in the Semantic Web?
Andreas: As an offshoot of a ‘typical’ web agency from the early days of the internet, we became a specialized provider in 2004: the ‘Semantic Web School’, focused on research, consulting and training in the area of the semantic web. We quickly learned that the idea of a ‘semantic web’ could trigger a lot of great project visions, but also that most of the tools from the early days of the semantic web were rather scary for enterprises. In 2007 we noticed that information professionals had begun to search for mature semantic web solutions to improve their information infrastructure. We were excited that ‘our’ main topics had clearly begun to play a role in the IT strategies of many organizations, so we refocused on software development and renamed our company.
Today we are a semantic technology provider with more than 20 employees. We support Global 500 companies, international NGOs and government bodies in implementing information management systems based on semantic technologies and linked data.
Sean: Your product is called “PoolParty”. What does this product offer to your clients?
Andreas: PoolParty Suite was developed to combine humans' granular expertise in a professional domain, including their knowledge of the relations between entities, with machines' ability to process large amounts of data.
Today PoolParty enables enterprises to create, integrate and manage sophisticated, scalable controlled vocabularies for their own domain (e.g. pharma or renewable energy); to use these vocabularies to extract meaning and relations from their proprietary structured and unstructured information sources; and, finally, to link enterprise data so that it becomes accessible and browsable through a single interface.
To keep barriers low, PoolParty ships with SKOSsy, a tool that automatically extracts knowledge models from linked data. This turned out to be an important step, since many organizations don’t have mature vocabularies in place to start with. Recently we also added linked-data-based services such as a synonym service and a translation service.
PoolParty is Java-based and strictly follows W3C Semantic Web standards such as SKOS and SPARQL. It is language-agnostic and comes with a GUI and an API for consuming, publishing and editing linked data. PoolParty APIs are integrated with existing enterprise platforms such as Microsoft SharePoint, WordPress, Drupal and Atlassian Confluence.
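For readers new to SKOS, here is a minimal sketch of what a controlled vocabulary looks like as RDF triples, and of how a SPARQL-style triple pattern selects from it. It uses only the Python standard library. The example.org concept URIs and labels are hypothetical, the `match` helper is a hand-rolled stand-in for a single SPARQL triple pattern (not a SPARQL engine), and nothing here reflects PoolParty's own API.

```python
from typing import List, Optional, Tuple

Triple = Tuple[str, str, str]

SKOS = "http://www.w3.org/2004/02/skos/core#"
EX = "http://example.org/energy/"  # hypothetical vocabulary namespace

# A tiny SKOS concept scheme as (subject, predicate, object) triples.
# Objects that start with a double quote are language-tagged literals.
TRIPLES: List[Triple] = [
    (EX + "RenewableEnergy", SKOS + "prefLabel", '"renewable energy"@en'),
    (EX + "SolarPower", SKOS + "prefLabel", '"solar power"@en'),
    (EX + "SolarPower", SKOS + "altLabel", '"solar energy"@en'),
    (EX + "SolarPower", SKOS + "broader", EX + "RenewableEnergy"),
    (EX + "WindPower", SKOS + "prefLabel", '"wind power"@en'),
    (EX + "WindPower", SKOS + "broader", EX + "RenewableEnergy"),
]


def to_ntriples(triples: List[Triple]) -> str:
    """Serialize triples as N-Triples lines (IRIs in angle brackets, literals as-is)."""
    lines = []
    for s, p, o in triples:
        obj = o if o.startswith('"') else f"<{o}>"
        lines.append(f"<{s}> <{p}> {obj} .")
    return "\n".join(lines)


def match(triples: List[Triple], s: Optional[str] = None,
          p: Optional[str] = None, o: Optional[str] = None) -> List[Triple]:
    """Return triples matching one pattern; None plays the role of a SPARQL variable."""
    return [t for t in triples
            if (s is None or t[0] == s)
            and (p is None or t[1] == p)
            and (o is None or t[2] == o)]


# Roughly equivalent SPARQL:
#   PREFIX skos: <http://www.w3.org/2004/02/skos/core#>
#   SELECT ?c WHERE { ?c skos:broader <http://example.org/energy/RenewableEnergy> }
narrower = [s for s, _, _ in match(TRIPLES, p=SKOS + "broader", o=EX + "RenewableEnergy")]
print(narrower)
print(to_ntriples(TRIPLES[:1]))
```

The prefLabel/altLabel split is what makes such a vocabulary useful for search and synonym handling: one concept, several surface forms.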
Sean: What does your company consider “Big Data”?
Andreas: Generally speaking, if there were no Big Data, there would be no market for PoolParty. Organizational (Big) Data is the haystack, and with PoolParty we help our customers find the needle (and link it with other needles).
Our perspective on Big Data focuses on the question of how to extract meaning from distributed, heterogeneous data sources. Since graph databases have started to play a more important role in the NoSQL world as well, we think the Big Data community will very soon recognize linked data technologies as a key technology.
Sean: What types of applications do your clients use your products for in the “Big Data” space?
Andreas: The application scenarios are broad, but what PoolParty clients such as Roche, Wolters Kluwer, Pearson, Credit Suisse, REEEP and the World Bank all have in common is the need for integrated views on relevant information. PoolParty supports this goal by:
- (semi-)automatically creating vocabularies from structured Big Data sources (DBpedia, Freebase, etc.)
- integrating dispersed data sources into one data store and one interface to enable categorized and faceted search based on the customer’s vocabularies
- enriching domain knowledge models (e.g. taxonomies or thesauri) with synonyms, geo data, images and further relations, and interlinking various vocabularies to build a ‘network of knowledge models’
- recommending related content from various internal or external sources like similar documents, related datasets, pictures, etc.
- classifying content in proprietary DMS or CMS based on metadata derived from controlled vocabularies
- integrating large internal or external datasets for information mash-ups
- providing RDF interfaces for the re-use of data
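Several of the points above, such as faceted search and classifying DMS or CMS content, rest on tagging documents with concepts from a controlled vocabulary. The sketch below shows the simplest form of that idea: case-insensitive, word-bounded matching of preferred and alternative labels. It is not PoolParty's extractor (which, per the interview, combines text mining with linked data technologies); the concept IDs, labels and sample sentence are hypothetical.

```python
import re
from typing import Dict, List, Pattern

# Hypothetical vocabulary: concept ID -> preferred label followed by synonyms.
VOCABULARY: Dict[str, List[str]] = {
    "ex:SolarPower": ["solar power", "solar energy", "photovoltaics"],
    "ex:WindPower": ["wind power", "wind energy"],
    "ex:FeedInTariff": ["feed-in tariff"],
}


def build_patterns(vocabulary: Dict[str, List[str]]) -> Dict[str, Pattern[str]]:
    """Compile one case-insensitive, word-bounded regex per concept.

    Longer labels come first in the alternation so they win over shorter prefixes.
    """
    patterns = {}
    for concept, labels in vocabulary.items():
        ordered = sorted(labels, key=len, reverse=True)
        alternation = "|".join(re.escape(label) for label in ordered)
        patterns[concept] = re.compile(rf"\b(?:{alternation})\b", re.IGNORECASE)
    return patterns


def annotate(text: str, patterns: Dict[str, Pattern[str]]) -> Dict[str, int]:
    """Return the concepts mentioned in the text with their hit counts."""
    counts = {}
    for concept, pattern in patterns.items():
        hits = len(pattern.findall(text))
        if hits:
            counts[concept] = hits
    return counts


doc = ("The report compares feed-in tariff schemes for solar energy and "
       "Solar Power plants, and mentions wind energy.")
patterns = build_patterns(VOCABULARY)
counts = annotate(doc, patterns)
# A crude classification: pick the most frequently mentioned concept.
best = max(counts, key=counts.get)
print(counts, best)
```

Because synonyms map to one concept ID, "solar energy" and "Solar Power" count toward the same facet, which is the practical payoff of maintaining altLabels in the vocabulary.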
Sean: It seems Linked Data and Big Data fit together nicely. The structure of Linked Data naturally works well with large datasets. What is a recent example of a problem your software has solved where both were involved?
Andreas: In the area of renewable energy, many data and information sources exist, but it is hard to get an overview of all the actors, events, policies, etc. As an example of a linked data application, reegle offers country profiles that provide exactly such a mashup for each country: data from the World Bank, the UN or Eurostat was automatically linked to internal databases. In addition, the reegle API offers means to annotate and enrich domain-specific content with entities from the LOD cloud. As a result, content flows become increasingly personalised: the ‘power of push’ finally becomes reality, and experts benefit from ‘searchless finding’.
Sean: What problem does your “Linked Data Manager” solve?
Andreas: The “Linked Data Manager” is used to schedule and monitor ETL (Extract, Transform, Load) jobs for smooth and efficient Linked (Open) Data management. It works for web-based data portals (LOD platforms) as well as for data management and data integration use cases behind the firewall. The tool provides mechanisms for data extraction and transformation, data linking and data publication based on RDF. Sources that are not originally in RDF can also be integrated with this tool.
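The extract, transform, link and publish steps described here can be illustrated with standard-library Python. In this sketch, rows from a non-RDF CSV source are extracted, turned into RDF triples, linked to external LOD resources via owl:sameAs, and serialized as N-Triples. The CSV content, the example.org namespace, the link table and all function names are hypothetical; this shows the general pattern only, not the Linked Data Manager's interface, and it leaves out the scheduling and monitoring the tool adds.

```python
import csv
import io
from typing import Dict, List, Tuple

Triple = Tuple[str, str, str]

EX = "http://example.org/data/"  # hypothetical target namespace
RDFS_LABEL = "http://www.w3.org/2000/01/rdf-schema#label"
OWL_SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"

# Extract source: inline CSV standing in for a file or database export.
SOURCE_CSV = """country_code,name
AT,Austria
DK,Denmark
"""

# Hypothetical link table from local keys to external LOD resources.
LINKS = {
    "AT": "http://dbpedia.org/resource/Austria",
    "DK": "http://dbpedia.org/resource/Denmark",
}


def extract(text: str) -> List[Dict[str, str]]:
    """Extract: parse CSV rows into dictionaries."""
    return list(csv.DictReader(io.StringIO(text)))


def mint_uri(row: Dict[str, str]) -> str:
    """Mint a local URI for a row from its key column."""
    return EX + "country/" + row["country_code"]


def transform(rows: List[Dict[str, str]]) -> List[Triple]:
    """Transform: turn each row into an rdfs:label triple (escaping \\ and ")."""
    triples = []
    for row in rows:
        name = row["name"].replace("\\", "\\\\").replace('"', '\\"')
        triples.append((mint_uri(row), RDFS_LABEL, f'"{name}"@en'))
    return triples


def link(triples: List[Triple], rows: List[Dict[str, str]]) -> List[Triple]:
    """Link: add owl:sameAs triples wherever the link table has a target."""
    linked = list(triples)
    for row in rows:
        target = LINKS.get(row["country_code"])
        if target:
            linked.append((mint_uri(row), OWL_SAME_AS, target))
    return linked


def load(triples: List[Triple]) -> str:
    """Load/publish: serialize as N-Triples (objects starting with a quote are literals)."""
    out = []
    for s, p, o in triples:
        obj = o if o.startswith('"') else f"<{o}>"
        out.append(f"<{s}> <{p}> {obj} .")
    return "\n".join(out) + "\n"


rows = extract(SOURCE_CSV)
ntriples = load(link(transform(rows), rows))
print(ntriples)
```

Keeping each stage as a separate function mirrors the job structure an ETL scheduler would run and monitor step by step.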
Sean: What changes do you see coming for Linked Data? Are there any new products you are working on that you can discuss?
Andreas: Linked Data will become one of the major topics for information managers. Because of the massive growth of data, conventional methods of data integration will fail, and the complexity of processes within organizations calls for more agile ways to channel and normalize data in a quality-controlled way. The availability and matching of diverse data sources are becoming more crucial, and therefore the need for standards-based information management tools is growing.
Many of our customers have already realized this and take advantage of the Linked Data approach. In the future there will be more and more strategic partnerships between organizations to exchange and re-use high-quality data, and all of us will benefit from the growing amount of published linked (open) data.
But some issues are still on the roadmap and show different levels of maturity: high availability of datasets, quality of datasets, licensing models, adaptive user interfaces, personalisation and last but not least business models.
At the moment we are integrating our core linked data technologies more closely with existing enterprise platforms such as SharePoint and Confluence, following the principle of ‘searchless finding’.
Sean: What types of research does your company do?
Andreas: Our research activities focus mainly on industrially relevant scenarios, such as the LOD2 project. The project aims to contribute high-quality interlinked versions of public semantic web datasets and to promote their use in new cross-domain applications by developers across the globe. The new technologies enabling scalable management of Linked Data collections will advance the state of the art in semantic web data management, both commercial and open-source, create opportunities for new products and spin-offs, and make RDF a viable choice for organizations worldwide as a premier data management format.
Within the LOD2 project, PoolParty is part of the technology stack, and we developed the “Linked Data Manager” for data cleansing, linking and fusing to help create and bootstrap new datasets. The overall goal of the project is to make Linked Data the model of choice for next-generation IT systems and applications.
Apart from LOD2, we are involved in other EU projects such as SEMAGROW (together with the FAO, among others), which focus on linked (open) data and recommendation systems.
Sean: You will be attending the Semantic Technology & Business Conference in the UK this year. What will you be presenting on?
Andreas: Yes, we are looking forward to being in London at SemTechBiz and hope to have productive conversations at our booth. We will present PoolParty-based solutions for SharePoint and Confluence and the latest release of our text extraction component, which combines conventional text mining methods with linked data technologies.
Sean: Is there a place where developers can try out your APIs on a trial basis?
Andreas: Yes, there are several ways to join the PoolParty. To get a first impression of applications based on PoolParty, we provide demos of PoolParty Search (PPS) and PoolParty Extractor (PPX). Knowledge engineers and developers might be interested in creating their own models and learning about the look and feel, the SPARQL endpoint or the linked data frontend of PoolParty; for them we provide demo accounts of PoolParty Thesaurus Manager (PPT). To get started with the Extractor API, for example in the context of renewable energy, go to the reegle API and give it a try.
Sean: Thanks for your time, Andreas! We look forward to seeing you at SemTech London!
About the Author:
Sean Golliher (@seangolliher) is an adjunct professor of search engines and social networks at MSU and a member of its computer science advisory board. He is also the founder and publisher of SEMJ.org. Sean holds four engineering patents, has a B.S. in physics from the University of Washington in Seattle, and a master’s in electrical engineering from Washington State University. He is also president and director of search marketing at Future Farm, Inc., Bozeman MT, where he focuses on search marketing and internet research and consults for large companies. He has appeared and been interviewed on well-known blogs and radio stations such as Clickz.com, Webmasterradio.com, and SEM Synergy. To maintain a competitive edge, he regularly reads search patents and papers and attends search marketing conferences.