Say you’ve developed a service based on semantic web technologies that you know could have a great future – if only you had a cost-effective way to scale it.
Maybe now you do. At DEMOfall 09 this week, web content crawling and processing service 80legs is officially getting up and running with its formal launch.
Based on a grid computing model, the service puts the power of some 50,000 compute nodes in the hands of companies that need an on-demand way to crawl the web to deliver their services to users. That could be any type of company – media analysis firms doing IP monitoring services, such as looking for pirate videos, market researchers or anyone doing competitive intelligence work – but semantic web players fit into the mix well, says 80legs CEO Shion Deysarkar.
In fact, one of the early partners for the service has been Swingly, a semantic search service in alpha that extracts factual knowledge from the text of any document found on the web for purposes such as sentiment analysis.
“On one end of the semantic web you have structured data markup like RDF,” says Deysarkar. “But on the other end of the spectrum is making sense of unstructured content, applying meaning and natural language processing to unstructured data. Even though you might have cool semantic sentiment analysis, applying that to a lot of content and showing it can work across the web is difficult. There are scale and cost issues.”
The service from 80legs provides that scalability, letting semantic web start-ups and others set up customized crawls and run their applications across tens of thousands of computers. Deysarkar says its grid model is completely on-demand, which isn’t always the case with cloud infrastructures, where it’s more of a rent-a-data-center situation that requires users to set up virtual machines, configure instances and write their own crawlers.
“With 80legs you just write your own applications, upload them, and use a simple API to kick up the job,” he says.
He also contends that businesses using the cloud model can crawl at most 100 million pages a day, compared to 2 billion with 80legs. Beyond performance, he says 80legs has a cost advantage.
“With cloud, if you look at crawling 1 million pages, it’s probably about $4,” he says. “80legs charges $2 per million pages crawled, or on a CPU hourly basis it’s 3 cents for 80legs vs. 10 cents for Amazon.”
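To put those quoted rates side by side, here is a minimal Python sketch of the arithmetic. It assumes the cloud figure is meant as roughly $4 per million pages (the only reading that makes it comparable to the 80legs rate), and the function and constant names are hypothetical, chosen for illustration rather than taken from any 80legs API.

```python
# Rates as quoted by Deysarkar, in US dollars.
EIGHTYLEGS_PER_MILLION_PAGES = 2.00
# Assumption: the quoted "$4" cloud figure is per million pages crawled.
CLOUD_PER_MILLION_PAGES = 4.00
EIGHTYLEGS_PER_CPU_HOUR = 0.03
AMAZON_PER_CPU_HOUR = 0.10


def crawl_cost(pages: int, rate_per_million: float) -> float:
    """Cost in dollars of crawling `pages` pages at a per-million-page rate."""
    return pages / 1_000_000 * rate_per_million


def cpu_cost(cpu_hours: float, rate_per_hour: float) -> float:
    """Cost in dollars of `cpu_hours` of compute at an hourly rate."""
    return cpu_hours * rate_per_hour


if __name__ == "__main__":
    # 1 million pages, then the two daily ceilings quoted in the article.
    for pages in (1_000_000, 100_000_000, 2_000_000_000):
        on_grid = crawl_cost(pages, EIGHTYLEGS_PER_MILLION_PAGES)
        on_cloud = crawl_cost(pages, CLOUD_PER_MILLION_PAGES)
        print(f"{pages:>13,} pages: 80legs ${on_grid:,.2f} vs. cloud ${on_cloud:,.2f}")

    hours = 1_000  # illustrative workload, not a figure from the article
    print(
        f"{hours:,} CPU hours: 80legs ${cpu_cost(hours, EIGHTYLEGS_PER_CPU_HOUR):,.2f} "
        f"vs. Amazon ${cpu_cost(hours, AMAZON_PER_CPU_HOUR):,.2f}"
    )
```

On these numbers the per-page rate works out to half the cloud estimate, and the per-CPU-hour rate to a bit less than a third of Amazon's.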
Deysarkar says 80legs uses Amazon’s cloud service as a benchmark, and has found that service’s business model and capabilities pretty similar to those of other cloud computing providers.
The most active users of 80legs so far have been semantic start-ups like Swingly, which crawls about 10 million pages every hour. Swingly will be using the service to provide information on sentiment around the DEMOfall conference based on content posted to the web, Deysarkar says.
“We’re pretty excited about this because we know a lot of semantic companies have had a hard time getting their technology out there,” he says. “For example, Powerset got a lot of buzz, but it only ran on Wikipedia to start out with. We’re hoping now if you launch a semantic company you can use 80legs to run your technology on the entire web and prove out the technology. To become a more interesting space the semantic web has to start showing results across a wide spectrum of use cases.”
80legs, which received seed funding through Creeris Ventures and its incubation model, thinks its service can ease the money crunch that semantic web and other startups needing web crawling capabilities may face as they seek backing for their enterprises.
Among early adopters, Deysarkar says, there has been great enthusiasm about avoiding the daunting task of investing in internal data centers or going a cloud route that isn’t scalable enough for large-scale web crawling.
“They can scale out and not have to raise a million dollars anymore, or if they have raised that money, now they don’t have to spend all of it,” he says. “They can focus on their core competencies instead.”
The next move for 80legs will be launching an app store in the next couple of months to give its customers new distribution models for their applications, so semantic web developers can use the service not just to run their applications but to sell them too.