Latest

Web Data Commons Project Releases New Dataset

December 11, 2012 / Data Daily | Data News, Data Education, Smart Data News, Smart Data News, Articles, & Education

Christian Bizer and Robert Meusel of the Web Data Commons project today announced the release of a new WebDataCommons dataset.

Common Crawl Announces Winners of Code Contest

October 8, 2012 / Data Education, Smart Data News, Articles, & Education

Common Crawl has announced the winners of its first-ever Common Crawl Code Contest.

New Common Crawl Video and Contest Details

July 25, 2012 / Data Education, Smart Data News, Articles, & Education

Common Crawl is back in the news after releasing a new video about the organization.

Common Crawl Corpus Update Makes Web Crawl Data More Efficient, Approachable For Users To Explore

July 16, 2012 / Big Data News, Articles, & Education, Data Blogs | Information From Enterprise Leaders, Data Education, Smart Data News, Articles, & Education

Common Crawl now provides its 2012 corpus of web crawl data not just as .ARC files but also as metadata files (JSON-based metadata with all the links from every page crawled, metatags, headers and so on) and as text output. Semantic web projects that use its corpus include the work of […]

Common Crawl To Add New Data In Amazon Web Services Bucket

March 13, 2012 / Big Data News, Articles, & Education, Data Blogs | Information From Enterprise Leaders

The Common Crawl Foundation is on the verge of adding to its Amazon Web Services (AWS) Public Data Set of openly and freely accessible web crawl data. Common Crawl announced the debut of its corpus on AWS back in January. Now, a billion new web sites are in […]
