2004
Conference article  Restricted

WebDocs: a real-life huge transactional dataset

Lucchese C., Orlando S., Perego R., Silvestri F.

Frequent itemsets mining datasets 

This short note describes the main characteristics of WebDocs, a huge real-life transactional dataset that we made publicly available to the data mining community through the FIMI repository. We built WebDocs from a spidered collection of web HTML documents. The whole collection contains about 1.7 million documents, mainly written in English, and its size is about 5 GB.

Source: ICDM Workshop on Frequent Itemset Mining Implementations, pp. 2–2, Brighton, UK, 1 November 2004

Publisher: CEUR-WS.org, Aachen, DEU



BibTeX entry
@inproceedings{oai:it.cnr:prodotti:91780,
	title = {WebDocs: a real-life huge transactional dataset},
	author = {Lucchese, C. and Orlando, S. and Perego, R. and Silvestri, F.},
	publisher = {CEUR-WS.org, Aachen, DEU},
	booktitle = {ICDM Workshop on Frequent Itemset Mining Implementations, Brighton, UK, 1 November 2004},
	pages = {2--2},
	year = {2004}
}