Lucchese C., Orlando S., Perego R., Silvestri F.
Frequent itemsets mining datasets
This short note describes the main characteristics of WebDocs, a huge real-life transactional dataset that we made publicly available to the data mining community through the FIMI repository. We built WebDocs from a spidered collection of HTML web documents. The whole collection contains about 1.7 million documents, mainly written in English, and its size is about 5 GB.
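As an illustration of how a collection of HTML documents can be turned into transactional data, the following is a minimal sketch and not the authors' actual pipeline. It assumes that each document becomes one transaction, that the items are the distinct lowercase alphabetic terms in the document, and that the output is written one transaction per line as space-separated integer item IDs. The abstract does not specify tokenization, stemming, or stop-word handling, so none is applied here. The function names `html_to_terms`, `build_transactions`, and `to_fimi_lines` are hypothetical.

```python
import re
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collects the text nodes of an HTML document, ignoring the tags."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        self.chunks.append(data)


def html_to_terms(html):
    """Return the set of distinct lowercase alphabetic terms in an HTML string."""
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()
    text = " ".join(parser.chunks)
    return set(re.findall(r"[a-z]+", text.lower()))


def build_transactions(documents):
    """Map each document to a sorted list of integer item IDs.

    Assumption: one document = one transaction, one distinct term = one item.
    IDs are assigned in order of first appearance, starting from 1.
    """
    term_ids = {}
    transactions = []
    for html in documents:
        items = set()
        for term in sorted(html_to_terms(html)):
            items.add(term_ids.setdefault(term, len(term_ids) + 1))
        transactions.append(sorted(items))
    return transactions, term_ids


def to_fimi_lines(transactions):
    """Render transactions as lines of space-separated item IDs."""
    return [" ".join(map(str, t)) for t in transactions]


if __name__ == "__main__":
    docs = [
        "<html><body><p>Data mining data</p></body></html>",
        "<p>mining web</p>",
    ]
    transactions, vocabulary = build_transactions(docs)
    for line in to_fimi_lines(transactions):
        print(line)
```

Because this extractor keeps every text node, the contents of script and style elements would also be tokenized. A real preprocessing step would likely filter those out and normalize terms further.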
Source: ICDM Workshop on Frequent Itemset Mining Implementations, pp. 2–2, Brighton, UK, 1 November 2004
Publisher: CEUR-WS.org, Aachen, DEU
@inproceedings{oai:it.cnr:prodotti:91780,
  title = {WebDocs: a real-life huge transactional dataset},
  author = {Lucchese C. and Orlando S. and Perego R. and Silvestri F.},
  publisher = {CEUR-WS.org, Aachen, DEU},
  booktitle = {ICDM Workshop on Frequent Itemset Mining Implementations, pp. 2–2, Brighton, UK, 1 November 2004},
  year = {2004}
}