2006
Conference article  Restricted

Mining frequent closed itemsets out-of-core

Lucchese C., Orlando S., Perego R.

Frequent itemsets mining  Out of core algorithms 

Extracting frequent itemsets is an important task in many data mining applications. When the data are very large, the mining task must be performed by an external memory algorithm, but only a few such algorithms have been proposed so far. Since the set of all frequent itemsets is itself likely to be undesirably large, condensed representations, such as closed itemsets, have recently gained a lot of attention. In this paper we discuss the limitations that the partitioning techniques adopted by external memory algorithms for extracting all the frequent itemsets exhibit when applied to closed itemset mining. The main issue is that the closedness of an itemset cannot be evaluated using only the local knowledge available in a single partition of the input dataset. A further step is thus needed to correctly merge the partial results. We introduce the first algorithm for mining closed itemsets out of core. The algorithm exploits a divide-and-conquer approach, in which the input dataset is split into smaller partitions such that each one can not only be loaded into main memory but also be mined entirely there. Moreover, we devise a simple technique, based on a new theoretical result, that reduces the problem of merging partial solutions to an external memory sorting problem.
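
The claim that closedness cannot be decided from a single partition can be illustrated with a small brute-force sketch. It is not the algorithm of the paper: the toy dataset, the horizontal split into two partitions, and the helper names `closure` and `closed_itemsets` are all assumptions made for illustration only. An itemset is treated as closed when it equals the intersection of all transactions that contain it.

```python
from itertools import combinations


def closure(itemset, transactions):
    """Intersection of all transactions containing `itemset`.

    Returns None when no transaction contains the itemset.
    """
    covering = [t for t in transactions if itemset <= t]
    if not covering:
        return None
    return frozenset.intersection(*covering)


def closed_itemsets(transactions):
    """Brute-force enumeration of closed itemsets (toy sizes only)."""
    items = sorted(set().union(*transactions))
    found = set()
    for k in range(1, len(items) + 1):
        for combo in combinations(items, k):
            c = closure(frozenset(combo), transactions)
            if c is not None:
                found.add(c)
    return found


# Hypothetical toy dataset, split horizontally into two partitions.
partition_1 = [frozenset("abc")]
partition_2 = [frozenset("abd")]
dataset = partition_1 + partition_2

global_closed = closed_itemsets(dataset)
local_closed = closed_itemsets(partition_1) | closed_itemsets(partition_2)

# Itemsets that are closed in the whole dataset but in no single partition.
missed = global_closed - local_closed
print(sorted("".join(sorted(s)) for s in missed))
```

In this toy case the itemset {a, b} is closed in the whole dataset, yet within each partition its closure is the full local transaction, so mining the partitions independently and taking the union of the local results would miss it. This is the kind of gap that a merge step over the partial results has to close.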

Source: SIAM International Conference on Data Mining, pp. 417–427, Bethesda, Maryland, 20-22/04/2006



BibTeX entry
@inproceedings{oai:it.cnr:prodotti:91344,
	title = {Mining frequent closed itemsets out-of-core},
	author = {Lucchese C. and Orlando S. and Perego R.},
	booktitle = {SIAM International Conference on Data Mining},
	pages = {417--427},
	address = {Bethesda, Maryland},
	note = {Conference dates: 20-22 April 2006},
	year = {2006}
}
Also available from: www.siam.org (restricted)