2010
Conference article  Restricted

Mining top-K patterns from binary datasets in presence of noise

Lucchese C., Orlando S., Perego R.

Database Management. Data mining  Pattern mining 

The discovery of patterns in binary datasets has many applications, e.g. in electronic commerce, TCP/IP networking, and Web usage logging. Still, this is a very challenging task in many respects: overlapping vs. non-overlapping patterns, presence of noise, extraction of the most important patterns only. In this paper we formalize the problem of discovering the Top-K patterns from binary datasets in presence of noise as the minimization of a novel cost function. According to the Minimum Description Length principle, the proposed cost function favors succinct pattern sets that may approximately describe the input data. We propose a greedy algorithm for the discovery of Patterns in Noisy Datasets, named PaNDa, and show that it outperforms related techniques on both synthetic and real-world data.
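
To make the idea of a description-length style cost concrete, the following is a minimal sketch in Python. It is not taken from the paper: the exact form of the cost (size of each pattern's row and column sets, plus the number of cells where the patterns and the data disagree), the names `description_cost` and `greedy_top_k`, and the candidate-based greedy loop are all assumptions made for illustration. In particular, the loop picks from a fixed list of candidates and should not be read as the PaNDa algorithm itself.

```python
from typing import List, Sequence, Set, Tuple

# A pattern is a pair (row indices, column indices); both names are hypothetical.
Pattern = Tuple[Set[int], Set[int]]


def description_cost(data: Sequence[Sequence[int]], patterns: List[Pattern]) -> int:
    """Assumed MDL-style cost: pattern sizes plus mismatching (noisy) cells."""
    # Cells that are 1 in the input data.
    ones = {(r, c) for r, row in enumerate(data) for c, v in enumerate(row) if v}
    # Cells that the chosen patterns claim to be 1.
    covered = {(r, c) for rows, cols in patterns for r in rows for c in cols}
    # Cost of writing down each pattern: its rows plus its columns.
    pattern_cost = sum(len(rows) + len(cols) for rows, cols in patterns)
    # Noise: cells where data and patterns disagree, in either direction.
    noise_cost = len(ones ^ covered)
    return pattern_cost + noise_cost


def greedy_top_k(
    data: Sequence[Sequence[int]], candidates: List[Pattern], k: int
) -> Tuple[List[Pattern], int]:
    """Generic greedy selection: add the candidate that lowers the cost most."""
    chosen: List[Pattern] = []
    best = description_cost(data, chosen)
    remaining = list(candidates)
    while remaining and len(chosen) < k:
        # Evaluate every remaining candidate and keep the cheapest extension.
        cost, idx = min(
            (description_cost(data, chosen + [p]), i)
            for i, p in enumerate(remaining)
        )
        if cost >= best:
            break  # no candidate shortens the description any further
        chosen.append(remaining.pop(idx))
        best = cost
    return chosen, best
```

Under these assumptions, a pattern that covers a block with one missing cell can still lower the total cost, which is the sense in which a succinct pattern set may describe the data only approximately.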

Source: Tenth SIAM International Conference on Data Mining, pp. 165–176, Columbus, Ohio, US, April 29 - May 1 2010

Publisher: SIAM Publications, Philadelphia, USA



BibTeX entry
@inproceedings{oai:it.cnr:prodotti:92091,
	title = {Mining top-K patterns from binary datasets in presence of noise},
	author = {Lucchese, C. and Orlando, S. and Perego, R.},
	publisher = {SIAM Publications, Philadelphia, USA},
	booktitle = {Tenth SIAM International Conference on Data Mining, Columbus, Ohio, US, April 29 - May 1 2010},
	pages = {165--176},
	year = {2010}
}

Also available from: www.siam.org (Restricted)