Lucchese C., Orlando S., Perego R.
Database Management. Data mining. Pattern mining
The discovery of patterns in binary datasets has many applications, e.g. in electronic commerce, TCP/IP networking, and Web usage logging. Still, this is a very challenging task in many respects: overlapping vs. non-overlapping patterns, presence of noise, extraction of the most important patterns only. In this paper we formalize the problem of discovering the Top-K patterns from binary datasets in the presence of noise as the minimization of a novel cost function. According to the Minimum Description Length principle, the proposed cost function favors succinct pattern sets that may approximately describe the input data. We propose a greedy algorithm for the discovery of Patterns in Noisy Datasets, named PaNDa, and show that it outperforms related techniques on both synthetic and real-world data.
Source: Tenth SIAM International Conference on Data Mining, pp. 165–176, Columbus, Ohio, US, April 29 - May 1 2010
Publisher: SIAM Publications, Philadelphia, USA
@inproceedings{oai:it.cnr:prodotti:92091,
  title = {Mining top-K patterns from binary datasets in presence of noise},
  author = {Lucchese, C. and Orlando, S. and Perego, R.},
  publisher = {SIAM Publications, Philadelphia, USA},
  booktitle = {Tenth SIAM International Conference on Data Mining, Columbus, Ohio, US, April 29 - May 1 2010},
  pages = {165--176},
  year = {2010}
}