2015
Contribution to book  Restricted

Clustering formulation using constraint optimization

Grossi V., Monreale A., Nanni M., Pedreschi D., Turini F.

Information Search and Retrieval  Clustering  Constraint Programming  Optimization  Constrained optimization 

The problem of clustering a set of data is a textbook machine learning problem, but at the same time, at heart, a typical optimization problem. Given an objective function, such as minimizing the intra-cluster distances or maximizing the inter-cluster distances, the task is to find an assignment of data points to clusters that achieves this objective. In this paper, we present a constraint programming model for a centroid based clustering and one for a density based clustering. In particular, as a key contribution, we show how the expressivity introduced by the formulation of the problem by constraint programming makes the standard problem easy to be extended with other constraints that permit to generate interesting variants of the problem. We show this important aspect in two different ways: first, we show how the formulation of the density-based clustering by constraint programming makes it very similar to the label propagation problem and then, we propose a variant of the standard label propagation approach.

Source: Software Engineering and Formal Methods, edited by Domenico Bianculli, Radu Calinescu, Bernhard Rumpe, pp. 93–107, 2015


1. B. Babaki, T. Guns, and S. Nijssen. Constrained clustering using column generation. In CPAIOR 2014, pages 438{454, 2014.
2. A. Bar-Hillel, T. Hertz, N. Shental, and D. Weinshall. Learning distance functions using equivalence relations. In ICML, pages 11{18, 2003.
3. S. Basu, A. Banerjee, and R. J. Mooney. Active semi-supervision for pairwise constrained clustering. In SDM, 2004.
4. S. Basu, M. Bilenko, and R. J. Mooney. A probabilistic framework for semisupervised clustering. In KDD, pages 59{68, 2004.
5. M. R. Berthold, C. Borgelt, F. Hppner, and F. Klawonn. Guide to Intelligent Data Analysis: How to Intelligently Make Sense of Real Data. Springer Publishing Company, Incorporated, 1st edition, 2010.
6. J. C. Bezdek. Pattern recognition with fuzzy objective function algorithms. Kluwer Academic Publishers, 1981.
7. M. Bilenko, S. Basu, and R. J. Mooney. Integrating constraints and metric learning in semi-supervised clustering. In ICML. ACM, 2004.
8. M. Coscia, F. Giannotti, and D. Pedreschi. A classi cation for community discovery methods in complex networks. Statistical Analysis and Data Mining, 4(5):512{546, 2011.
9. T.-B.-H. Dao, K.-C. Duong, and C. Vrain. A declarative framework for constrained clustering. In ECML/PKDD (3), pages 419{434, 2013.
10. I. Davidson and S. S. Ravi. Clustering with constraints: Feasibility issues and the k-means algorithm. In SDM, 2005.
11. I. Davidson and S. S. Ravi. Identifying and generating easy sets of constraints for clustering. In Proceedings, The Twenty-First National Conference on Arti cial Intelligence and the Eighteenth Innovative Applications of Arti cial Intelligence Conference (AAAI), pages 336{341, 2006.
12. I. Davidson and S. S. Ravi. The complexity of non-hierarchical clustering with instance and cluster level constraints. DMKD, 14(1):25{61, 2007.
13. I. Davidson and S. S. Ravi. Using instance-level constraints in agglomerative hierarchical clustering: theoretical and empirical results. Data Min. Knowl. Discov., 18(2):257{282, 2009.
14. J. C. Dunn. A fuzzy relative of the isodata process and its use in detecting compact well-separated clusters. 1973.
15. M. Ester, H.-P. Kriegel, J. Sander, and X. Xu. A density-based algorithm for discovering clusters in large spatial databases with noise. In E. Simoudis, J. Han, and U. M. Fayyad, editors, KDD, pages 226{231. AAAI Press, 1996.
16. T. Guns, S. Nijssen, and L. D. Raedt. k-pattern set mining under constraints. IEEE Trans. Knowl. Data Eng., 25(2):402{418, 2013.
17. P. Hansen and D. Aloise. A survey on exact methods for minimum sum-of-squares clustering. http://www.math.iit.edu/Buck65 les/msscStLouis.pdf, pages 1{2, Jan. 2009.
18. J. A. Hartigan and M. A. Wong. Algorithm AS 136: A k-means clustering algorithm. Journal of the Royal Statistical Society. Series C (Applied Statistics), 28(1):100{108, 1979.
19. O. D. Merle, P. Hansen, B. Jaumard, and N. M. . An interior point algorithm for minimum sum of squares clustering. SIAM J. Sci. Comput, 21:1485{1505, 1997.
20. M. Mueller and S. Kramer. Integer linear programming models for constrained clustering. In Proceedings of the 13th International Conference on Discovery Science, DS'10, pages 159{173, 2010.
21. B. Negrevergne and T. Guns. Constraint-based sequence mining using constraint programming. In CPAIOR 2015, pages 288{305, 2015.
22. M. Okabe and S. Yamada. Clustering by learning constraints priorities. In ICDM, pages 1050{1055, 2012.
23. U. N. Raghavan, R. Albert, and S. Kumara. Near linear time algorithm to detect community structures in large-scale networks. Physical Review E, 76(2):036106+, 2007.
24. C. Ruiz, M. Spiliopoulou, and E. Menasalvas. C-dbscan: Density-based clustering with constraints. In Proceedings of the 11th International Conference on Rough Sets, Fuzzy Sets, Data Mining and Granular Computing, pages 216{223, 2007.
25. K. Wagsta and C. Cardie. Clustering with instance-level constraints. In ICML, pages 1103{1110, 2000.
26. K. Wagsta and C. Cardie. Clustering with instance-level constraints. In AAAI/IAAI, page 1097, 2000.
27. K. Wagsta , C. Cardie, S. Rogers, and S. Schrodl. Constrained k-means clustering with background knowledge. In ICML, pages 577{584, 2001.
28. E. P. Xing, A. Y. Ng, M. I. Jordan, and S. Russell. Distance metric learning, with application to clustering with side-information. In Advances in Neural Information Processing Systems 15, pages 505{512. MIT Press, 2002.

Metrics



Back to previous page
BibTeX entry
@inbook{oai:it.cnr:prodotti:347580,
	title = {Clustering formulation using constraint optimization},
	author = {Grossi V. and Monreale A. and Nanni M. and Pedreschi D. and Turini F.},
	doi = {10.1007/978-3-662-49224-6_9},
	booktitle = {Software Engineering and Formal Methods, edited by Domenico Bianculli, Radu Calinescu, Bernhard Rumpe, pp. 93–107, 2015},
	year = {2015}
}

ICON
Inductive Constraint Programming


OpenAIRE