2017
Conference article  Open Access

Efficiently clustering very large attributed graphs

Baroni A., Conte A., Patrignani M., Ruggieri S.

Computer Science - Social and Information Networks  Computer Science - Discrete Mathematics  FOS: Computer and information sciences  Computer Networks and Communication  large attributed graphs  Physics - Physics and Society  Social and Information Networks (cs.SI)  Information Systems  semantic-topological clustering  Discrete Mathematics (cs.DM)  FOS: Physical sciences  Physics and Society (physics.soc-ph) 

Attributed graphs model real networks by enriching their nodes with attributes accounting for properties. Several techniques have been proposed for partitioning these graphs into clusters that are homogeneous with respect to both semantic attributes and to the structure of the graph. However, time and space complexities of state of the art algorithms limit their scalability to medium-sized graphs. We propose SToC (for Semantic-Topological Clustering), a fast and scalable algorithm for partitioning large attributed graphs. The approach is robust, being compatible both with categorical and with quantitative attributes, and it is tailorable, allowing the user to weight the semantic and topological components. Further, the approach does not require the user to guess in advance the number of clusters. SToC relies on well known approximation techniques such as bottom-k sketches, traditional graph-theoretic concepts, and a new perspective on the composition of heterogeneous distance measures. Experimental results demonstrate its ability to efficiently compute high-quality partitions of large scale attributed graphs.

Source: 2017 IEEE/ACM: International Conference on Advances in Social Networks Analysis and Mining 2017, July-August 2017


[1] L. Akoglu, H. Tong, B. Meeder, and C. Faloutsos. Pics: Parameter-free identification of cohesive subgroups in large attributed graphs. In SDM, pages 439-450. SIAM, 2012.
[2] L. Backstrom, P. Boldi, M. Rosa, J. Ugander, and S. Vigna. Four degrees of separation. In Proceedings of the 4th Annual ACM Web Science Conference, pages 33-42. ACM, 2012.
[3] A. Baroni and S. Ruggieri. Segregation discovery in a social network of companies. In IDA, pages 37-48. Springer, 2015.
[4] P. Boldi, M. Rosa, and S. Vigna. Hyperanf: Approximating the neighbourhood function of very large graphs on a budget. In WWW, pages 625-634. ACM, 2011.
[5] C. Bothorel, J. D. Cruz, M. Magnani, and B. Micenkov´a. Clustering attributed graphs: models, measures and methods. Network Science, 3(03):408-444, 2015.
[6] J. G. Bruhn. The sociology of community connections. Springer Science & Business Media, 2011.
[7] J. Cao, S. Wang, F. Qiao, H. Wang, F. Wang, and S. Y. Philip. User-guided large attributed graph clustering with multiple sparse annotations. In PAKDD, pages 127-138. Springer, 2016.
[8] M. E. Celebi, H. A. Kingravi, and P. A. Vela. A comparative study of efficient initialization methods for the k-means clustering algorithm. Expert Syst Appl, 40(1):200-210, 2013.
[9] A. Clauset, M. E. Newman, and C. Moore. Finding community structure in very large networks. Phys rev E, 70(6):066111, 2004.
[10] E. Cohen and H. Kaplan. Summarizing data using bottom-k sketches. In PODS, pages 225-234. ACM, 2007.
[11] D. Combe, C. Largeron, E. Egyed-Zsigmond, and M. G´ery. Combining relations and text in scientific network clustering. In ASONAM, pages 1248-1253. IEEE, 2012.
[12] D. Combe, C. Largeron, M. G´ery, and E. Egyed-Zsigmond. Ilouvain: An attributed graph clustering method. In IDA, pages 181-192. Springer, 2015.
[13] M. Coscia, F. Giannotti, and D. Pedreschi. A classification for community discovery methods in complex networks. Statistical Analysis and Data Mining, 4(5):512-546, 2011.
[14] P. Crescenzi, R. Grossi, L. Lanzi, and A. Marino. A comparison R. Diestel. Graph theory. 2005. Grad. Texts in Math, 2005.
Y. Ding. Community detection: Topological vs. topical. Journal of Informetrics, 5(4):498-514, 2011.
[18] K.-C. Duong, C. Vrain, et al. A filtering algorithm for constrained clustering with within-cluster sum of dissimilarities criterion. In 2013 IEEE 25th International Conference on Tools with Artificial Intelligence, pages 1060-1067. IEEE, 2013.
[19] S. Gu¨nnemann, B. Boden, and T. Seidl. Db-csc: a density-based approach for subspace clustering in graphs with feature vectors. In Machine Learning and Knowledge Discovery in Databases, pages 565-580. Springer, 2011.
[20] I. Guy, N. Zwerdling, D. Carmel, I. Ronen, E. Uziel, S. Yogev, and S. Ofek-Koifman. Personalized recommendation of social software items based on social relations. In RecSys, pages 53- 60, New York, NY, USA, 2009. ACM.
[21] J. Han, M. Kamber, and J. Pei. Data Mining: Concepts and Techniques. Morgan Kaufmann Publishers Inc., 3rd edition, 2011.
[22] T. Kanungo, D. M. Mount, N. S. Netanyahu, C. D. Piatko, R. Silverman, and A. Y. Wu. An efficient k-means clustering algorithm: Analysis and implementation. IEEE T Pattern Anal, 24(7):881-892, 2002.
[23] S. P. Lloyd. Least squares quantization in pcm. IEEE Transactions on Information Theory, 28(2):129-137, 1982.
[24] D. W. McMillan and D. M. Chavis. Sense of community: A definition and theory. J Community Psychol, 14(1):6-23, 1986.
[25] I. B. Mohamad and D. Usman. Standardization and its effects on k-means clustering algorithm. Res J Appl Sci Eng Technol, 6(17):3299-3303, 2013.
[26] A. Papadopoulos, D. Rafailidis, G. Pallis, and M. D. Dikaiakos. Clustering attributed multi-graphs with information ranking. In DEXA, pages 432-446. Springer, 2015.
[27] B. Perozzi, L. Akoglu, P. Iglesias S´anchez, and E. Mu¨ller. Focused clustering and outlier detection in large attributed graphs. In ACM SIGKDD, pages 1346-1355. ACM, 2014.
[28] M. H. Protter and C. B. Morrey. College calculus with analytic geometry. Addison-Wesley, 1977.
[29] P. I. Sa´nchez, E. Mu¨ller, K. B¨ohm, A. Kappes, T. Hartmann, and D. Wagner. Efficient algorithms for a robust modularitydriven clustering of attributed graphs. In SDM, volume 15. SIAM, 2015.
[30] J. Ugander, B. Karrer, L. Backstrom, and C. Marlow. The anatomy of the facebook social graph. arXiv preprint arXiv:1111.4503, 2011.
[31] N. Villa-Vialaneix, M. Olteanu, and C. Cierco-Ayrolles. Carte auto-organisatrice pour graphes ´etiquet´es. In Atelier Fouilles de Grands Graphes (FGG)-EGC, pages Article-num´ero, 2013.
[32] U. Von Luxburg. A tutorial on spectral clustering. Statistics and computing, 17(4):395-416, 2007.
[33] D. J. Watts. Networks, dynamics, and the small-world phenomenon 1. Am J Sociol, 105(2):493-527, 1999.
[34] Z. Xu, Y. Ke, Y. Wang, H. Cheng, and J. Cheng. A modelbased approach to attributed graph clustering. In SIGMOD, pages 505-516. ACM, 2012.
[35] Z. Xu, Y. Ke, Y. Wang, H. Cheng, and J. Cheng. Gbagc: A general bayesian framework for attributed graph clustering. TKDD, 9(1):5, 2014.
[36] J. Yang, J. McAuley, and J. Leskovec. Community detection in networks with node attributes. In ICDM, pages 1151-1156. IEEE, 2013.
[37] T. Zhou, J. Ren, M. Medo, and Y.-C. Zhang. Bipartite network projection and personal recommendation. Phys Rev E, 76(4):046115, 2007.

Metrics



Back to previous page
BibTeX entry
@inproceedings{oai:it.cnr:prodotti:424181,
	title = {Efficiently clustering very large attributed graphs},
	author = {Baroni A. and Conte A. and Patrignani M. and Ruggieri S.},
	doi = {10.1145/3110025.3110030 and 10.48550/arxiv.1703.08590},
	booktitle = {2017 IEEE/ACM: International Conference on Advances in Social Networks Analysis and Mining 2017, July-August 2017},
	year = {2017}
}