2014
Journal article  Open Access

Privacy-by-design in big data analytics and social mining

Monreale A., Rinzivillo S., Pratesi F., Giannotti F., Pedreschi D.

Privacy  Modeling and Simulation  Social Mining  Computational Mathematics  Computer Science Applications  Big Data  Privacy-by-Design  K.4.1 Public Policy Issues 

Privacy is an ever-growing concern in our society and is becoming a fundamental aspect to take into account when one wants to use, publish and analyze data involving sensitive personal information. Unfortunately, it is increasingly hard to transform data so that sensitive information is protected: we live in the era of big data, characterized by unprecedented opportunities to sense, store and analyze social data describing human activities in great detail and resolution. As a result, privacy preservation simply cannot be accomplished by de-identification alone. In this paper, we propose the privacy-by-design paradigm to develop technological frameworks for countering the threats of undesirable, unlawful effects of privacy violation, without obstructing the knowledge discovery opportunities of social mining and big data analytical technologies. Our main idea is to inscribe privacy protection into the knowledge discovery technology by design, so that the analysis incorporates the relevant privacy requirements from the start.

Source: EPJ Data Science 3 (2014). doi:10.1140/epjds/s13688-014-0010-4

Publisher: SpringerOpen



BibTeX entry
@article{oai:it.cnr:prodotti:302293,
	title = {Privacy-by-design in big data analytics and social mining},
	author = {Monreale A. and Rinzivillo S. and Pratesi F. and Giannotti F. and Pedreschi D.},
	publisher = {SpringerOpen},
	doi = {10.1140/epjds/s13688-014-0010-4},
	journal = {EPJ Data Science},
	volume = {3},
	year = {2014}
}

DATA SIM
Data Science for Simulating the Era of Electric Vehicles

PETRA
Personal Transport Advisor: an integrated platform of mobility patterns for Smart Cities to enable demand-adaptive transportation systems

