×

\(k\)-anonymous data collection. (English) Zbl 1193.68106

Summary: To protect individual privacy in data mining, when a miner collects data from respondents, the respondents should remain anonymous. The existing technique of Anonymity-Preserving Data Collection partially solves this problem, but it assumes that the data do not contain any identifying information about the corresponding respondents. On the other hand, the existing technique of Privacy-Enhancing \(k\)-Anonymization can make the collected data anonymous by eliminating the identifying information. However, it assumes that each respondent submits her data through an unidentified communication channel. In this paper, we propose \(k\)-Anonymous Data Collection, which has the advantages of both Anonymity-Preserving Data Collection and Privacy-Enhancing \(k\)-Anonymization but does not rely on their assumptions described above. We give rigorous proofs for the correctness and privacy of our protocol, and experimental results for its efficiency. Furthermore, we extend our solution to the fully malicious model, in which a dishonest participant can deviate from the protocol and behave arbitrarily.

MSC:

68P25 Data encryption (aspects in computer science)
94A62 Authentication, digital signatures and secret sharing

Software:

GeneScout
PDFBibTeX XMLCite
Full Text: DOI

References:

[1] Gagan Aggarwal, Tomas Feder, Krishnaram Kenthapadi, Rajeev Motwani, Rina Panigrahy, Dilys Thomas, An Zhu, Approximation algorithms for \(k\)-anonymity, Journal of Privacy Technology, Paper Number: 20051120001, 2005.; Gagan Aggarwal, Tomas Feder, Krishnaram Kenthapadi, Rajeev Motwani, Rina Panigrahy, Dilys Thomas, An Zhu, Approximation algorithms for \(k\)-anonymity, Journal of Privacy Technology, Paper Number: 20051120001, 2005. · Zbl 1112.68361
[2] G. Aggarwal, N. Mishra, B. Pinkas, Secure computation of the \(k\) th-ranked element, in: EUROCRYPT, 2004, pp. 40-55.; G. Aggarwal, N. Mishra, B. Pinkas, Secure computation of the \(k\) th-ranked element, in: EUROCRYPT, 2004, pp. 40-55. · Zbl 1122.68422
[3] D. Agrawal, C. Aggarwal, On the design and quantification of privacy preserving data mining algorithms, in: Proceedings of the 20th ACM PODS, 2001, pp. 247-255.; D. Agrawal, C. Aggarwal, On the design and quantification of privacy preserving data mining algorithms, in: Proceedings of the 20th ACM PODS, 2001, pp. 247-255.
[4] R. Agrawal, R. Srikant, Privacy-preserving data mining, in: Proceedings of the ACM SIGMOD, 2000, pp. 439-450.; R. Agrawal, R. Srikant, Privacy-preserving data mining, in: Proceedings of the ACM SIGMOD, 2000, pp. 439-450.
[5] R. Bayardo, R. Agrawal, Data privacy through optimal \(k\)-anonymization, in: Proceedings of the 21st ICDE, 2005.; R. Bayardo, R. Agrawal, Data privacy through optimal \(k\)-anonymization, in: Proceedings of the 21st ICDE, 2005.
[6] Emmanuel Bresson, Jacques Stern, Proofs of knowledge for non-monotone discrete-log formulae and applications, in: Proceedings of the ISC, 2002, pp. 272-288.; Emmanuel Bresson, Jacques Stern, Proofs of knowledge for non-monotone discrete-log formulae and applications, in: Proceedings of the ISC, 2002, pp. 272-288. · Zbl 1019.68540
[7] Justin Brickell, Vitaly Shmatikov, Efficient anonymity-preserving data collection, in: Proceedings of the ACM SIGKDD, 2006, pp. 76-85.; Justin Brickell, Vitaly Shmatikov, Efficient anonymity-preserving data collection, in: Proceedings of the ACM SIGKDD, 2006, pp. 76-85. · Zbl 1196.68183
[8] Chaum, D., Untraceable electronic mail, return address and digital pseudonyms, Communications of the ACM, 24, 2, 84-88 (1981)
[9] Chaum, D., The dining cryptographers problem: unconditional sender and recipient untraceability, Journal of Cryptology, 1, 1, 65-75 (1988) · Zbl 0654.94012
[10] C. Clifton, D. Marks, Security and privacy implications of data mining, in: Proceedings of the ACM SIGMOD Workshop on Research Issues on Data Mining and Knowledge Discovery, 1996, pp 15-19.; C. Clifton, D. Marks, Security and privacy implications of data mining, in: Proceedings of the ACM SIGMOD Workshop on Research Issues on Data Mining and Knowledge Discovery, 1996, pp 15-19.
[11] G. Canfora, C.A. Visaggio, Tuning anonymity level for assuring high data quality: an empirical study, in: First International Symposium on Empirical Software Engineering and Measurement (ESEM 2007), 2007, pp. 91-98.; G. Canfora, C.A. Visaggio, Tuning anonymity level for assuring high data quality: an empirical study, in: First International Symposium on Empirical Software Engineering and Measurement (ESEM 2007), 2007, pp. 91-98.
[12] Dalenius, T., Finding a needle in a haystack or identifying anonymous census records, Journal of Official Statistics, 2, 3, 329-336 (1986)
[13] I. Dinur, K. Nissim, Revealing information while preserving privacy, in: Proceedings of the 22nd ACM PODS, 2003, pp. 202-210.; I. Dinur, K. Nissim, Revealing information while preserving privacy, in: Proceedings of the 22nd ACM PODS, 2003, pp. 202-210.
[14] W. Du, Z. Zhan, Using randomized response techniques for privacy-preserving data mining, in: Proceedings of the Ninth ACM SIGKDD, 2003, pp. 505-510.; W. Du, Z. Zhan, Using randomized response techniques for privacy-preserving data mining, in: Proceedings of the Ninth ACM SIGKDD, 2003, pp. 505-510.
[15] C. Dwork, K. Nissim, Privacy-preserving datamining on vertically partitioned databases, in: CRYPTO 2003, 2004.; C. Dwork, K. Nissim, Privacy-preserving datamining on vertically partitioned databases, in: CRYPTO 2003, 2004. · Zbl 1104.68038
[16] A. Evfimievski, J. Gehrke, R. Srikant, Limiting privacy breaches in privacy preserving data mining, in: Proceedings 22nd ACM SIGMOD-SIGACT-SIGART Symposium on Principles of Database Systems, 2003, pp. 211-222.; A. Evfimievski, J. Gehrke, R. Srikant, Limiting privacy breaches in privacy preserving data mining, in: Proceedings 22nd ACM SIGMOD-SIGACT-SIGART Symposium on Principles of Database Systems, 2003, pp. 211-222.
[17] A. Evfimievski, R. Srikant, R. Agrawal, J. Gehrke, Privacy preserving mining of association rules, in: Proceedings of the Eighth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 2002, pp. 217-228.; A. Evfimievski, R. Srikant, R. Agrawal, J. Gehrke, Privacy preserving mining of association rules, in: Proceedings of the Eighth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 2002, pp. 217-228.
[18] B. Fung, K. Wang, P.S. Yu, Top-down specialization for information and privacy preservation, in: Proceedings of the 21st International Conference on Data Engineering, Tokyo, Japan, April 2005.; B. Fung, K. Wang, P.S. Yu, Top-down specialization for information and privacy preservation, in: Proceedings of the 21st International Conference on Data Engineering, Tokyo, Japan, April 2005.
[19] A. Fiat, A. Shamir, How to prove yourself: practical solutions to identification and signature problems, in: Advances in Cryptology - Crypto’86, 1986, pp. 186-194.; A. Fiat, A. Shamir, How to prove yourself: practical solutions to identification and signature problems, in: Advances in Cryptology - Crypto’86, 1986, pp. 186-194.
[20] B. Gilburd, A. Schuster, R. Wolff, k-TTP: a new privacy model for large-scale distributed environments, in: Proceedings of the 2004 ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, ACM Press, 2004, pp. 563-568.; B. Gilburd, A. Schuster, R. Wolff, k-TTP: a new privacy model for large-scale distributed environments, in: Proceedings of the 2004 ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, ACM Press, 2004, pp. 563-568.
[21] Goldreich, O., Foundations of Cryptography, vol. 1 (2001), Cambridge University Press · Zbl 1007.94016
[22] Goldreich, O., Foundations of Cryptography, vol. 2 (2003), Cambridge University Press
[23] Goldwasser, S.; Micali, S., Probabilistic encryption, Journal of Computer and System Sciences, 28, 270-299 (1984) · Zbl 0563.94013
[24] P. Golle, A. Juels, Dining cryptographers revisited, in: Advances in Cryptology - EUROCRYPT, 2004, pp. 456-473.; P. Golle, A. Juels, Dining cryptographers revisited, in: Advances in Cryptology - EUROCRYPT, 2004, pp. 456-473. · Zbl 1122.94420
[25] Golle, P.; Zhong, S.; Boneh, D.; Jakobsson, M.; Juels, A., Optimistic mixing for exit-polls, (Advances in Cryptology - ASIACRYPT 2002 (2002), Springer-Verlag), 451-465 · Zbl 1065.94548
[26] HIPAA, The health insurance portability and accountability act of 1996, October 1998. <www.cms.hhs.gov/hipaa>.; HIPAA, The health insurance portability and accountability act of 1996, October 1998. <www.cms.hhs.gov/hipaa>.
[27] Hsu, C.; Chuang, Y., A novel user identification scheme with key distribution preserving user anonymity for distributed computer networks, Information Sciences, 179, 4, 422-429 (2009)
[28] Z. Huang, W. Du, B. Chen, Deriving private information from randomized data, in: Proceedings of the ACM SIGMOD Conference, 2005.; Z. Huang, W. Du, B. Chen, Deriving private information from randomized data, in: Proceedings of the ACM SIGMOD Conference, 2005.
[29] M. Jakobsson, A practical mix, in: Proceedings of the Eurocrypt 98, 1998, pp. 448-461.; M. Jakobsson, A practical mix, in: Proceedings of the Eurocrypt 98, 1998, pp. 448-461. · Zbl 0929.68065
[30] M. Jakobsson, Flash mixing, in: Proceedings of the Eighteenth PODC, 1999, pp. 83-89.; M. Jakobsson, Flash mixing, in: Proceedings of the Eighteenth PODC, 1999, pp. 83-89. · Zbl 1321.94065
[31] Qinglin Jiang, Douglas S. Reeves, Peng Ning, Improving robustness of pgp keyrings by conflict detection, in: Topics in Cryptology CT-RSA, 2004, pp. 194-207.; Qinglin Jiang, Douglas S. Reeves, Peng Ning, Improving robustness of pgp keyrings by conflict detection, in: Topics in Cryptology CT-RSA, 2004, pp. 194-207. · Zbl 1196.94075
[32] M. Kantarcioglu, C. Clifton, Privacy-preserving distributed mining of association rules on horizontally partitioned data, in: DMKD’02, 2002, pp. 24-31.; M. Kantarcioglu, C. Clifton, Privacy-preserving distributed mining of association rules on horizontally partitioned data, in: DMKD’02, 2002, pp. 24-31.
[33] M. Kantarcioglu, J. Vaidya, An architecture for privacy-preserving mining of client information, in: Proceedings of the IEEE ICDM Workshop on Privacy, Security and Data Mining. Maebashi City, Japan, December 2002, pp. 37-42.; M. Kantarcioglu, J. Vaidya, An architecture for privacy-preserving mining of client information, in: Proceedings of the IEEE ICDM Workshop on Privacy, Security and Data Mining. Maebashi City, Japan, December 2002, pp. 37-42.
[34] H. Kargupta, S. Datta, Q. Wang, K. Sivakumar, On the privacy preserving properties of random data perturbation techniques. in: The Third ICDM, 2003.; H. Kargupta, S. Datta, Q. Wang, K. Sivakumar, On the privacy preserving properties of random data perturbation techniques. in: The Third ICDM, 2003.
[35] Kim, S.; Park, S.; Won, J.; Kim, S., Privacy preserving data mining of sequential patterns for network traffic data, Information Sciences, 178, 3, 694-713 (2008)
[36] Kristen LeFevre, David J. DeWitt, Raghu Ramakrishnan, Workload-aware anonymization, in: Proceedings of the ACM SIGKDD, 2006, pp. 277-286.; Kristen LeFevre, David J. DeWitt, Raghu Ramakrishnan, Workload-aware anonymization, in: Proceedings of the ACM SIGKDD, 2006, pp. 277-286.
[37] Levine, B. N.; Shields, C., Hordes – a multicast based protocol for anonymity, Journal of Computer Security, 10, 3, 213-240 (2002)
[38] Lindell, Y.; Pinkas, B., Privacy preserving data mining, Journal of Cryptology, 15, 3, 177-206 (2002) · Zbl 1010.94008
[39] A. Machanavajjhala, J. Gehrke, D. Kifer, M. Venkitasubramaniam, l-Diversity: privacy beyond \(k\)-anonymity, in: Proceedings of the 22nd ICDE, 2006.; A. Machanavajjhala, J. Gehrke, D. Kifer, M. Venkitasubramaniam, l-Diversity: privacy beyond \(k\)-anonymity, in: Proceedings of the 22nd ICDE, 2006.
[40] A. Meyerson, R. Williams, On the complexity of optimal \(k\)-anonymity, in: Proceedings of the 22nd ACM PODS, June 2004.; A. Meyerson, R. Williams, On the complexity of optimal \(k\)-anonymity, in: Proceedings of the 22nd ACM PODS, June 2004.
[41] C. Park, K. Itoh, K. Kurosawa, Efficient anonymous channel and all/nothing election scheme, in: EUROCRYPT 93, 1993, pp. 248-259.; C. Park, K. Itoh, K. Kurosawa, Efficient anonymous channel and all/nothing election scheme, in: EUROCRYPT 93, 1993, pp. 248-259. · Zbl 0951.94512
[42] European Parliament, Directive 95/46/EC of the European Parliament and of the Council of 24 October 1995 on the protection of individuals with regard to the processing of personal data and on the free movement of such data, Official Journal of the European Communities, 1995, p. 31.; European Parliament, Directive 95/46/EC of the European Parliament and of the Council of 24 October 1995 on the protection of individuals with regard to the processing of personal data and on the free movement of such data, Official Journal of the European Communities, 1995, p. 31.
[43] European Parliament, Directive 97/66/EC of the European Parliament and of the Council of 15 December 1997 concering the processing of personal data and the protection of privacy in the telecommunications sector, Official Journal of the European Communities, 1998, pp. 1-8.; European Parliament, Directive 97/66/EC of the European Parliament and of the Council of 15 December 1997 concering the processing of personal data and the protection of privacy in the telecommunications sector, Official Journal of the European Communities, 1998, pp. 1-8.
[44] Reiter, M. K.; Rubin, A. D., Crowds: anonymity for web transactions, ACM Transactions on Information and System Security, 1, 1, 66-92 (1998)
[45] S. Rizvi, J. Haritsa, Maintaining data privacy in association rule mining, in: Proceedings of the 28th VLDB Conference, 2002.; S. Rizvi, J. Haritsa, Maintaining data privacy in association rule mining, in: Proceedings of the 28th VLDB Conference, 2002.
[46] K. Sako, J. Kilian, Receipt-free Mix-type voting schemes – a practical solution to the implementation of a voting booth, in: Proceedings of the EUROCRYPT 95, 1995, pp. 393-403.; K. Sako, J. Kilian, Receipt-free Mix-type voting schemes – a practical solution to the implementation of a voting booth, in: Proceedings of the EUROCRYPT 95, 1995, pp. 393-403. · Zbl 0973.94525
[47] P. Samarati, L. Sweeney, Generalizing data to provide anonymity when disclosing information (abstract), in: Proceedings of the 17th PODS, 1998, p. 188.; P. Samarati, L. Sweeney, Generalizing data to provide anonymity when disclosing information (abstract), in: Proceedings of the 17th PODS, 1998, p. 188.
[48] Su, C.; Sakurai, K., Secure computation over distributed databases, IPSJ Journal (2005)
[49] Shah, D.; Zhong, S., Two methods for privacy preserving data mining with malicious participants, Information Sciences, 177, 23, 5468-5483 (2007) · Zbl 1126.68069
[50] L. Sweeney, Guaranteeing anonymity when sharing medical data, the datafly system, in: Proceedings of Journal of the American Medical Informatics Association, 1997.; L. Sweeney, Guaranteeing anonymity when sharing medical data, the datafly system, in: Proceedings of Journal of the American Medical Informatics Association, 1997.
[51] Sweeney, L., Achieving \(k\)-anonymity privacy protection using generalization and suppression, International Journal of Uncertainity Fuzziness Knowledge-Based Systems, 10, 5, 571-588 (2002) · Zbl 1084.68537
[52] Sweeney, L., \(k\)-anonymity: a model for protecting privacy, International Journal of Uncertainity Fuzziness Knowledge-Based Systems, 10, 5, 557-570 (2002) · Zbl 1085.68589
[53] Y. Tsiounis, M. Yung, On the security of ElGamal-based encryption, in: Public Key Cryptography’98, 1998, pp. 117-134.; Y. Tsiounis, M. Yung, On the security of ElGamal-based encryption, in: Public Key Cryptography’98, 1998, pp. 117-134. · Zbl 1067.94566
[54] J. Vaidya, C. Clifton, Privacy-preserving \(k\)-means clustering over vertically partitioned data, in: Proceedings of the Ninth ACM SIGKDD, 2003, pp. 206-215.; J. Vaidya, C. Clifton, Privacy-preserving \(k\)-means clustering over vertically partitioned data, in: Proceedings of the Ninth ACM SIGKDD, 2003, pp. 206-215.
[55] J. Vaidya, C. Clifton, Privacy-preserving outlier detection, in: Proceedings of the IEEE ICDM, 2004.; J. Vaidya, C. Clifton, Privacy-preserving outlier detection, in: Proceedings of the IEEE ICDM, 2004.
[56] L. von Ahn, A. Bortz, N.J. Hopper, \(k\)-anonymous message transmission, in: Proceedings of the ACM CCS, 2003, pp. 122-130.; L. von Ahn, A. Bortz, N.J. Hopper, \(k\)-anonymous message transmission, in: Proceedings of the ACM CCS, 2003, pp. 122-130.
[57] M. Waidner, Unconditional sender and recipient untraceability in spite of active attacks, in: EUROCRYPT’89, 1989, pp. 302-319.; M. Waidner, Unconditional sender and recipient untraceability in spite of active attacks, in: EUROCRYPT’89, 1989, pp. 302-319. · Zbl 0736.94001
[58] K. Wang, P.S. Yu, S. Chakraborty, Bottom-up generalization: a data mining solution to privacy protection, in: The Fourth ICDM, 2004, pp. 249-256.; K. Wang, P.S. Yu, S. Chakraborty, Bottom-up generalization: a data mining solution to privacy protection, in: The Fourth ICDM, 2004, pp. 249-256.
[59] A. Williams, K. Barker, Controlling inference: avoiding p-level reduction during analysis, in: Proceedings of the Fifth Australian Symposium on ACSW Frontiers, 2007.; A. Williams, K. Barker, Controlling inference: avoiding p-level reduction during analysis, in: Proceedings of the Fifth Australian Symposium on ACSW Frontiers, 2007.
[60] Raymond Chi-Wing Wong, Jiuyong Li, Ada Wai-Chee Fu, Ke Wang, (alpha, \(k)\)-anonymity: an enhanced \(k\)-anonymity model for privacy preserving data publishing, in: Proceedings of the ACM SIGKDD, 2006, pp. 754-759.; Raymond Chi-Wing Wong, Jiuyong Li, Ada Wai-Chee Fu, Ke Wang, (alpha, \(k)\)-anonymity: an enhanced \(k\)-anonymity model for privacy preserving data publishing, in: Proceedings of the ACM SIGKDD, 2006, pp. 754-759.
[61] R.N. Wright, Z. Yang, Privacy-preserving Bayesian network structure computation on distributed heterogeneous data, in: Proceedings of the 10th ACM SIGKDD, 2004, pp. 713-718.; R.N. Wright, Z. Yang, Privacy-preserving Bayesian network structure computation on distributed heterogeneous data, in: Proceedings of the 10th ACM SIGKDD, 2004, pp. 713-718.
[62] Jian Xu, Wei Wang, Jian Pei, Xiaoyuan Wang, Baile Shi, Ada Wai-Chee Fu, Utility-based anonymization using local recoding, in: Proceedings of the ACM SIGKDD, 2006, pp. 785-790.; Jian Xu, Wei Wang, Jian Pei, Xiaoyuan Wang, Baile Shi, Ada Wai-Chee Fu, Utility-based anonymization using local recoding, in: Proceedings of the ACM SIGKDD, 2006, pp. 785-790.
[63] Z. Yang, R.N. Wright, Improved privacy-preserving Bayesian network parameter learning on vertically partitioned data, in: Proceedings of the International Workshop on Privacy Data Management, 2005.; Z. Yang, R.N. Wright, Improved privacy-preserving Bayesian network parameter learning on vertically partitioned data, in: Proceedings of the International Workshop on Privacy Data Management, 2005.
[64] Z. Yang, S. Zhong, R.N. Wright, Privacy-preserving classification of customer data without loss of accuracy, in: Proceedings of SIAM International Conference on Data Mining, 2005.; Z. Yang, S. Zhong, R.N. Wright, Privacy-preserving classification of customer data without loss of accuracy, in: Proceedings of SIAM International Conference on Data Mining, 2005.
[65] Z. Yang, S. Zhong, R.N. Wright, Anonymity-preserving data collection, in: Proceedings of the 11th ACM SIGKDD, 2005.; Z. Yang, S. Zhong, R.N. Wright, Anonymity-preserving data collection, in: Proceedings of the 11th ACM SIGKDD, 2005.
[66] Yin, Michael M.; Wang, Jason T. L., GeneScout: a data mining system for predicting vertebrate genes in genomic DNA sequences, Information Sciences, 163, 1-3, 201-218 (2004)
[67] Yu, J. X.; Chong, Z.; Lu, H.; Zhang, Z.; Zhou, A., A false negative approach to mining frequent itemsets from high speed transactional data streams, Information Sciences, 176, 16, 1986-2015 (2006)
[68] Zhang, L.; Zhang, B., Fuzzy reasoning model under quotient space structure, Information Sciences, 176, 4, 353-364 (2005) · Zbl 1088.68170
[69] S. Zhong, Z. Yang, R.N. Wright, Privacy-enhancing \(k\)-anonymization of customer data, in: Proceedings of the 24th ACM PODS, 2005.; S. Zhong, Z. Yang, R.N. Wright, Privacy-enhancing \(k\)-anonymization of customer data, in: Proceedings of the 24th ACM PODS, 2005.
[70] Zhong, S., Privacy-preserving algorithms for distributed mining of frequent itemsets, Information Sciences, 177, 2, 490-503 (2007) · Zbl 1142.68488
[71] Zhong, S.; Yang, Z., Guided perturbation: towards private and accurate mining, VLDB Journal, 17, 5, 1165-1177 (2008)
[72] L. Zhang, W. Zhang, Generalization-based privacy-preserving data collection, in: Proceedings of the 10th International Conference on Data Warehousing and Knowledge Discovery, DaWak, 2008.; L. Zhang, W. Zhang, Generalization-based privacy-preserving data collection, in: Proceedings of the 10th International Conference on Data Warehousing and Knowledge Discovery, DaWak, 2008.
This reference list is based on information provided by the publisher or from digital mathematics libraries. Its items are heuristically matched to zbMATH identifiers and may contain data conversion errors. In some cases that data have been complemented/enhanced by data from zbMATH Open. This attempts to reflect the references listed in the original paper as accurately as possible without claiming completeness or a perfect matching.