
A criterion for privacy protection in data collection and its attainment via randomized response procedures. (English) Zbl 1412.62015

With huge amounts of data being collected from various sources, protecting privacy and maintaining the confidentiality of individuals' data have attracted the attention of statisticians, who have proposed several techniques and measures in the recent literature. Two such criteria, \(\rho_1\)-to-\(\rho_2\) privacy for randomized response (RR) surveys of categorical variables and \(\beta\)-factor privacy, are found to be ‘logically sound and practical’. Let \(p_1\) and \(p_2\) denote an intruder's prior and posterior probabilities for a characteristic of a respondent. For two given functions \(h_l\) and \(h_u\), a privacy breach is said to occur if \(p_2<h_l(p_1)\) or \(p_2>h_u(p_1)\).
In this paper, the authors develop these ideas further by proposing a more general privacy criterion. A canonical form of the criterion for strict privacy protection is developed, which also gives practical guidance on the choice of \(h_l\) and \(h_u\). All RR procedures that possess the required privacy are characterized, as is the class of all admissible privacy-preserving procedures. Finally, a particular optimality property is shown to hold for a simple RR technique.
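As a rough illustration of the breach criterion, the following Python sketch checks one fixed prior against one RR design. It assumes that a breach means the posterior probability falls below \(h_l(p_1)\) or rises above \(h_u(p_1)\). The functions `h_lower` and `h_upper`, the odds-scaling form chosen for them, the factor `beta`, the helper names, and the Warner-type two-category design are all hypothetical choices made for this example and are not taken from the paper.

```python
from itertools import combinations


def posterior(prior, P, subset, response):
    """Posterior probability that the true category lies in `subset`,
    given the observed randomized response (Bayes' rule)."""
    joint = [prior[i] * P[i][response] for i in range(len(prior))]
    return sum(joint[i] for i in subset) / sum(joint)


def h_upper(p, beta):
    # Illustrative upper bound (an assumption): prior odds multiplied by beta.
    return beta * p / (beta * p + 1 - p)


def h_lower(p, beta):
    # Illustrative lower bound (an assumption): prior odds divided by beta.
    return p / (p + beta * (1 - p))


def find_breaches(prior, P, beta, tol=1e-12):
    """List (subset, response, p1, p2) for every property (proper nonempty
    subset of categories) and response whose posterior p2 leaves the
    interval [h_lower(p1), h_upper(p1)] for this one prior."""
    k = len(prior)
    breaches = []
    for size in range(1, k):
        for subset in combinations(range(k), size):
            p1 = sum(prior[i] for i in subset)
            for r in range(len(P[0])):
                p2 = posterior(prior, P, subset, r)
                if p2 < h_lower(p1, beta) - tol or p2 > h_upper(p1, beta) + tol:
                    breaches.append((subset, r, p1, p2))
    return breaches


def warner_matrix(q):
    """Two-category design: the true category is reported with probability q.
    Rows are true categories, columns are responses."""
    return [[q, 1 - q], [1 - q, q]]


prior = [0.2, 0.8]
mild = find_breaches(prior, warner_matrix(0.7), beta=3.0)
strong = find_breaches(prior, warner_matrix(0.9), beta=3.0)
print(len(mild), len(strong))
```

With these illustrative bounds, a breach occurs exactly when the ratio of two entries in the same column of the design matrix exceeds `beta`, so the check depends on the design rather than on the particular prior. The sketch tests a single prior only; a privacy criterion of the kind described above would have to hold for every prior an intruder might use.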

MSC:

62D05 Sampling theory, sample surveys
62B15 Theory of statistical experiments

Software:

RAPPOR; FRAPP
