Privacy and confidentiality in an e-commerce world: data mining, data warehousing, matching and disclosure limitation. (English) Zbl 1426.68077

Summary: The growing expanse of e-commerce and the widespread availability of online databases raise many fears about loss of privacy and pose many statistical challenges. Even with encryption and other nominal forms of protection for individual databases, we still need to protect against the violation of privacy through linkages across multiple databases. These issues parallel those that have arisen and received some attention in the context of homeland security. Following the events of September 11, 2001, there has been heightened attention in the United States and elsewhere to the use of multiple government and private databases for the identification of possible perpetrators of future attacks, as well as an unprecedented expansion of federal government data mining activities, many involving databases containing personal information. We present an overview of some proposals that have surfaced for searching multiple databases that supposedly do not compromise possible pledges of confidentiality to the individuals whose data are included. We also explore their link to the related literature on privacy-preserving data mining. In particular, we focus on the matching problem across databases, the concept of “selective revelation,” and the confidentiality implications of both.


68P15 Database theory
68P25 Data encryption (aspects in computer science)

