
Constrained obfuscation of relational databases. (English) Zbl 1355.68075

Summary: The need to share data often conflicts with privacy preservation. Data obfuscation attempts to resolve this conflict by modifying the original data while optimizing both privacy and utility measures. In this paper we introduce Constrained Obfuscation Problems (COPs), which formulate the task of obfuscating data stored in relational databases. The main idea behind COPs is that many obfuscation scenarios can be modeled as a data generation process constrained by a predefined set of rules. We demonstrate the flexibility of the COP definition by modeling several different obfuscation scenarios: Production Data Obfuscation for Application Testing (PDOAT), anonymization of relational data, and anonymization of social networks. We then suggest a general approach for solving COPs by reducing them to a set of Constraint Satisfaction Problems (CSPs). This reduction enables the employment of the well-studied CSP framework to handle a wide range of complex rules. Some of the resulting CSPs may contain a large number of variables, which can make them intractable. To overcome such intractability issues, we present two useful heuristics that decompose large CSPs into smaller, tractable sub-CSPs. We also show how the well-known \(\ell\)-diversity privacy measure can be incorporated into the COP framework to evaluate the privacy level of COP solutions. Finally, we evaluate the new method in terms of privacy, utility, and execution time.
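The COP-to-CSP reduction and the \(\ell\)-diversity check described above can be sketched in a few lines of Python. This is a minimal illustration, not the paper's actual algorithm: the table, the obfuscation rules (preserve the age decade and the zip-code prefix), the domains, and the quasi-identifier grouping are all illustrative assumptions, and each record is treated as an independent sub-CSP solved by exhaustive enumeration.

```python
# Sketch of the COP idea: generate obfuscated values under predefined rules
# (a per-record sub-CSP), then evaluate l-diversity of the released table.
# Rules, domains, and quasi-identifiers below are illustrative assumptions.
from itertools import product
from collections import defaultdict

records = [
    {"age": 34, "zip": "10025", "disease": "flu"},
    {"age": 36, "zip": "10027", "disease": "cold"},
    {"age": 52, "zip": "20011", "disease": "flu"},
    {"age": 57, "zip": "20014", "disease": "asthma"},
]

AGE_DOMAIN = range(20, 70)                     # candidate obfuscated ages
ZIP_DOMAIN = [p + "**" for p in ("100", "200")]  # masked zip codes

def satisfies_rules(rec, age, zip_):
    """Predefined rules constraining the generated (obfuscated) values."""
    same_decade = age // 10 == rec["age"] // 10   # keep the age decade
    same_region = zip_.startswith(rec["zip"][:3])  # keep the zip prefix
    return same_decade and same_region

def obfuscate(rec):
    """Solve the per-record sub-CSP by exhaustive enumeration."""
    for age, zip_ in product(AGE_DOMAIN, ZIP_DOMAIN):
        if satisfies_rules(rec, age, zip_):
            return {"age": age, "zip": zip_, "disease": rec["disease"]}
    raise ValueError("sub-CSP has no solution")

def l_diversity(table, l):
    """Every quasi-identifier group must hold >= l distinct sensitive values."""
    groups = defaultdict(set)
    for rec in table:
        groups[(rec["age"] // 10, rec["zip"])].add(rec["disease"])
    return all(len(s) >= l for s in groups.values())

released = [obfuscate(r) for r in records]
print(l_diversity(released, 2))  # True: each group holds 2 distinct diseases
```

In the paper's terms, the rules play the role of the COP constraint set, and the decomposition heuristics would correspond to splitting one large CSP over all records into the small per-record sub-CSPs solved here; a real implementation would hand the sub-CSPs to a solver such as Choco rather than enumerate.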

MSC:

68P15 Database theory
94A60 Cryptography
94A62 Authentication, digital signatures and secret sharing

Software:

UCI-ml; Choco; QAGen
