Releasing multiply imputed, synthetic public use microdata: an illustration and empirical study. (English) Zbl 1099.62138

Summary: The paper presents an illustration and empirical study of releasing multiply imputed, fully synthetic public use microdata. Simulations based on data from the US Current Population Survey are used to evaluate the potential validity of inferences based on fully synthetic data for a variety of descriptive and analytic estimands, to assess the degree of confidentiality protection afforded by fully synthetic data, and to illustrate the specification of synthetic data imputation models. Benefits and limitations of releasing fully synthetic data sets are discussed.


62P99 Applications of statistics
62P25 Applications of statistics to social sciences

