
Using global optimization to estimate population class sizes. (English) Zbl 1120.90040

Summary: In this paper we formulate a nonlinear optimization model to estimate population class sizes based on sample information. The model is nonconvex and has several local minima, each corresponding to a different population that could have been the source of the sample data. We show that many, if not all, of these local solutions can be found using a new global optimization algorithm called OptQuest/NLP (OQNLP). The approach can be used to estimate the number of individuals in a population with unique or rarely occurring characteristics, which is useful for assessing disclosure risk. It can also be used to estimate the number of classes in a population, a problem with applications in a variety of disciplines.
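
The idea of recovering several local minima of a nonconvex model by restarting a local solver from many points can be sketched as follows. This is only an illustrative toy, not OQNLP and not the paper's model: the objective `f`, the fixed-step gradient descent in `local_descent`, the sampling interval, and every parameter value are assumptions chosen for the example.

```python
import random


def f(x):
    # Hypothetical nonconvex objective with two local minima (not the paper's model).
    return x**4 - 3 * x**2 + x


def grad_f(x):
    # Analytic derivative of f.
    return 4 * x**3 - 6 * x + 1


def local_descent(x, step=0.01, iters=2000):
    # Plain fixed-step gradient descent, standing in for a real local NLP solver.
    for _ in range(iters):
        x -= step * grad_f(x)
    return x


def multistart(n_starts=20, lo=-2.0, hi=2.0, tol=1e-4, seed=0):
    # Run the local solver from random starting points and keep each
    # distinct stationary point that it converges to.
    rng = random.Random(seed)
    found = []
    for _ in range(n_starts):
        x = local_descent(rng.uniform(lo, hi))
        if all(abs(x - y) > tol for y in found):
            found.append(x)
    return sorted(found)


minima = multistart()
for x in minima:
    print(f"local minimum near x = {x:.4f}, f(x) = {f(x):.4f}")
```

In this toy setting each distinct minimum plays the role of one candidate population consistent with the sample. A multistart method in the spirit the summary describes would typically add filtering so that the local solver is not launched repeatedly from starting points likely to lead to a minimum already found.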

MSC:

90C26 Nonconvex programming, global optimization

Software:

CONOPT
