Polettini, Silvia; Franconi, Luisa; Stander, Julian
Model based disclosure protection
Domingo-Ferrer, Josep (ed.), Inference control in statistical databases. From theory to practice. Berlin: Springer. Lect. Notes Comput. Sci. 2316, 83-96 (2002).
2002
Summary: We argue that any microdata protection strategy is based on a formal reference model. The extent of model specification yields ``parametric'', ``semiparametric'', or ``nonparametric'' strategies. Following this classification, a parametric probability model, such as a normal regression model or a multivariate distribution for simulation, can be specified. Matrix masking, which covers local suppression, coarsening, microaggregation, noise injection and perturbation, provides examples of the second and third classes of models. Finally, a nonparametric approach, e.g. the use of bootstrap procedures for generating synthetic microdata, can be adopted. In this paper we discuss the application of a regression-based imputation procedure for business microdata to the Italian sample from the Community Innovation Survey. A set of regressions is used to generate a flexible perturbation, since the protection varies according to the identifiability of the enterprise; a spatial aggregation strategy, based on principal components analysis, is also proposed. The inferential usefulness of the released data and the protection achieved by the strategy are evaluated.
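
As a rough illustration of the parametric idea mentioned in the summary (releasing values generated from a fitted regression model rather than the observed ones), the following minimal sketch may help. It is not the procedure of the paper: the single covariate, the Gaussian noise, the simulated data and the names `fit_line`, `regression_mask` and `noise_scale` are all assumptions made for this example.

```python
import random

def fit_line(x, y):
    """Ordinary least squares fit of y = a + b*x; returns (a, b)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope

def regression_mask(x, y, rng, noise_scale=1.0):
    """Hypothetical masking step: replace each y by its fitted value
    plus Gaussian noise scaled by the residual standard deviation."""
    a, b = fit_line(x, y)
    fitted = [a + b * xi for xi in x]
    n = len(y)
    resid_sd = (sum((yi - fi) ** 2 for yi, fi in zip(y, fitted)) / (n - 2)) ** 0.5
    released = [fi + rng.gauss(0.0, noise_scale * resid_sd) for fi in fitted]
    return released, (a, b)

# Simulated data (assumed for illustration only, not survey data).
rng = random.Random(0)
x = [rng.gauss(0.0, 1.0) for _ in range(200)]
y = [1.0 + 2.0 * xi + rng.gauss(0.0, 0.3) for xi in x]

released, (a, b) = regression_mask(x, y, rng)

# Refit on the released values to see how well the regression is preserved.
a_rel, b_rel = fit_line(x, released)
```

In this sketch, a larger `noise_scale` would correspond to stronger perturbation, which is one simple way the amount of protection could be made to depend on how identifiable a unit is; comparing `(a, b)` with `(a_rel, b_rel)` gives a crude check of inferential usefulness.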
