Secure Bayesian model averaging for horizontally partitioned data. (English) Zbl 1322.62101

Summary: When multiple data owners possess records on different subjects with the same set of attributes – known as horizontally partitioned data – the data owners can improve analyses by concatenating their databases. However, concatenation of data may be infeasible because of confidentiality concerns. In such settings, the data owners can use secure computation techniques to obtain the results of certain analyses on the integrated database without sharing individual records. We present secure computation protocols for Bayesian model averaging and model selection for both linear regression and probit regression. Using simulations based on genuine data, we illustrate the approach for probit regression, and show that it can provide reasonable model selection outputs.
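
The summary does not spell out the protocols, so the following is only a minimal sketch of one building block that is common in the secure-regression literature: secure summation via additive secret sharing. It is not necessarily the scheme used in the paper. Each owner splits its local sufficient statistics into random shares, so only the pooled totals are revealed. The sketch then fits a one-predictor least-squares line from those totals. All names (`secure_sum`, `pooled_ols`, `MODULUS`, `SCALE`) are hypothetical, and the parties are simulated in a single process.

```python
import random

# Assumptions for illustration only: a public modulus larger than any pooled
# total, and a fixed-point scale used to encode real numbers as integers.
MODULUS = 2**61 - 1
SCALE = 10**6


def encode(x):
    """Encode a real number as a fixed-point residue modulo MODULUS."""
    return int(round(x * SCALE)) % MODULUS


def decode(v):
    """Map a residue back to a signed real number."""
    if v > MODULUS // 2:
        v -= MODULUS
    return v / SCALE


def make_shares(value, n_parties, rng):
    """Split an encoded value into n additive shares that sum to it mod MODULUS."""
    shares = [rng.randrange(MODULUS) for _ in range(n_parties - 1)]
    shares.append((value - sum(shares)) % MODULUS)
    return shares


def secure_sum(local_values, rng):
    """Return the sum of the parties' values, simulating the share exchange."""
    n = len(local_values)
    # Party i creates one share per party and sends share j to party j.
    all_shares = [make_shares(encode(v), n, rng) for v in local_values]
    # Party j adds up the shares it received and publishes only that partial sum.
    partials = [sum(all_shares[i][j] for i in range(n)) % MODULUS for j in range(n)]
    return decode(sum(partials) % MODULUS)


def local_stats(rows):
    """Sufficient statistics of one owner's (x, y) records for y = a + b*x."""
    return [
        float(len(rows)),
        sum(x for x, _ in rows),
        sum(y for _, y in rows),
        sum(x * x for x, _ in rows),
        sum(x * y for x, y in rows),
    ]


def pooled_ols(parties, rng):
    """Least-squares intercept and slope computed from securely pooled totals."""
    stats = [local_stats(rows) for rows in parties]
    n, sx, sy, sxx, sxy = (
        secure_sum([s[k] for s in stats], rng) for k in range(5)
    )
    slope = (n * sxy - sx * sy) / (n * sxx - sx * sx)
    intercept = (sy - slope * sx) / n
    return intercept, slope


if __name__ == "__main__":
    rng = random.Random(0)  # use a cryptographic RNG in any real deployment
    # Three owners hold different subjects with the same attributes (x, y).
    parties = [[(0, 1), (1, 3)], [(2, 5), (3, 7)], [(4, 9)]]
    print(pooled_ols(parties, rng))
```

The same idea extends to several predictors by pooling the entries of X'X, X'y and y'y. For linear regression under conjugate priors, the marginal likelihood of a candidate model depends on the data only through such totals, which is why pooled sums are enough for model comparison in that setting. Probit regression needs more than a single round of summation.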

MSC:

62F15 Bayesian inference
62J02 General nonlinear regression
62J12 Generalized linear models (logistic models)
62-07 Data analysis (statistics) (MSC2010)

Software:

BAS; tsbridge

References:

[1] Agrawal, R., Srikant, R.: Privacy-preserving data mining. In: Proceedings of the 2000 ACM SIGMOD on Management of Data, pp. 439–450 (2000)
[2] Albert, J.H., Chib, S.: Bayesian analysis of binary and polychotomous response data. J. Am. Stat. Assoc. 88(422), 669–679 (1993) · Zbl 0774.62031 · doi:10.1080/01621459.1993.10476321
[3] Barbieri, M., Berger, J.: Optimal predictive model selection. Ann. Stat. 32(3), 870–897 (2004) · Zbl 1092.62033 · doi:10.1214/009053604000000238
[4] Benaloh, J.: Secret sharing homomorphisms: keeping shares of a secret. In: Odlyzko, A. (ed.) Advances in Cryptology: CRYPTO86, vol. 263, pp. 251–260. Springer, New York (1987) · Zbl 0637.94013
[5] Berger, J., Pericchi, L.: Objective Bayesian methods for model selection: introduction and comparison [with discussion]. In: Lahiri, P. (ed.) Institute of Mathematical Statistics Lecture Notes, Monograph Series, vol. 38, Beachwood Ohio, pp. 135–207 (2001)
[6] Berger, J.O., Ghosh, J.K., Mukhopadhyay, N.: Approximations and consistency of Bayes factors as model dimension grows. J. Stat. Plan. Inference 112, 241–258 (2003) · Zbl 1026.62018 · doi:10.1016/S0378-3758(02)00336-1
[7] Carlin, B., Chib, S.: Bayesian model choice via Markov chain Monte Carlo methods. J. R. Stat. Soc. B 57, 473–484 (1995) · Zbl 0827.62027
[8] Chib, S.: Marginal likelihood from the Gibbs output. J. Am. Stat. Assoc. 90, 1313–1321 (1995) · Zbl 0868.62027 · doi:10.1080/01621459.1995.10476635
[9] Churches, T., Christen, P.: Some methods for blindfolded record linkage. BMC Med. Inf. Decis. Mak. 4, 9 (2004) · doi:10.1186/1472-6947-4-9
[10] Clyde, M.: Bayesian model averaging and model search strategies (with discussion). In: Bayesian Statistics 6–Proceedings of the Sixth Valencia International Meeting, pp. 157–185 (1999) · Zbl 0973.62022
[11] Clyde, M.: Model averaging. In: Press, S.J. (ed.) Subjective and Objective Bayesian Statistics: Principles, Models and Applications. Wiley, New York (2002)
[12] Clyde, M., George, E.I.: Model uncertainty. Stat. Sci. 19(1), 81–94 (2004) · Zbl 1062.62044 · doi:10.1214/088342304000000035
[13] Clyde, M.A.: BAS: Bayesian Adaptive Sampling for Bayesian Model Averaging. R package version 0.90 (2010)
[14] Clyde, M.A., Ghosh, J., Littman, M.L.: Bayesian adaptive sampling for variable selection and model averaging. J. Comput. Graph. Stat. 20(1), 80–101 (2011) · doi:10.1198/jcgs.2010.09049
[15] Dellaportas, P., Forster, J.J., Ntzoufras, I.: On Bayesian model and variable selection using MCMC. Stat. Comput. 12(1), 27–36 (2002) · Zbl 1247.62086 · doi:10.1023/A:1013164120801
[16] Du, W., Han, Y., Chen, S.: Privacy-preserving multivariate statistical analysis: linear regression and classification. In: Proceedings of the 4th SIAM International Conference on Data Mining, pp. 222–233 (2004)
[17] Evfimievski, A., Srikant, R., Agrawal, R., Gehrke, J.: Privacy preserving mining of association rules (invited journal version). Inf. Syst. 29(4), 343–364 (2004) · Zbl 02184455 · doi:10.1016/j.is.2003.09.001
[18] Gelfand, A.E., Smith, A.F.M.: Sampling-based approaches to calculating marginal densities. J. Am. Stat. Assoc. 85, 398–409 (1990) · Zbl 0702.62020 · doi:10.1080/01621459.1990.10476213
[19] Gelman, A., Meng, X.-L.: Simulating normalizing constants: from importance sampling to bridge sampling to path sampling. Stat. Sci. 13, 163–185 (1998) · Zbl 0966.65004 · doi:10.1214/ss/1028905934
[20] George, E.I., McCulloch, R.E.: Variable selection via Gibbs sampling. J. Am. Stat. Assoc. 88, 881–889 (1993) · doi:10.1080/01621459.1993.10476353
[21] Ghosh, J., Clyde, M.A.: Rao-Blackwellization for Bayesian variable selection and model averaging in linear and binary regression: a novel data augmentation approach. J. Am. Stat. Assoc. 106(495), 1041–1052 (2011) · Zbl 1229.62029 · doi:10.1198/jasa.2011.tm10518
[22] Ghosh, J., Reiter, J.P., Karr, A.F.: Secure computation with horizontally partitioned data using adaptive regression splines. Comput. Stat. Data Anal. 51, 5813–5820 (2007) · Zbl 1445.62080 · doi:10.1016/j.csda.2006.10.013
[23] Heaton, M., Scott, J.: Bayesian computation and the linear model. In: Chen, M.-H., Dey, D.K., Mueller, P., Sun, D., Ye, K. (eds.) Frontiers of Statistical Decision Making and Bayesian Analysis (2010)
[24] Hoeting, J.A., Madigan, D., Raftery, A.E., Volinsky, C.T.: Bayesian model averaging: a tutorial (with discussion). Stat. Sci. 14(4), 382–401 (1999) · Zbl 1059.62525 · doi:10.1214/ss/1009212519
[25] Holmes, C.C., Held, L.: Bayesian auxiliary variable models for binary and multinomial regression. Bayesian Anal. 1, 145–168 (2006) · Zbl 1331.62142 · doi:10.1214/06-BA105
[26] Kantarcioglu, M., Clifton, C.: Privacy-preserving distributed mining of association rules on horizontally partitioned data. In: The ACM SIGMOD Workshop on Research Issues on Data Mining and Knowledge Discovery (DMKD 2002), Madison, Wisconsin, pp. 24–31 (2002)
[27] Karr, A., Lin, X., Sanil, A., Reiter, J.: Secure regressions on distributed databases. J. Comput. Graph. Stat. 14, 263–279 (2005) · doi:10.1198/106186005X47714
[28] Kurgan, L., Cios, K., Tadeusiewicz, R., Ogiela, M., Goodenday, L.: Knowledge discovery approach to automated cardiac SPECT diagnosis. Artif. Intell. Med. 23(2), 149–169 (2001) · Zbl 05390885 · doi:10.1016/S0933-3657(01)00082-3
[29] Liang, F., Paulo, R., Molina, G., Clyde, M., Berger, J.: Mixtures of g-priors for Bayesian variable selection. J. Am. Stat. Assoc. 103, 410–423 (2008) · Zbl 1335.62026 · doi:10.1198/016214507000001337
[30] Lin, X., Clifton, C., Zhu, Y.: Privacy preserving clustering with distributed EM mixture modeling. Int. J. Knowl. Inf. Syst. 8(1), 68–81 (2005) · Zbl 02224717 · doi:10.1007/s10115-004-0148-7
[31] Lindell, Y., Pinkas, B.: Privacy-preserving data mining. In: Advances in Cryptology: CRYPTO2000, pp. 36–54. Springer, New York (2000) · Zbl 0989.68506
[32] Meng, X.-L., Wong, W.H.: Simulating ratios of normalizing constants via a simple identity: a theoretical exploration. Stat. Sin. 6, 831–860 (1996) · Zbl 0857.62017
[33] Raftery, A.E.: Approximate Bayes factors and accounting for model uncertainty in generalised linear models. Biometrika 83, 251–266 (1996) · Zbl 0864.62049 · doi:10.1093/biomet/83.2.251
[34] Ripley, B.D.: Pattern Recognition and Neural Networks. Cambridge University Press, Cambridge (1996) · Zbl 0853.62046
[35] Schwarz, G.: Estimating the dimension of a model. Ann. Stat. 6, 461–464 (1978) · Zbl 0379.62005 · doi:10.1214/aos/1176344136
[36] Slavkovic, A.B., Nardi, Y., Tibbits, M.M.: Secure logistic regression of horizontally and vertically partitioned distributed databases. In: Data Mining Workshops, 2007. ICDM Workshops 2007. Seventh IEEE International Conference on, pp. 723–728 (2007)
[37] Tierney, L., Kadane, J.: Accurate approximations for posterior moments and marginal densities. J. Am. Stat. Assoc. 81, 82–86 (1986) · Zbl 0587.62067 · doi:10.1080/01621459.1986.10478240
[38] Vaidya, J., Clifton, C.: Privacy preserving association rule mining in vertically partitioned data. In: The 8th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Edmonton, Alberta, Canada, pp. 639–644 (2002)
[39] Vaidya, J., Clifton, C.: Privacy-preserving k-means clustering over vertically partitioned data. In: The 9th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Washington, DC, pp. 206–215 (2003)
[40] Willenborg, L., de Waal, T.: Elements of Statistical Disclosure Control. Springer, New York (2001) · Zbl 0973.62009
[41] Zellner, A.: On assessing prior distributions and Bayesian regression analysis with g-prior distributions. In: Bayesian Inference and Decision Techniques: Essays in Honor of Bruno de Finetti, pp. 233–243. North-Holland/Elsevier, Amsterdam (1986)
[42] Zellner, A., Siow, A.: Posterior odds ratios for selected regression hypotheses. In: Bayesian Statistics: Proceedings of the First International Meeting held in Valencia (Spain), pp. 585–603 (1980) · Zbl 0457.62004
This reference list is based on information provided by the publisher or from digital mathematics libraries. Its items are heuristically matched to zbMATH identifiers and may contain data conversion errors. In some cases these data have been complemented or enhanced by data from zbMATH Open. The list attempts to reflect the references in the original paper as accurately as possible, without claiming completeness or a perfect matching.