Inference attacks on genomic privacy with an improved HMM and an RCNN model for unrelated individuals. (English) Zbl 1456.68036

Summary: In recent years, the collection of large-scale genomic data for individuals has become feasible and affordable. Concurrently, several practical attacks targeting genome re-identification and genotype inference have emerged, threatening the confidentiality of shared genomic data and raising security and privacy concerns. In this paper, the authors show that the problem can be even worse. Specifically, two large-scale genotype inference attack strategies against nonrelatives are presented. One is based on an improved hidden Markov model (iHMM), and the other on a regressive convolutional neural network (RCNN). Using a genomic privacy metric that combines the attacker’s incorrectness, the attacker’s uncertainty, and the genomic privacy loss of the victims, it is shown that with these strategies the attack can be significantly more severe than those reported previously. It is also shown that machine learning can be applied to empower large-scale inference attacks against genomic privacy.
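
As a rough illustration of two components of the privacy metric named in the summary, the sketch below computes an attacker's incorrectness and uncertainty for a single SNP. It rests on assumptions not stated in the summary: genotypes are encoded as 0, 1, 2 (minor-allele count), incorrectness is taken as the expected absolute estimation error, and uncertainty as the normalized Shannon entropy of the attacker's posterior. The function names are hypothetical, and the paper's exact definitions may differ. The privacy-loss component is omitted because the summary does not define it.

```python
import math

def incorrectness(posterior, true_genotype):
    """Expected absolute distance between the inferred and the true genotype.

    posterior[g] is the attacker's probability that the genotype equals g
    (assumed encoding: g in {0, 1, 2}).
    """
    return sum(p * abs(g - true_genotype) for g, p in enumerate(posterior))

def uncertainty(posterior):
    """Shannon entropy of the posterior, normalized to the range [0, 1]."""
    entropy = -sum(p * math.log2(p) for p in posterior if p > 0)
    return entropy / math.log2(len(posterior))

# Hypothetical posterior produced by some inference attack for one SNP.
posterior = [0.1, 0.7, 0.2]
print(incorrectness(posterior, true_genotype=1))
print(uncertainty(posterior))
```

Under these assumed definitions, lower values of both quantities mean a more successful attack: a posterior concentrated on the true genotype gives incorrectness and uncertainty close to 0, while a uniform posterior gives uncertainty 1.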

MSC:

68P27 Privacy of data
68T05 Learning and adaptive systems in artificial intelligence
92D10 Genetics and epigenetics
92D20 Protein sequences, DNA sequences

Software:

Forensically; IMPUTE
