Joint and individual analysis of breast cancer histologic images and genomic covariates. (English) Zbl 1498.62197

Summary: The two main approaches in the study of breast cancer are histopathology (analyzing visual characteristics of tumors) and genomics. While both histopathology and genomics are fundamental to cancer research, the connections between these fields have been relatively superficial. We bridge this gap by investigating the Carolina Breast Cancer Study through the development of an integrative, exploratory analysis framework. Our analysis gives insights – some known, some novel – that are engaging to both pathologists and geneticists. Our analysis framework is based on angle-based joint and individual variation explained (AJIVE) for statistical data integration and exploits convolutional neural networks (CNNs) as a powerful, automatic method for image feature extraction. CNNs raise interpretability issues that we address by developing novel methods to explore visual modes of variation captured by statistical algorithms (e.g., PCA or AJIVE) applied to CNN features.


62P10 Applications of statistics to biology and medical sciences; meta analysis
62H25 Factor analysis and principal components; correspondence analysis
68T07 Artificial neural networks and deep learning

