Precision medicine. (English) Zbl 1478.92079

Ashenden, Stephanie Kay (ed.), The era of artificial intelligence, machine learning, and data science in the pharmaceutical industry. Amsterdam: Elsevier/Academic Press. 139-157 (2021).
Summary: It has been shown that several diseases have a genetic basis and bias. Genetic changes such as mutations, copy number alterations, epigenetic changes, and changes in RNA expression cause disease progression, adverse events, and drug resistance, among other physiological and pharmacological effects on the patient. A key example is tumor growth and metastasis to other organs. While medicine has always considered the individual patient’s situation, and there is a long history of segmenting patient populations by data, the recent explosion of rich biomedical data and informatics tools has galvanized the idea of “precision medicine.” By identifying fundamental patient types, we can deliver “the right treatment for the right patient at the right time.” Similar ideas can be found in the “5 R” framework for effective drug development, which emphasizes that a drug must reach the right target, in the right tissue, with the right safety profile, in the right patient. Treatments can be selected that are most likely to help patients based on a molecular understanding of their disease, rather than a possibly deceptive clinical presentation. This is especially important for the increasing number of high-cost therapies with great variation in patient-to-patient outcomes, for example, checkpoint inhibitors. If we can correctly partition a patient population, we get more homogeneous cohorts with shared disease states for mechanistic investigation. If we can subtype a trial population to one where a candidate drug will be effective, we can reduce the trial size. If smaller and rarer disease populations, often unrecognized or neglected in therapy development, can be detected within larger populations, they can be investigated and treated appropriately.
For the entire collection see [Zbl 1462.92005].


92C50 Medical applications (general)
62P10 Applications of statistics to biology and medical sciences; meta analysis


NbClust; t-SNE; SVRc; clValid; rms

