Time-to-event prediction with neural networks and Cox regression.

*(English)* Zbl 1440.62354

Summary: New methods for time-to-event prediction are proposed by extending the Cox proportional hazards model with neural networks. Building on methodology from nested case-control studies, we propose a loss function that scales well to large data sets and enables fitting of both proportional and non-proportional extensions of the Cox model. Through simulation studies, the proposed loss function is verified to be a good approximation for the Cox partial log-likelihood. The proposed methodology is compared to existing methodologies on real-world data sets and is found to be highly competitive, typically yielding the best performance in terms of Brier score and binomial log-likelihood. A Python package for the proposed methods is available at https://github.com/havakv/pycox.
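The idea of a case-control style loss can be illustrated with a minimal sketch. It is not the pycox implementation: the function names, the toy data, the choice to sample controls without replacement, and the averaging over observed events are all assumptions made for illustration. Here `scores` stands in for the output g(x) of a network. The full loss sums over the whole risk set of each event, while the sampled loss uses only the case and a few controls drawn from that risk set, so its cost per event does not grow with the size of the data set.

```python
import math
import random


def full_cox_loss(times, events, scores):
    """Mean negative Cox partial log-likelihood over observed events."""
    n_events = sum(events)
    total = 0.0
    for i, (t_i, d_i) in enumerate(zip(times, events)):
        if not d_i:
            continue  # censored individuals contribute no term of their own
        # Risk set: everyone still under observation at time t_i (includes i).
        total += math.log(sum(math.exp(scores[j] - scores[i])
                              for j, t_j in enumerate(times) if t_j >= t_i))
    return total / n_events


def case_control_loss(times, events, scores, n_controls=1, rng=None):
    """Sampled version: each case is compared with a few random controls.

    Assumption for this sketch: controls are drawn uniformly, without
    replacement, from the risk set of the case (excluding the case itself).
    """
    rng = rng or random.Random(0)
    n_events = sum(events)
    total = 0.0
    for i, (t_i, d_i) in enumerate(zip(times, events)):
        if not d_i:
            continue
        candidates = [j for j, t_j in enumerate(times)
                      if t_j >= t_i and j != i]
        controls = rng.sample(candidates, min(n_controls, len(candidates)))
        # The case itself contributes exp(0) = 1 to the sum.
        total += math.log(1.0 + sum(math.exp(scores[j] - scores[i])
                                    for j in controls))
    return total / n_events


# Hypothetical toy data: observed times, event indicators, risk scores g(x).
times = [5.0, 3.0, 8.0, 1.0, 6.0]
events = [1, 0, 1, 1, 0]
scores = [0.2, -0.1, 0.5, 1.0, -0.3]

full = full_cox_loss(times, events, scores)
sampled = case_control_loss(times, events, scores, n_controls=1)
```

In this sketch the sampled loss is never larger than the full one, because it sums over a subset of the risk set, and the two coincide when every member of the risk set is used as a control. The values are therefore on different scales; the claim in the summary concerns how well the sampled loss serves as a training objective, not numerical equality of the two quantities.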

##### MSC:

62N02 | Estimation in survival analysis and censored data |

62M20 | Inference from stochastic processes and prediction |

62M45 | Neural nets and related approaches to inference from stochastic processes |

62J05 | Linear regression; mixed models |

60G55 | Point processes (e.g., Poisson, Cox, Hawkes processes) |

##### Keywords:

Cox regression; customer churn; neural networks; non-proportional hazards; survival prediction

*H. Kvamme* et al., J. Mach. Learn. Res. 20, Paper No. 129, 30 p. (2019; Zbl 1440.62354)

##### References:

[1] | Laura Antolini, Patrizia Boracchi, and Elia Biganzoli. A time-dependent discrimination index for survival data. Statistics in Medicine, 24(24):3927-3944, 2005. |

[2] | David R. Cox. Regression models and life-tables. Journal of the Royal Statistical Society. Series B (Methodological), 34(2):187-220, 1972. |

[3] | Cameron Davidson-Pilon, Jonas Kalderstam, Ben Kuhn, Andrew Fiore-Gartland, Luis Moneda, Paul Zivich, Alex Parij, Kyle Stark, Steven Anton, Lilian Besson, et al. Camdavidsonpilon/lifelines: v0.14.1, 2018. |

[4] | Lore Dirick, Gerda Claeskens, and Bart Baesens. Time to default in credit scoring using survival analysis: a benchmark study. Journal of the Operational Research Society, 68(6):652-665, 2017. |

[5] | David Faraggi and Richard Simon. A neural network model for survival data. Statistics in Medicine, 14(1):73-82, 1995. |

[6] | Stephane Fotso. Deep neural networks for survival analysis based on a multi-task framework. arXiv preprint arXiv:1801.05512, 2018. |

[7] | Thomas A. Gerds, Michael W. Kattan, Martin Schumacher, and Changhong Yu. Estimating a time-dependent concordance index for survival prediction models with covariate dependent censoring. Statistics in Medicine, 32(13):2173-2184, 2012. |

[8] | Larry Goldstein and Bryan Langholz. Asymptotic theory for nested case-control sampling in the Cox regression model. Annals of Statistics, 20(4):1903-1928, 1992. |

[9] | Erika Graf, Claudia Schmoor, Willi Sauerbrei, and Martin Schumacher. Assessment and comparison of prognostic classification schemes for survival data. Statistics in Medicine, 18(17-18):2529-2545, 1999. |

[10] | Cheng Guo and Felix Berkhahn. Entity embeddings of categorical variables. arXiv preprint arXiv:1604.06737, 2016. |

[11] | Frank E. Harrell Jr, Robert M. Califf, David B. Pryor, Kerry L. Lee, and Robert A. Rosati. Evaluating the yield of medical tests. Journal of the American Medical Association, 247(18):2543-2546, 1982. |

[12] | Patrick J. Heagerty and Yingye Zheng. Survival model predictive accuracy and ROC curves. Biometrics, 61(1):92-105, 2005. |

[13] | Elad Hoffer, Itay Hubara, and Daniel Soudry. Train longer, generalize better: closing the generalization gap in large batch training of neural networks. arXiv preprint arXiv:1705.08741, 2017. |

[14] | Hemant Ishwaran, Udaya B. Kogalur, Eugene H. Blackstone, and Michael S. Lauer. Random survival forests. Annals of Applied Statistics, 2(3):841-860, 2008. |

[15] | Jared L. Katzman, Uri Shaham, Alexander Cloninger, Jonathan Bates, Tingting Jiang, and Yuval Kluger. DeepSurv: personalized treatment recommender system using a Cox proportional hazards deep neural network. BMC Medical Research Methodology, 18(1), 2018. |

[16] | Nitish S. Keskar, Dheevatsa Mudigere, Jorge Nocedal, Mikhail Smelyanskiy, and Ping T. P. Tang. On large-batch training for deep learning: Generalization gap and sharp minima. arXiv preprint arXiv:1609.04836, 2016. |

[17] | John P. Klein and Melvin L. Moeschberger. Survival Analysis: Techniques for Censored and Truncated Data. Springer, New York, 2nd edition, 2003. |

[18] | Bryan Langholz and Larry Goldstein. Risk set sampling in epidemiologic cohort studies. Statistical Science, 11(1):35-53, 1996. |

[19] | Changhee Lee, William R. Zame, Jinsung Yoon, and Mihaela van der Schaar. DeepHit: A deep learning approach to survival analysis with competing risks. In Thirty-Second AAAI Conference on Artificial Intelligence, 2018. |

[20] | Ilya Loshchilov and Frank Hutter. Decoupled weight decay regularization. In International Conference on Learning Representations, 2019. |

[21] | Margaux Luck, Tristan Sylvain, Héloïse Cardinal, Andrea Lodi, and Yoshua Bengio. Deep learning for patient-specific kidney graft survival analysis. arXiv preprint arXiv:1705.10245, 2017. |

[22] | Adam Paszke, Sam Gross, Soumith Chintala, Gregory Chanan, Edward Yang, Zachary DeVito, Zeming Lin, Alban Desmaison, Luca Antiga, and Adam Lerer. Automatic differentiation in PyTorch. In NIPS Autodiff Workshop, 2017. |

[23] | Daniel J. Sargent. Comparison of artificial neural networks with other statistical approaches. Cancer, 91(8):1636-1642, 2001. |

[24] | L. N. Smith. Cyclical learning rates for training neural networks. In 2017 IEEE Winter Conference on Applications of Computer Vision (WACV), pages 464-472, 2017. |

[25] | Gian A. Susto, Andrea Schirru, Simone Pampuri, Seán McLoone, and Alessandro Beghi. Machine learning for predictive maintenance: A multiple classifier approach. IEEE Transactions on Industrial Informatics, 11(3):812-820, 2015. |

[26] | Terry M. Therneau. A package for survival analysis in S. Version 2.38, 2015. |

[27] | Duncan C. Thomas. Addendum to: Methods of cohort analysis: appraisal by application to asbestos mining, by F. D. K. Liddell, J. C. McDonald and D. C. Thomas. Journal of the Royal Statistical Society: Series A (General), 140(4):469-491, 1977. |

[28] | Dirk Van den Poel and Bart Lariviere. Customer attrition analysis for financial services using proportional hazard models. European Journal of Operational Research, 157(1):196-217, 2004. |

[29] | Antonio Vigan, Marlene Dorgan, Jeanette Buckingham, Eduardo Bruera, and Mari E. Suarez-Almazor. Survival prediction in terminal cancer patients: a systematic review of the medical literature. Palliative Medicine, 14(5):363-374, 2000. |

[30] | Anny Xiang, Pablo Lapuerta, Alex Ryutov, Jonathan Buckley, and Stanley Azen. Comparison of the performance of neural network methods and Cox regression for censored survival data. Computational Statistics & Data Analysis, 34:243-257, 2000. |

[31] | Safoora Yousefi, Fatemeh Amrollahi, Mohamed Amgad, Chengliang Dong, Joshua E. Lewis, Congzheng Song, David A. Gutman, Sameer H. Halani, Jose E. V. Vega, Daniel J. Brat, et al. Predicting clinical outcomes from large scale cancer genomic profiles with deep survival models. Scientific Reports, 7(1):11707, 2017. |

[32] | Xinliang Zhu, Jiawen Yao, and Junzhou Huang. Deep convolutional neural network for survival analysis with pathological images. In 2016 IEEE International Conference on Bioinformatics and Biomedicine (BIBM), pages 544-547, 2016. |

[33] | Xinliang Zhu, Jiawen Yao, Feiyun Zhu, and Junzhou Huang. |

This reference list is based on information provided by the publisher or from digital mathematics libraries. Its items are heuristically matched to zbMATH identifiers and may contain data conversion errors. It attempts to reflect the references listed in the original paper as accurately as possible without claiming the completeness or perfect precision of the matching.