
Stabilizing variable selection and regression. (English) Zbl 1478.62060

Summary: We consider regression in which one predicts a response \(Y\) from a set of predictors \(X\) across different experiments or environments. This is a common setup in many data-driven scientific fields, and we argue that statistical inference can benefit from an analysis that takes into account the distributional changes across environments. In particular, it is useful to distinguish between stable and unstable predictors, that is, predictors whose functional dependence on the response is fixed or changing across environments, respectively. We introduce stabilized regression, which explicitly enforces stability and thus improves generalization to previously unseen environments. Our work is motivated by an application in systems biology. Using multiomic data, we demonstrate how hypothesis generation about gene function can benefit from stabilized regression. We believe that a similar line of argument for exploiting heterogeneity in data can be powerful for many other applications as well. We draw a theoretical connection between multi-environment regression and causal models, which allows us to characterize graphically which predictors have a stable and which an unstable functional dependence on the response. Formally, we introduce the notion of a stable blanket, a subset of the predictors that lies between the direct causal predictors and the Markov blanket. We prove that this set is optimal in the sense that a regression based on it minimizes the mean squared prediction error among all regressions that generalize to unseen environments.
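
The nesting and optimality properties of the stable blanket can be stated compactly. The display below is a paraphrase in ad hoc notation, writing \(\mathrm{PA}(Y)\), \(\mathrm{SB}(Y)\) and \(\mathrm{MB}(Y)\) for the direct causal predictors, the stable blanket and the Markov blanket of \(Y\); it is a sketch of the idea, not the authors' exact formal statement:
\[
\mathrm{PA}(Y) \;\subseteq\; \mathrm{SB}(Y) \;\subseteq\; \mathrm{MB}(Y),
\qquad
\mathrm{SB}(Y) \;\in\; \operatorname*{arg\,min}_{S \text{ stable}} \mathbb{E}\bigl[\bigl(Y - \mathbb{E}[Y \mid X^{S}]\bigr)^{2}\bigr],
\]
where the minimum is taken over subsets \(S\) of the predictors whose regression function for \(Y\) is invariant across environments, i.e., over exactly those regressions that generalize to unseen environments.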
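
To make the screening idea concrete, the following is a minimal Python sketch of a stability-screened ensemble regression in the spirit of the summary. The exhaustive enumeration of small predictor subsets, the Chow-type F-test for coefficient equality across environments, and the uniform averaging over accepted subsets are illustrative assumptions, not the authors' exact algorithm.

    import itertools

    import numpy as np
    from scipy import stats

    def chow_stability_pvalue(X, y, env):
        """Chow-type F-test of equal regression coefficients across environments.

        Compares a pooled OLS fit against separate per-environment OLS fits;
        a large p-value means no evidence against stability.
        """
        n = X.shape[0]
        Xi = np.column_stack([np.ones(n), X])  # design matrix with intercept
        p = Xi.shape[1]
        rss_pooled = np.sum((y - Xi @ np.linalg.lstsq(Xi, y, rcond=None)[0]) ** 2)
        rss_split, k = 0.0, 0
        for e in np.unique(env):
            m = env == e
            beta_e = np.linalg.lstsq(Xi[m], y[m], rcond=None)[0]
            rss_split += np.sum((y[m] - Xi[m] @ beta_e) ** 2)
            k += p
        df1, df2 = k - p, n - k  # assumes >= 2 environments, each with > p samples
        f = ((rss_pooled - rss_split) / df1) / (rss_split / df2)
        return stats.f.sf(f, df1, df2)

    def stabilized_regression(X, y, env, alpha=0.05, max_size=2):
        """Average OLS fits over all small predictor subsets whose coefficients
        are not rejected as unstable; returns (predict_function, accepted_subsets)."""
        accepted = []
        for size in range(1, max_size + 1):
            for S in itertools.combinations(range(X.shape[1]), size):
                if chow_stability_pvalue(X[:, S], y, env) > alpha:  # screen for stability
                    Xi = np.column_stack([np.ones(len(y)), X[:, S]])
                    accepted.append((S, np.linalg.lstsq(Xi, y, rcond=None)[0]))
        if not accepted:  # fall back to the pooled mean if nothing passes the screen
            mu = float(np.mean(y))
            return (lambda Xnew: np.full(len(Xnew), mu)), []

        def predict(Xnew):
            preds = [np.column_stack([np.ones(len(Xnew)), Xnew[:, S]]) @ beta
                     for S, beta in accepted]
            return np.mean(preds, axis=0)  # uniform ensemble over stable subsets

        return predict, [S for S, _ in accepted]

Predictors that appear in many accepted subsets can then be read as candidates for a stable functional dependence on the response, which is the kind of hypothesis generation about gene function described above.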

MSC:

62F07 Statistical ranking and selection procedures
62J05 Linear regression; mixed models

Software:

KEGG; CausalKinetiX
