
On-line predictive linear regression. (English) Zbl 1160.62065

Summary: We consider the on-line predictive version of the standard problem of linear regression; the goal is to predict each consecutive response given the corresponding explanatory variables and all the previous observations. The standard treatment of prediction in linear regression analysis has two drawbacks: (1) the classical prediction intervals guarantee that the probability of error is equal to the nominal significance level \(\varepsilon \), but this property per se does not imply that the long-run frequency of error is close to \(\varepsilon \); (2) it is not suitable for prediction of complex systems as it assumes that the number of observations exceeds the number of parameters.
We state a general result showing that in the on-line protocol the frequency of error for the classical prediction intervals does equal the nominal significance level, up to statistical fluctuations. We also describe alternative regression models in which informative prediction intervals can be found before the number of observations exceeds the number of parameters. One of these models, which only assumes that the observations are independent and identically distributed, is popular in machine learning but greatly underused in the statistical theory of regression.
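The on-line protocol under the assumption of independent and identically distributed observations can be illustrated with a small simulation. The sketch below rests on several assumptions that are not in the summary: it uses conformal prediction as the method for this setting (the summary does not name one), takes absolute least-squares residuals as the nonconformity score, and draws data from a hypothetical rule with one explanatory variable. All function names are illustrative. Rather than constructing each prediction set, it only checks whether the true response would fall outside it, which is enough to count errors.

```python
import random


def fit_line(xs, ys):
    """Least-squares intercept and slope for one explanatory variable.

    Falls back to a constant fit when the slope is not identifiable
    (for example, when only one observation is available).
    """
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0.0:
        return mean_y, 0.0
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def conformal_p_value(xs, ys):
    """Conformal p-value of the last example relative to all examples.

    The line is fitted to all examples including the last one, so the
    scores do not depend on the order of the examples.
    """
    intercept, slope = fit_line(xs, ys)
    scores = [abs(y - (intercept + slope * x)) for x, y in zip(xs, ys)]
    return sum(s >= scores[-1] for s in scores) / len(scores)


def online_error_frequency(n_steps=1000, eps=0.1, seed=0):
    """Run the on-line protocol and return the observed frequency of errors.

    An error at a step means the true response gets a p-value of at most
    eps, i.e. it would be excluded from the prediction set at level eps.
    """
    rng = random.Random(seed)
    xs, ys, errors = [], [], 0
    for _ in range(n_steps):
        x = rng.uniform(-1.0, 1.0)
        # Hypothetical data-generating rule, chosen only for illustration.
        y = 1.0 + 2.0 * x + rng.gauss(0.0, 1.0)
        xs.append(x)
        ys.append(y)
        if conformal_p_value(xs, ys) <= eps:
            errors += 1
    return errors / n_steps


if __name__ == "__main__":
    print(online_error_frequency())
```

In this sketch the p-value is defined from the very first observation, so prediction can start before the number of observations exceeds the number of parameters, although the early prediction sets exclude nothing. With `eps=0.1`, an error can occur only from the tenth observation onwards.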

MSC:

62J05 Linear regression; mixed models
62G08 Nonparametric regression and quantile regression
62M20 Inference from stochastic processes and prediction
68T05 Learning and adaptive systems in artificial intelligence
