Fisher and regression. (English) Zbl 1130.62300

Summary: In 1922 R. A. Fisher introduced the modern regression model, synthesizing the regression theory of Pearson and Yule and the least squares theory of Gauss. The innovation was based on Fisher’s realization that the distribution associated with the regression coefficient was unaffected by the distribution of \(X\). Subsequently Fisher interpreted the fixed \(X\) assumption in terms of his notion of ancillarity. This paper considers these developments against the background of the development of statistical theory in the early twentieth century.

MSC:

62-03 History of statistics
01A60 History of mathematics in the 20th century

Software:

BayesDA

References:

[1] Fisher’s published papers appear in J. H. Bennett, ed. (1971–1974). Collected Papers of R. A. Fisher, 5 vols. Adelaide Univ. Press. Bennett (?) and nearly all of the papers referred to here are available from the University of Adelaide R. A. Fisher Digital Archive at http://www.library.adelaide.edu.au/ual/special/fisher.html.
[2] Aldrich, J. (1993). Cowles exogeneity and CORE exogeneity. Discussion Paper 9308, Dept. Economics, Southampton Univ.
[3] Aldrich, J. (1995). Correlations genuine and spurious in Pearson and Yule. Statist. Sci. 10 364–376.
[4] Aldrich, J. (1997). R. A. Fisher and the making of maximum likelihood 1912–1922. Statist. Sci. 12 162–176. · Zbl 0955.62525 · doi:10.1214/ss/1030037906
[5] Aldrich, J. (1998). Doing least squares: Perspectives from Gauss and Yule. Internat. Statist. Rev. 66 61–81. · Zbl 0902.62076 · doi:10.1111/j.1751-5823.1998.tb00406.x
[6] Aldrich, J. (1999). Determinacy in the linear model: Gauss to Bose and Koopmans. Internat. Statist. Rev. 67 211–219. · Zbl 0934.62066 · doi:10.1111/j.1751-5823.1999.tb00427.x
[7] Aldrich, J. (2003–2005). A guide to R. A. Fisher. Available at http://www.economics.soton.ac.uk/staff/aldrich/fisherguide/rafreader.htm.
[8] Aldrich, J. (2003). The language of the English biometric school. Internat. Statist. Rev. 71 109–129. · Zbl 1114.62301 · doi:10.1111/j.1751-5823.2003.tb00188.x
[9] Aldrich, J. (2005). The statistical education of Harold Jeffreys. Internat. Statist. Rev. 73 289–308. · Zbl 1296.62011
[10] Barndorff-Nielsen, O. (1978). Information and Exponential Families in Statistical Theory. Wiley, Chichester. · Zbl 0387.62011
[11] Barndorff-Nielsen, O. E. and Cox, D. R. (1994). Inference and Asymptotics. Chapman and Hall, London. · Zbl 0826.62004
[12] Bartlett, M. S. (1933a). On the theory of statistical regression. Proc. Royal Soc. Edinburgh 53 260–283. · Zbl 0008.02402
[13] Bartlett, M. S. (1933b). Probability and chance in the theory of statistics. Proc. Roy. Soc. London Ser. A 141 518–534. · Zbl 0007.31402 · doi:10.1098/rspa.1933.0136
[14] Bartlett, M. S. (1936). Statistical information and properties of sufficiency. Proc. Roy. Soc. London Ser. A 154 124–137. · Zbl 0013.31305 · doi:10.1098/rspa.1936.0041
[15] Bartlett, M. S. (1937). Properties of sufficiency and statistical tests. Proc. Roy. Soc. London Ser. A 160 268–282. · Zbl 0016.41201 · doi:10.1098/rspa.1937.0109
[16] Bartlett, M. S. (1940). A note on the interpretation of quasi-sufficiency. Biometrika 31 391–392. · Zbl 0063.00225 · doi:10.2307/2332618
[17] Bartlett, M. S. (1965). R. A. Fisher and the last fifty years of statistical methodology. J. Amer. Statist. Assoc. 60 395–409. · Zbl 0129.11004 · doi:10.2307/2282678
[18] Bartlett, M. S. (1981). Egon Sharpe Pearson, 1895–1980. Biometrika 68 1–7. · Zbl 0454.01020 · doi:10.2307/2335800
[19] Bartlett, M. S. (1982). Chance and change. In The Making of Statisticians (J. Gani, ed.) 42–60. Springer, New York. · Zbl 0489.62002
[20] Bennett, J. H., ed. (1990). Statistical Inference and Analysis: Selected Correspondence of R. A. Fisher. Oxford Univ. Press. · Zbl 0712.01007
[21] Berkson, J. (1950). Are there two regressions? J. Amer. Statist. Assoc. 45 164–180. · Zbl 0040.22404 · doi:10.2307/2280676
[22] Birnbaum, A. (1962). On the foundations of statistical inference. J. Amer. Statist. Assoc. 57 269–326. · Zbl 0107.36505 · doi:10.2307/2281640
[23] Bjerve, S. and Doksum, K. A. (1993). Correlation curves: Measures of association as functions of covariate values. Ann. Statist. 21 890–902. · Zbl 0817.62025 · doi:10.1214/aos/1176349156
[24] Blakeman, J. (1905). On tests for linearity of regression in frequency distributions. Biometrika 4 332–350. · JFM 36.0313.09
[25] Blyth, S. (1994). Karl Pearson and the correlation curve. Internat. Statist. Rev. 62 393–403. · Zbl 0828.62049 · doi:10.2307/1403769
[26] Bowley, A. L. (1901). Elements of Statistics. King, London. · JFM 32.0697.05
[27] Box, J. F. (1978). R. A. Fisher: The Life of a Scientist. Wiley, New York. · Zbl 0666.01016
[28] Brown, L. D. (1990). An ancillarity paradox which appears in multiple linear regression (with discussion). Ann. Statist. 18 471–538. · Zbl 0721.62011 · doi:10.1214/aos/1176347602
[29] Brunt, D. (1917). The Combination of Observations. Cambridge Univ. Press. · JFM 46.1495.05
[30] Campbell, N. (1924). The adjustment of observations. Philosophical Magazine (6) 47 816–826.
[31] Cox, D. R. (1958). Some problems connected with statistical inference. Ann. Math. Statist. 29 357–372. · Zbl 0088.11702 · doi:10.1214/aoms/1177706618
[32] Cox, D. R. and Hinkley, D. V. (1974). Theoretical Statistics. Chapman and Hall, London. · Zbl 0334.62003
[33] Cramér, H. (1946). Mathematical Methods of Statistics. Princeton Univ. Press, Princeton, NJ. · Zbl 0063.01014
[34] Edgeworth, F. Y. (1893). Exercises in the calculation of errors. Philosophical Magazine (5) 36 98–111. · JFM 25.0356.02
[35] Eisenhart, C. (1979). On the transition from ‘Student’s \(z\)’ to ‘Student’s \(t\).’ Amer. Statist. 33 6–10. · doi:10.2307/2683058
[36] Elderton, W. P. (1906). Frequency Curves and Correlation. Layton, London. · JFM 38.0289.12
[37] Ezekiel, M. (1930). Methods of Correlation Analysis. Wiley, London. · JFM 56.1092.04
[38] Farebrother, R. W. (1999). Fitting Linear Relationships: A History of the Calculus of Observations. Springer, New York. · Zbl 0934.62003
[39] Fienberg, S. E. (1980). Fisher’s contribution to the analysis of categorical data. R. A. Fisher: An Appreciation. Lecture Notes in Statist. 1 75–84. Springer, New York. · Zbl 0436.62002
[40] Fienberg, S. E. and Hinkley, D. V., eds. (1980). R. A. Fisher: An Appreciation. Lecture Notes in Statist. 1. Springer, New York. · Zbl 0436.62002
[41] Fisher, R. A. (1912). On an absolute criterion for fitting frequency curves. Messenger of Mathematics 41 155–160. · JFM 43.0302.01
[42] Fisher, R. A. (1915). Frequency distribution of the values of the correlation coefficient in samples from an indefinitely large population. Biometrika 10 507–521. · Zbl 0070.37304
[43] Fisher, R. A. (1921a). On the ‘probable error’ of a coefficient of correlation deduced from a small sample. Metron 1 3–32.
[44] Fisher, R. A. (1921b). Studies in crop variation. I. An examination of the yield of dressed grain from Broadbalk. J. Agricultural Science 11 107–135.
[45] Fisher, R. A. (1922a). The goodness of fit of regression formulae, and the distribution of regression coefficients. J. Roy. Statist. Soc. 85 597–612.
[46] Fisher, R. A. (1922b). On the mathematical foundations of theoretical statistics. Philos. Trans. Roy. Soc. London Ser. A 222 309–368. · JFM 48.1280.02
[47] Fisher, R. A. (1922c). On the interpretation of \(\chi^2\) from contingency tables, and the calculation of \(P\). J. Roy. Statist. Soc. 85 87–94.
[48] Fisher, R. A. (1924–1925). Note on Dr. Campbell’s alternative to the method of least squares. Unpublished manuscript, Barr Smith Library, Univ. Adelaide.
[49] Fisher, R. A. (1924–1928). On a distribution yielding the error functions of several well known statistics. In Proc. International Mathematical Congress 2 805–813. Univ. Toronto Press, Toronto. · JFM 54.0564.02
[50] Fisher, R. A. (1925a). Statistical Methods for Research Workers. Oliver and Boyd, Edinburgh. · JFM 51.0414.08
[51] Fisher, R. A. (1925b). Theory of statistical estimation. Proc. Cambridge Philos. Soc. 22 700–725. · JFM 51.0385.01
[52] Fisher, R. A. (1925c). Applications of ‘Student’s’ distribution. Metron 5 90–104. · JFM 51.0387.01
[53] Fisher, R. A. (1925d). The influence of rainfall on the yield of wheat at Rothamsted. Philos. Trans. Roy. Soc. London Ser. B 213 89–142.
[54] Fisher, R. A. (1934). Two new properties of mathematical likelihood. Proc. Roy. Soc. London Ser. A 144 285–307. · Zbl 0009.21902 · doi:10.1098/rspa.1934.0050
[55] Fisher, R. A. (1935). The logic of inductive inference (with discussion). J. Roy. Statist. Soc. 98 39–82. · Zbl 0011.03205
[56] Fisher, R. A. (1946). Testing the difference between two means of observations of unequal precision. Nature 158 713.
[57] Fisher, R. A. (1948). Conclusions fiduciaires. Ann. Inst. H. Poincaré 10 191–213.
[58] Fisher, R. A. (1955). Statistical methods and scientific induction. J. Roy. Statist. Soc. Ser. B 17 69–78. · Zbl 0066.38008
[59] Fisher, R. A. (1956). Statistical Methods and Scientific Inference. Oliver and Boyd, Edinburgh. · Zbl 0070.36903
[60] Fisher, R. A. and Mackenzie, W. A. (1923). Studies in crop variation. II. The manurial response of different potato varieties. J. Agricultural Science 13 311–320.
[61] Fraser, D. A. S. (1992). Introduction to reprint of “Properties of sufficiency and statistical tests” [Bartlett (1937)]. In Breakthroughs in Statistics (S. Kotz and N. L. Johnson, eds.) 1 109–112. Springer, New York.
[62] Fraser, D. A. S. (2004). Ancillaries and conditional inference (with discussion). Statist. Sci. 19 333–369. · Zbl 1100.62534 · doi:10.1214/088342304000000323
[63] Galton, F. (1877). Typical laws of heredity. Nature 15 492–495, 512–514, 532–533.
[64] Galton, F. (1886). Family likeness in stature. Proc. Roy. Soc. London 40 42–73. · JFM 18.0175.04
[65] Gauss, C. F. (1809/1963). Theoria Motus Corporum Coelestium (C. H. Davis, transl.). Dover, New York, reprinted 1963.
[66] Gelman, A., Carlin, J. B., Stern, H. S. and Rubin, D. B. (1995). Bayesian Data Analysis. Chapman and Hall, London. · Zbl 1279.62004
[67] Hald, A. (1998). A History of Mathematical Statistics from 1750 to 1930. Wiley, New York. · Zbl 0979.01012
[68] Hald, A. (1999). On the history of maximum likelihood in relation to inverse probability and least squares. Statist. Sci. 14 214–222. · Zbl 1059.62502 · doi:10.1214/ss/1009212248
[69] Hinkley, D. V. (1980a). Theory of statistical estimation: The 1925 paper. R. A. Fisher: An Appreciation. Lecture Notes in Statist. 1 85–94. Springer, New York.
[70] Hinkley, D. V. (1980b). Fisher’s development of conditional inference. R. A. Fisher: An Appreciation. Lecture Notes in Statist. 1 101–108. Springer, New York.
[71] Hooker, R. H. (1907). Correlation of the weather and crops. J. Roy. Statist. Soc. 70 1–51.
[72] Hotelling, H. (1940). The selection of variates for use in prediction with some comments on the general problem of nuisance parameters. Ann. Math. Statist. 11 271–283. · Zbl 0023.34206 · doi:10.1214/aoms/1177731867
[73] Hotelling, H. (1948). Review of The Advanced Theory of Statistics 2, by M. G. Kendall. Bull. Amer. Math. Soc. 54 863–868. · doi:10.1090/S0002-9904-1948-09072-3
[74] Howie, D. (2002). Interpreting Probability: Controversies and Developments in the Early Twentieth Century. Cambridge Univ. Press. · Zbl 1031.01012
[75] Kalbfleisch, J. (1982). Ancillary statistics. Encyclopedia of Statistical Sciences 1 77–81. Wiley, New York.
[76] Kendall, M. G. (1946). The Advanced Theory of Statistics 2. Griffin, London. · Zbl 0063.03217
[77] Kendall, M. G. (1951). Regression, structure and functional relationship. I. Biometrika 38 11–25. · Zbl 0045.41202 · doi:10.1093/biomet/38.1-2.11
[78] Kołodziejczyk, S. (1935). On an important class of statistical hypotheses. Biometrika 27 161–190. · Zbl 0011.22002 · doi:10.2307/2332043
[79] Koopmans, T. C. (1937). Linear Regression Analysis of Economic Time Series. Bohn, Haarlem, Netherlands. · JFM 63.1127.04
[80] Lancaster, H. O. (1969). The Chi-Squared Distribution. Wiley, New York. · Zbl 0193.17802
[81] Lehmann, E. L. (1999). ‘Student’ and small-sample theory. Statist. Sci. 14 418–426. · Zbl 1059.62503 · doi:10.1214/ss/1009212520
[82] McMullen, L. (1970). Letters from W. S. Gosset to R. A. Fisher 1915–1936: Summaries by R. A. Fisher with a Foreword by L. McMullen, 2nd ed. Printed by Arthur Guinness for private circulation and placed in a few libraries.
[83] Merriman, M. (1884/1911). A Textbook on the Method of Least Squares. Wiley, New York. References are to the eighth edition, 1911. · JFM 16.0182.01
[84] Miller, J., ed. (1999–2005). Earliest uses of symbols in probability and statistics. Available at http://members.aol.com/jeff570/stat.html.
[85] Morgan, M. S. (1990). The History of Econometric Ideas. Cambridge Univ. Press. · Zbl 0765.01005
[86] Neyman, J. (1934). On the two different aspects of the representative method: The method of stratified sampling and the method of purposive selection (with discussion). J. Roy. Statist. Soc. 97 558–625. · Zbl 0010.07201
[87] Neyman, J. and Pearson, E. S. (1928). On the use and interpretation of certain test criteria for purposes of statistical inference. I, II. Biometrika 20A 175–240, 263–294. · JFM 54.0565.05
[88] Neyman, J. and Pearson, E. S. (1933). On the problem of the most efficient tests of statistical hypotheses. Philos. Trans. Roy. Soc. London Ser. A 231 289–337. · Zbl 0006.26804 · doi:10.1098/rsta.1933.0009
[89] Olkin, I. (1989). A conversation with Maurice Bartlett. Statist. Sci. 4 151–163. · Zbl 0955.01521 · doi:10.1214/ss/1177012600
[90] Pearson, E. S. (1926). Review of Statistical Methods for Research Workers, by R. A. Fisher. Science Progress 20 733–734.
[91] Pearson, E. S. (1990). ‘Student’, A Statistical Biography of William Sealy Gosset (R. L. Plackett, ed.; G. A. Barnard, assist.). Oxford Univ. Press. · Zbl 0711.01030
[92] Pearson, K. (1895). Contributions to the mathematical theory of evolution. II. Skew variation in homogeneous material. Philos. Trans. Roy. Soc. London Ser. A 186 343–414. · JFM 26.0243.03
[93] Pearson, K. (1896). Mathematical contributions to the theory of evolution. III. Regression, heredity and panmixia. Philos. Trans. Roy. Soc. London Ser. A 187 253–318. · JFM 27.0185.01
[94] Pearson, K. (1899). Mathematical contributions to the theory of evolution. V. On the reconstruction of the stature of prehistoric races. Philos. Trans. Roy. Soc. London Ser. A 192 169–244. · JFM 30.0222.03
[95] Pearson, K. (1900). On the criterion that a given system of deviations from the probable in the case of a correlated system of variables is such that it can be reasonably supposed to have arisen from random sampling. Philosophical Magazine (5) 50 157–175. · JFM 31.0238.04
[96] Pearson, K. (1902a). On the systematic fitting of curves to observations and measurements. I, II. Biometrika 1 265–303, 2 1–23.
[97] Pearson, K. (1902b). On the mathematical theory of errors of judgment, with special reference to the personal equation. Philos. Trans. Roy. Soc. London Ser. A 198 235–299. · JFM 33.0242.03
[98] Pearson, K. (1905). On the general theory of skew correlation and non-linear regression. Drapers’ Company Research Memoirs, Biometric Series II. Cambridge Univ. Press.
[99] Pearson, K., ed. (1914). Biometrika Tables for Statisticians and Biometricians. Cambridge Univ. Press.
[100] Pearson, K. (1916). On the application of ‘goodness of fit’ tables to test regression curves and theoretical curves used to describe observational or experimental data. Biometrika 11 239–261.
[101] Pearson, K. (1920). Notes on the history of correlation. Biometrika 13 25–45.
[102] Pearson, K. (1923). Notes on skew frequency surfaces. Biometrika 15 222–230.
[103] Pearson, K. (1925). Further contributions to the theory of small samples. Biometrika 17 176–200. · JFM 51.0416.10
[104] Pearson, K. (1926). Researches on the mode of distribution of the constants of samples taken at random from a bivariate normal population. Proc. Roy. Soc. London Ser. A 112 1–14. · JFM 52.0531.04
[105] Pearson, K., ed. (1931). Tables for Statisticians and Biometricians, Part II. Cambridge Univ. Press. · JFM 57.0694.02
[106] Pearson, K., ed. (1934). Tables of the Incomplete Beta-Function. Cambridge Univ. Press. · Zbl 0008.30403
[107] Pearson, K. (1935). Thoughts suggested by the papers of Messrs. Welch and Kołodziejczyk. Biometrika 27 227–259. · Zbl 0011.22003
[108] Pearson, K. and Filon, L. N. G. (1898). Mathematical contributions to the theory of evolution. IV. On the probable errors of frequency constants and on the influence of random selection on variation and correlation. Philos. Trans. Roy. Soc. London Ser. A 191 229–311. · JFM 29.0192.01
[109] Reid, N. (1994). A conversation with Sir David Cox. Statist. Sci. 9 439–455. · Zbl 0955.01543 · doi:10.1214/ss/1177010394
[110] Reid, N. (1995). The roles of conditioning in inference (with discussion). Statist. Sci. 10 138–157, 173–196. · Zbl 0955.62524 · doi:10.1214/ss/1177010027
[111] Sampson, A. R. (1974). A tale of two regressions. J. Amer. Statist. Assoc. 69 682–689. · Zbl 0291.62081 · doi:10.2307/2286002
[112] Savage, L. J. (1962). Subjective probability and statistical practice. In The Foundations of Statistical Inference: A Discussion (L. J. Savage et al., eds.) 9–35. Methuen, London.
[113] Schultz, H. (1929). Applications of the theory of error to the interpretation of trends: Discussion. J. Amer. Statist. Assoc. Suppl. 24 86–89. · JFM 55.0928.04
[114] Seal, H. (1967). The historical development of the Gauss linear model. Biometrika 54 1–24. · Zbl 0154.25103 · doi:10.2307/2333849
[115] Seneta, E. (1988). Slutsky (Slutskii), Evgenii Evgenievich. Encyclopedia of Statistical Sciences 8 512–515. Wiley, New York.
[116] Slutsky, E. E. (1913). On the criterion of goodness of fit of the regression lines and on the best method of fitting them to the data. J. Roy. Statist. Soc. 77 78–84.
[117] Stigler, S. M. (1986). The History of Statistics. The Measurement of Uncertainty before 1900. Belknap, Cambridge, MA. · Zbl 0656.62005
[118] Stigler, S. M. (2001). Ancillary history. In State of the Art in Probability and Statistics: Festschrift for Willem R. van Zwet (M. deGunst, C. Klaassen and A. van der Vaart, eds.) 555–567. IMS, Beachwood, OH. · Zbl 1373.62013 · doi:10.1214/lnms/1215090089
[119] Student (1908a). The probable error of a mean. Biometrika 6 1–25.
[120] Student (1908b). Probable error of a correlation coefficient. Biometrika 6 302–310.
[121] Student (1926). Review of Statistical Methods for Research Workers, by R. A. Fisher. Eugenics Review 18 148–150.
[122] Tolley, H. R. and Ezekiel, M. J. B. (1923). A method of handling multiple correlation problems. J. Amer. Statist. Assoc. 18 993–1003.
[123] Welch, B. L. (1935). Some problems in the analysis of regression among \(k\) samples of two variables. Biometrika 27 145–160. · Zbl 0011.22001 · doi:10.1093/biomet/27.1-2.145
[124] Welch, B. L. (1939). On confidence limits and sufficiency, with particular reference to parameters of location. Ann. Math. Statist. 10 58–69. · Zbl 0020.38202 · doi:10.1214/aoms/1177732246
[125] Working, H. and Hotelling, H. (1929). Applications of the theory of error to the interpretation of trends. J. Amer. Statist. Assoc. Suppl. 24 73–85. · JFM 55.0928.04
[126] Yule, G. U. (1897). On the theory of correlation. J. Roy. Statist. Soc. 60 812–854. · JFM 28.0211.02
[127] Yule, G. U. (1899). An investigation into the causes of changes in pauperism in England, chiefly during the last two intercensal decades (part I). J. Roy. Statist. Soc. 62 249–295.
[128] Yule, G. U. (1907). On the theory of correlation for any number of variables, treated by a new system of notation. Proc. Roy. Soc. London Ser. A 79 182–193. · JFM 38.0285.02
[129] Yule, G. U. (1909). The applications of the method of correlation to social and economic statistics. J. Roy. Statist. Soc. 72 721–730.
[130] Yule, G. U. (1911). An Introduction to the Theory of Statistics. Griffin, London. · JFM 42.0263.10
[131] Zabell, S. (1992). R. A. Fisher and the fiducial argument. Statist. Sci. 7 369–387. · Zbl 0955.62521 · doi:10.1214/ss/1177011233
This reference list is based on information provided by the publisher or from digital mathematics libraries. Its items are heuristically matched to zbMATH identifiers and may contain data conversion errors. In some cases these data have been complemented or enhanced by data from zbMATH Open. The list attempts to reflect the references in the original paper as accurately as possible, without claiming completeness or a perfect matching.