
What is a statistical model? (With comments and rejoinder). (English) Zbl 1039.62003

Summary: This paper addresses two closely related questions, “What is a statistical model?” and “What is a parameter?” The notions that a model must “make sense”, and that a parameter must “have a well-defined meaning” are deeply ingrained in applied statistical work, reasonably well understood at an instinctive level, but absent from most formal theories of modelling and inference. In this paper, these concepts are defined in algebraic terms, using morphisms, functors and natural transformations.
It is argued that inference on the basis of a model is not possible unless the model admits a natural extension that includes the domain for which inference is required. For example, prediction requires that the domain includes all future units, subjects or time points. Although it is usually not made explicit, every sensible statistical model admits such an extension.
Examples are given to show why such an extension is necessary and why a formal theory is required. In the definition of a subparameter, it is shown that certain parameter functions are natural and others are not. Inference is meaningful only for natural parameters. This distinction has important consequences for the construction of prior distributions and also helps to resolve a controversy concerning the Box-Cox model.
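The distinction between natural and non-natural parameter functions can be illustrated with the group-based criterion that appears, in damaged form, in the discussion material further down this record: a function ψ of the parameter θ is natural under a group G if ψ(θ₁) = ψ(θ₂) implies ψ(gθ₁) = ψ(gθ₂) for all g in G. The sketch below is only a finite spot check under assumptions of my own (the symbols ψ and θ, the helper names `act` and `no_counterexample`, and the chosen grids are not from the paper). It can refute naturality by finding a counterexample, but it cannot prove it.

```python
from itertools import product


def act(g, theta):
    """Location-scale action (assumed form): g = (a, b), b > 0,
    sends (mu, sigma) to (a + b*mu, b*sigma)."""
    a, b = g
    mu, sigma = theta
    return (a + b * mu, b * sigma)


def no_counterexample(psi, thetas, group_elements, tol=1e-12):
    """Return False if some pair with psi(t1) == psi(t2) is separated
    by a group element, i.e. psi(g t1) != psi(g t2); True otherwise."""
    for t1, t2 in product(thetas, repeat=2):
        if abs(psi(t1) - psi(t2)) > tol:
            continue  # the criterion only constrains pairs on the same level set
        for g in group_elements:
            if abs(psi(act(g, t1)) - psi(act(g, t2))) > tol:
                return False
    return True


# Hypothetical grids of parameter values and group elements (dyadic values,
# so the floating-point arithmetic below is exact).
thetas = [(mu, s) for mu in (-2.0, -1.0, 1.0, 2.0) for s in (0.5, 1.0, 2.0)]
location_scale = [(a, b) for a in (-1.0, 0.0, 1.0) for b in (0.5, 1.0, 2.0)]
pure_scale = [(0.0, b) for b in (0.5, 1.0, 2.0)]


def location(t):
    return t[0]


def scale(t):
    return t[1]


def ratio(t):
    return t[0] / t[1]  # mu / sigma


print(no_counterexample(location, thetas, location_scale))
print(no_counterexample(scale, thetas, location_scale))
print(no_counterexample(ratio, thetas, location_scale))
print(no_counterexample(ratio, thetas, pure_scale))
```

On these grids the location and scale parameters pass the check under the location-scale group, while the ratio μ/σ fails: (1, 1) and (2, 2) share the value 1, but the shift a = 1 sends them to ratios 2 and 1.5. Restricting to the pure scale group removes the counterexample, in line with the remark in the discussion fragment.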

MSC:

62A01 Foundations and philosophical topics in statistics
62F99 Parametric inference

References:

[1] ALDOUS, D. (1981). Representations for partially exchangeable arrays of random variables. J. Multivariate Analysis 11 581-598. · Zbl 0474.60044 · doi:10.1016/0047-259X(81)90099-3
[2] ANDREWS, D. F. and HERZBERG, A. (1985). Data. Springer, New York. · Zbl 0567.62002
[3] BARNDORFF-NIELSEN, O. E. and COX, D. R. (1994). Inference and Asymptotics. Chapman and Hall, London. · Zbl 0826.62004
[4] BARTLETT, M. S. (1978). Nearest neighbour models in the analysis of field experiments (with discussion). J. Roy. Statist. Soc. Ser. B 40 147-174. JSTOR: · Zbl 0396.62051
[5] BERGER, J. O. (1985). Statistical Decision Theory and Bayesian Analysis, 2nd ed. Springer, New York. · Zbl 0572.62008
[6] BERNARDO, J. M. and SMITH, A. F. M. (1994). Bayesian Theory. Wiley, New York. · Zbl 0796.62002
[7] BESAG, J. (1974). Spatial interaction and the statistical analysis of lattice systems (with discussion). J. Roy. Statist. Soc. Ser. B 36 192-236. JSTOR: · Zbl 0327.60067
[8] BESAG, J. and HIGDON, D. (1999). Bayesian analysis of agricultural field experiments (with discussion). J. Roy. Statist. Soc. Ser. B 61 691-746. JSTOR: · Zbl 0951.62091 · doi:10.1111/1467-9868.00201
[9] BESAG, J. and KOOPERBERG, C. (1995). On conditional and intrinsic autoregressions. Biometrika 82 733-746. JSTOR: · Zbl 0899.62123
[10] BEST, N. G., ICKSTADT, K. and WOLPERT, R. L. (1999). Contribution to the discussion of Besag and Higdon (1999). J. Roy. Statist. Soc. Ser. B 61 728-729.
[12] BICKEL, P. and DOKSUM, K. A. (1981). An analysis of transformations revisited. J. Amer. Statist. Assoc. 76 296-311. JSTOR: · Zbl 0464.62058 · doi:10.2307/2287831
[13] BILLINGSLEY, P. (1986). Probability and Measure, 2nd ed. Wiley, New York. · Zbl 0649.60001
[14] BOX, G. E. P. and COX, D. R. (1964). An analysis of transformations (with discussion). J. Roy. Statist. Soc. Ser. B 26 211-252. JSTOR: · Zbl 0156.40104
[15] BOX, G. E. P. and COX, D. R. (1982). An analysis of transformations revisited, rebutted. J. Amer. Statist. Assoc. 77 209-210. JSTOR: · Zbl 0504.62058 · doi:10.2307/2287791
[16] COX, D. R. (1958). Planning of Experiments. Wiley, New York. · Zbl 0084.15802
[17] COX, D. R. (1986). Comment on Holland (1986). J. Amer. Statist. Assoc. 81 963-964.
[18] COX, D. R. and HINKLEY, D. V. (1974). Theoretical Statistics. Chapman and Hall, London. · Zbl 0334.62003
[19] COX, D. R. and REID, N. (1987). Parameter orthogonality and approximate conditional inference (with discussion). J. Roy. Statist. Soc. Ser. B 49 1-39. JSTOR: · Zbl 0616.62006
[20] COX, D. R. and SNELL, E. J. (1981). Applied Statistics. Chapman and Hall, London. · Zbl 0612.62002
[21] COX, D. R. and WERMUTH, N. (1996). Multivariate Dependencies. Chapman and Hall, London. · Zbl 0880.62124
[22] DALE, J. R. (1984). Local versus global association for bivariate ordered responses. Biometrika 71 507-514. JSTOR: · doi:10.1093/biomet/71.3.507
[23] DE FINETTI, B. (1975). Theory of Probability 2. Wiley, New York. · Zbl 0328.60003
[24] GELMAN, A., CARLIN, J. B., STERN, H. and RUBIN, D. B. (1995). Bayesian Data Analysis. Chapman and Hall, London.
[25] GOODMAN, L. A. (1979). Simple models for the analysis of association in cross-classifications having ordered categories. J. Amer. Statist. Assoc. 74 537-552. · doi:10.2307/2286971
[26] GOODMAN, L. A. (1981). Association models and canonical correlation in the analysis of cross-classifications having ordered categories. J. Amer. Statist. Assoc. 76 320-334. · doi:10.2307/2287833
[27] HAMADA, M. and WU, C. F. J. (1992). Analysis of designed experiments with complex aliasing. J. Qual. Technology 24 130-137.
[28] HARVILLE, D. A. and ZIMMERMANN, D. L. (1999). Contribution to the discussion of Besag (1999). J. Roy. Statist. Soc. Ser. B 61 733-734.
[29] HELLAND, I. S. (1999a). Quantum mechanics from symmetry and statistical modelling. Internat. J. Theoret. Phys. 38 1851-1881. · Zbl 0953.81003 · doi:10.1023/A:1026676913271
[30] HELLAND, I. S. (1999b). Quantum theory from symmetries in a general statistical parameter space. Technical report, Dept. Mathematics, Univ. Oslo.
[31] HINKLEY, D. V. and RUNGER, G. (1984). The analysis of transformed data (with discussion). J. Amer. Statist. Assoc. 79 302-320. JSTOR: · Zbl 0553.62051 · doi:10.2307/2288264
[32] HOLLAND, P. (1986). Statistics and causal inference (with discussion). J. Amer. Statist. Assoc. 81 945-970. JSTOR: · Zbl 0607.62001 · doi:10.2307/2289064
[33] HORA, R. B. and BUEHLER, R. J. (1966). Fiducial theory and invariant estimation. Ann. Math. Statist. 37 643-656. · Zbl 0148.13805 · doi:10.1214/aoms/1177699458
[34] KINGMAN, J. F. C. (1984). Present position and potential developments: Some personal views. Probability and random processes. J. Roy. Statist. Soc. Ser. A 147 233-244. · Zbl 0547.60003 · doi:10.2307/2981679
[35] KINGMAN, J. F. C. (1993). Poisson Processes. Oxford Univ. Press. · Zbl 0771.60001
[36] LAURITZEN, S. (1988). Extremal Families and Systems of Sufficient Statistics. Lecture Notes in Statist. 49. Springer, New York. · Zbl 0681.62009
[37] LEHMANN, E. L. (1983). Theory of Point Estimation. Wiley, New York. · Zbl 0522.62020
[38] LEHMANN, E. L. and CASELLA, G. (1998). Theory of Point Estimation, 2nd ed. Springer, New York. · Zbl 0916.62017
[39] LITTELL, R., FREUND, R. J. and SPECTOR, P. C. (1991). SAS System for Linear Models, 3rd ed. SAS Institute, Cary, NC.
[40] MAC LANE, S. (1998). Categories for the Working Mathematician, 2nd ed. Springer, New York. · Zbl 0906.18001
[41] MCCULLAGH, P. (1980). Regression models for ordinal data (with discussion). J. Roy. Statist. Soc. Ser. B 42 109-142. JSTOR: · Zbl 0483.62056
[42] MCCULLAGH, P. (1992). Conditional inference and Cauchy models. Biometrika 79 247-259. JSTOR: · Zbl 0753.62002 · doi:10.1093/biomet/79.2.247
[43] MCCULLAGH, P. (1996). Möbius transformation and Cauchy parameter estimation. Ann. Statist. 24 787-808. · Zbl 0859.62007 · doi:10.1214/aos/1032894465
[44] MCCULLAGH, P. (1999). Quotient spaces and statistical models. Canad. J. Statist. 27 447-456. JSTOR: · Zbl 0946.62005 · doi:10.2307/3316103
[45] MCCULLAGH, P. (2000). Invariance and factorial models (with discussion). J. Roy. Statist. Soc. Ser. B 62 209-256. JSTOR: · doi:10.1111/1467-9868.00229
[46] MCCULLAGH, P. and NELDER, J. A. (1989). Generalized Linear Models, 2nd ed. Chapman and Hall, London. · Zbl 0744.62098
[47] MCCULLAGH, P. and WIT, E. (2000). Natural transformation and the Bayes map. Technical report.
[48] MERCER, W. B. and HALL, A. D. (1911). The experimental error of field trials. J. Agric. Research 50 331-357.
[49] NELDER, J. A. (1977). A re-formulation of linear models (with discussion). J. Roy. Statist. Soc. Ser. A 140 48-77. JSTOR: · doi:10.2307/2344517
[50] PEARSON, K. (1913). Note on the surface of constant association. Biometrika 9 534-537.
[51] PLACKETT, R. L. (1965). A class of bivariate distributions. J. Amer. Statist. Assoc. 60 516-522. JSTOR: · doi:10.2307/2282685
[52] RUBIN, D. (1978). Bayesian inference for causal effects: The role of randomization. Ann. Statist. 6 34-58. · Zbl 0383.62021 · doi:10.1214/aos/1176344064
[53] RUBIN, D. (1986). Comment on Holland (1986). J. Amer. Statist. Assoc. 81 961-962.
[54] SMITH, A. F. M. (1984). Present position and potential developments: some personal views. Bayesian statistics. J. Roy. Statist. Soc. Ser. A 147 245-259. · Zbl 0555.65100 · doi:10.2307/2981672
[55] TJUR, T. (2000). Contribution to the discussion of McCullagh (2000). J. Roy. Statist. Soc. Ser. B 62 238-239.
[56] WHITTLE, P. (1974). Contribution to the discussion of Besag (1974). J. Roy. Statist. Soc. Ser. B 36 228. JSTOR:
[57] YANDELL, B. S. (1997). Practical Data Analysis for Designed Experiments. Chapman and Hall, London. · Zbl 1056.62500
[58] CHICAGO, ILLINOIS 60637-1514 E-MAIL: pmcc@galton.uchicago.edu berg (1995), Besag and Higdon (1999) and Rue and Tjelmeland (2002). However, spatial effects are often of secondary importance, as in variety trials, and the main intention is to absorb an appropriate level of spatial variation in the formulation, rather than produce a spatial model with scientifically interpretable parameters. Nevertheless, McCullagh’s basic point is well taken. For example, I view the use of MRFs in geographical epidemiology [e.g., Besag, York and Mollié (1991)] as mainly of exploratory value, in suggesting additional spatially related covariates whose inclusion would ideally dispense with the need for a spatial formulation;
[59] uniformity trials in Fairfield Smith (1938) and Pearce (1976). Of course, in a genuine variety trial, one might want to predict what the aggregate yield over the entire field would have been for a few individual varieties but this does not require any extension of the formulation to McCullagh’s conceptual plots. Indeed, such calculations are especially well suited to the Bayesian paradigm, both theoretically, because one is supposed to deal with potentially observable quantities rather than merely with parameters, and in practice, via MCMC, because the posterior predictive distributions are available rigorously. That is, for the aggregate yield of variety A, one uses the observed yields on plots that were sown with A and generates a set of observations from the likelihood for those that were not for each MCMC sample of parameter values, hence building a corresponding distribution of total yield. One may also construct credible intervals for the difference in total yields between varieties A and B and easily address all manner of questions in ranking and selection that simply cannot be considered in a frequentist framework; for example, the posterior probability that the total yield obtained by sowing any particular variety (perhaps chosen in the light of the experiment) would have been at least 10
[60] ton (1986). The findings typically suggest that the gains from spatial analysis in a badly designed experiment provide improvements commensurate with standard analysis and optimal design. This is not a reason to adopt poor designs but the simple fact is that, despite the efforts of statisticians, many experiments are carried out using nothing better than randomized complete blocks. It is highly desirable that the representation of fertility is flexible but is also parsimonious because there are many variety effects to be estimated, with very limited replication. McCullagh’s use of discrete approximations to harmonic functions in Section 8 fails on both counts: first, local maxima or minima cannot exist except (artificially) at plots on the edge of the trial; second, the degrees of freedom lost in the fit equals the number of such plots and is therefore substantial (in fact, four less in a rectangular layout because the corner plots are ignored throughout the analysis!). Nevertheless, there is something appealing about the averaging property of harmonic functions, if only it were a little more flexible. What is required is a random effects (in frequentist terms) version and that is precisely the thinking behind the use of intrinsic autoregressions in BH and elsewhere. Indeed, such schemes fit McCullagh’s discretized harmonic functions perfectly, except for edge effects (because BH embeds the array in a larger one to cater for such effects), and they also provide a good fit to more plausible fertility functions. For specific comments on the Mercer and Hall data, see below. Of course, spatial scale remains an important issue for variety trials and indeed is discussed empirically in Section 2.3 and in the rejoinder of BH. For one-dimensional adjustment, the simplest plausible continuum process is Brownian motion with an arbitrary level, for which the necessary integrations can be
ATKINSON, A. C. and BAILEY, R. A. (2001). One hundred years of the design of experiments on and off the pages of Biometrika. Biometrika 88 53-97. JSTOR: · Zbl 1037.62069 · doi:10.1093/biomet/88.1.53
[61] BESAG, J. E. (1974). Spatial interaction and the statistical analysis of lattice systems (with discussion). J. Roy. Statist. Soc. Ser. B 36 192-236. JSTOR: · Zbl 0327.60067
[62] BESAG, J. E. (1975). Statistical analysis of non-lattice data. The Statistician 24 179-195.
[63] BESAG, J. E., GREEN, P. J., HIGDON, D. M. and MENGERSEN, K. L. (1995). Bayesian computation and stochastic systems (with discussion). Statist. Sci. 10 3-66. · Zbl 0955.62552 · doi:10.1214/ss/1177010123
[64] BESAG, J. E. and HIGDON, D. M. (1993). Bayesian inference for agricultural field experiments. Bull. Internat. Statist. Inst. 55 121-136.
[65] BESAG, J. E. and HIGDON, D. M. (1999). Bayesian analysis of agricultural field experiments (with discussion). J. Roy. Statist. Soc. Ser. B 61 691-746. JSTOR: · Zbl 0951.62091 · doi:10.1111/1467-9868.00201
[66] BESAG, J. E. and KEMPTON, R. A. (1986). Statistical analysis of field experiments using neighbouring plots. Biometrics 42 231-251. · Zbl 0658.62129 · doi:10.2307/2531047
[67] BESAG, J. E. and KOOPERBERG, C. L. (1995). On conditional and intrinsic autoregressions. Biometrika 82 733-746. JSTOR: · Zbl 0899.62123
[68] BESAG, J. E., YORK, J. C. and MOLLIÉ, A. (1991). Bayesian image restoration, with two applications in spatial statistics (with discussion). Ann. Inst. Statist. Math. 43 1-59. · Zbl 0760.62029 · doi:10.1007/BF00116466
[69] BREIMAN, L. (2001). Statistical modeling: the two cultures (with discussion). Statist. Sci. 16 199-231. · Zbl 1059.62505 · doi:10.1214/ss/1009213726
[70] BYERS, S. D. and BESAG, J. E. (2000). Inference on a collapsed margin in disease mapping. Statistics in Medicine 19 2243-2249.
[71] FAIRFIELD SMITH, H. (1938). An empirical law describing heterogeneity in the yields of agricultural crops. J. Agric. Sci. 28 1-23.
[72] FISHER, R. A. (1922). On the mathematical foundations of theoretical statistics. Philos. Trans. Roy. Soc. London Ser. A 222 309-368. · JFM 48.1280.02
[73] FISHER, R. A. (1928). Statistical Methods for Research Workers, 2nd ed. Oliver and Boyd, Edinburgh. · JFM 54.0573.07
[74] GILMOUR, A. R., CULLIS, B. R., SMITH, A. B. and VERBYLA, A. P. (1999). Discussion of paper by J. E. Besag and D. M. Higdon. J. Roy. Statist. Soc. B 61 731-732.
[75] HEINE, V. (1955). Models for two-dimensional stationary stochastic processes. Biometrika 42 170-178. JSTOR: · Zbl 0067.36504 · doi:10.1093/biomet/42.1-2.170
[76] KÜNSCH, H. R. (1987). Intrinsic autoregressions and related models on the two-dimensional lattice. Biometrika 74 517-524. · Zbl 0671.62082 · doi:10.1093/biomet/74.3.517
[77] MATÉRN, B. (1986). Spatial Variation. Springer, New York. · Zbl 0608.62122
[78] MCBRATNEY, A. B. and WEBSTER, R. (1981). Detection of ridge and furrow pattern by spectral analysis of crop yield. Internat. Statist. Rev. 49 45-52.
[79] PEARCE, S. C. (1976). An examination of Fairfield Smith’s law of environmental variation. J. Agric. Sci. 87 21-24.
[80] RUE, H. and TJELMELAND, H. (2002). Fitting Gaussian Markov random fields to Gaussian fields. Scand. J. Statist. 29 31-49. · Zbl 1017.62088 · doi:10.1111/1467-9469.00058
[81] WHITTLE, P. (1962). Topographic correlation, power-law covariance functions, and diffusion. Biometrika 49 305-314. JSTOR: · Zbl 0114.08003 · doi:10.1093/biomet/49.3-4.305
[82] SEATTLE, WASHINGTON 98195-4322 E-MAIL: julian@stat.washington.edu recently by Chen, Lockhart and Stephens (2002). One reason for its attractiveness to me is that if one considers the more realistic semiparametric model, a(Y) = Xβ + ε, (6) where a is an arbitrary monotone transformation and ε has a N(μ, σ²) distribution then β/σ is identifiable and estimable at the n^{-1/2} rate while β is not identifiable. Bickel and Ritov (1997) discuss ways of estimating β/σ and a which is also estimable at rate n^{-1/2} optimally and suggest approaches to algorithms in their paper. The choice (,) is of interest to me because its consideration is the appropriate response to the Hinkley-Runger critique. One needs to specify a joint confidence region for (,) making statements such as “the effect magnitude on the scale is consistent with the data.” The effect of lack of knowledge of on the variance of remains interpretable. It would be more attractive if McCullagh could somehow divorce the calculus of this paper from the language of functors, morphisms and canonical diagrams for more analysis-oriented statisticians such as myself.
[83] BICKEL, P. and RITOV, Y. (1997). Local asymptotic normality of ranks and covariates in the transformation models. In Festschrift for Lucien Le Cam: Research Papers in Probability and Statistics (D. Pollard, E. Torgersen and G. Yang, eds.) 43-54. Springer, New York. · Zbl 0897.62017
[84] CHEN, G., LOCKHART, R. A. and STEPHENS, M. A. (2002). Box-Cox transformations in linear models: Large sample theory and tests of normality (with discussion). Canad. J. Statist. 30 177-234. JSTOR: · Zbl 1012.62072 · doi:10.2307/3315946
[85] BERKELEY, CALIFORNIA 94720-3860 E-MAIL: bickel@stat.berkeley.edu MAC LANE, S. (1998). Categories for the Working Mathematician, 2nd ed. Springer, New York.
[86] FRASER, D. A. S. (1968a). A black box or a comprehensive model. Technometrics 10 219-229. JSTOR: · doi:10.2307/1267040
[87] FRASER, D. A. S. (1968b). The Structure of Inference. Wiley, New York. · Zbl 0164.48703
[88] MCCULLAGH, P. (1992). Conditional inference and Cauchy models. Biometrika 79 247-259. JSTOR: · Zbl 0753.62002 · doi:10.1093/biomet/79.2.247
[89] TORONTO, ONTARIO M5S 3G3 CANADA E-MAIL: reid@utstat.utoronto.ca from Helland (2002). Let a group G be defined on the parameter space Θ of a model. A measurable function ψ from Θ to another space is called a natural subparameter if ψ(θ₁) = ψ(θ₂) implies ψ(gθ₁) = ψ(gθ₂) for all g ∈ G. For example, in the location and scale case the location parameter μ and the scale parameter σ are natural, while the coefficient of variation μ/σ is not natural (it is if the group is changed to the pure scale group). In general the parameter ψ is natural iff the level sets of the function ψ = ψ(θ) are transformed onto other inconsistency discussed in detail by Dawid, Stone and Zidek (1973). Their main problem is a violation of the plausible reduction principle: assume that a general method of inference, applied to data (y, z), leads to an answer that in fact depends on z alone. Then the same answer should appear if the same method is applied to z alone. A Bayesian implementation of this principle runs as follows: assume first that the probability density p(y, z | η, ζ) depends on the parameter θ = (η, ζ) in such a way that the marginal density p(z | ζ) only depends upon ζ. Then the following implication should hold: if (a) the marginal posterior density π(ζ | y, z) depends on the data (y, z) only through z, then (b) this π(ζ | z) should be proportional to a(ζ)p(z | ζ) for some function a(ζ), so that it is proportional to a posterior based solely on the z data. For a proper prior π(η, ζ) this can be shown to hold with a(ζ) being the appropriate marginal prior π(ζ). Dawid, Stone and Zidek (1973) gave several examples where the implication above is violated by improper priors of the kind that we sometimes expect to have in objective Bayes inference. For our purpose, the interesting case is when there is a transformation group G defined on the parameter space.
Under the assumption that ζ is maximal invariant under G and making some regularity conditions, it is then first shown by Dawid, Stone and Zidek (1973) that it necessarily follows that p(z | η, ζ) only depends upon ζ, next (a) is shown to hold always, and finally (b) holds if and only if the prior is of the form G(d) d, where G is right Haar measure, and the measure
DAWID, A. P., STONE, M. and ZIDEK, J. V. (1973). Marginalization paradoxes in Bayesian and structural inference (with discussion). J. Roy. Statist. Soc. Ser. B 35 189-233. JSTOR:
[90] HELLAND, I. S. (2001). Reduction of regression models under symmetry. In Algebraic Methods in Statistics and Probability (M. Viana and D. Richards, eds.) 139-153. Amer. Math. Soc., Providence, RI. · Zbl 1012.62077
[91] HELLAND, I. S. (2002). Statistical inference under a fixed symmetry group. Available at http://www.math.uio.no/ ingeh/. URL:
[92] BROWN, L. D. (1984). The research of Jack Kiefer outside the area of experimental design. Ann. Statist. 12 406-415. · Zbl 0549.01017 · doi:10.1214/aos/1176346495
[93] CARTIER, P. (2001). A mad day’s work: From Grothendieck to Connes and Kontsevich. The evolution of concepts of space and symmetry. Bull. Amer. Math. Soc. 38 389-408. · Zbl 0985.01005 · doi:10.1090/S0273-0979-01-00913-2
[94] GROTHENDIECK, A. (1955). Produits tensoriels topologiques et espaces nucléaires. Mem. Amer. Math. Soc. 16. · Zbl 0123.30301
[95] HUBER, P. J. (1961). Homotopy theory in general categories. Math. Ann. 144 361-385. · Zbl 0099.17905 · doi:10.1007/BF01396534
[96] LE CAM, L. (1964). Sufficiency and approximate sufficiency. Ann. Math. Statist. 35 1419-1455. · Zbl 0129.11202 · doi:10.1214/aoms/1177700372
[97] ARBUTHNOTT, J. (1712). An argument for Divine Providence, taken from the constant regularity observed in the births of both sexes. Philos. Trans. Roy. Soc. London 27 186-190.
[98] BAILEY, R. A. (1981). A unified approach to design of experiments. J. Roy. Statist. Soc. Ser. A 144 214-223. JSTOR: · Zbl 0469.62053 · doi:10.2307/2981920
[99] BAILEY, R. A. (1991). Strata for randomized experiments (with discussion). J. Roy. Statist. Soc. Ser. B 53 27-78. JSTOR: · Zbl 0800.62477
[100] COX, D. R. (1990). Roles of models in statistical analysis. Statist. Sci. 5 169-174. · Zbl 0955.62518 · doi:10.1214/ss/1177012165
[101] DIACONIS, P. (1988). Group Representations in Probability and Statistics. IMS, Hayward, CA. · Zbl 0695.60012
[102] DIACONIS, P., GRAHAM, R. L. and KANTOR, W. M. (1983). The mathematics of perfect shuffles. Adv. in Appl. Math. 4 175-196. · Zbl 0521.05005 · doi:10.1016/0196-8858(83)90009-X
[103] FURSTENBERG, H. (1963). Noncommuting random products. Trans. Amer. Math. Soc. 108 377-428. JSTOR: · Zbl 0203.19102 · doi:10.2307/1993589
[104] GRENANDER, U. (1963). Probabilities on Algebraic Structures. Wiley, New York. · Zbl 0131.34804
[105] MCCULLAGH, P. (1999). Quotient spaces and statistical models. Canad. J. Statist. 27 447-456. JSTOR: · Zbl 0946.62005 · doi:10.2307/3316103
[106] MCCULLAGH, P. (2000). Invariance and factorial models (with discussion). J. Roy. Statist. Soc. Ser. B 62 209-256. JSTOR: · doi:10.1111/1467-9868.00229
[107] PINCUS, S. and KALMAN, R. E. (1997). Not all (possibly) “random” sequences are created equal. Proc. Nat. Acad. Sci. U.S.A. 94 3513-3518. JSTOR: · Zbl 0873.11047 · doi:10.1073/pnas.94.8.3513
[108] PINCUS, S. and SINGER, B. H. (1996). Randomness and degrees of irregularity. Proc. Nat. Acad. Sci. U.S.A. 93 2083-2088. JSTOR: · Zbl 0849.60002 · doi:10.1073/pnas.93.5.2083
[109] GUILFORD, CONNECTICUT 06437 E-MAIL: stevepincus@alum.mit.edu in McCullagh (1980). Suppose we are dealing with a universe where the natural models for handling of binary responses are the logistic regression models. This could be some socioeconomic research area where people’s attitudes to various features of brands or service levels are recorded on a binary scale, and the interest lies in the dependence of these attitudes on all sorts of background variables. How do we extend this universe to deal with ordered categorical responses, for example, on three-point positive/indifferent/negative scales? A natural requirement seems to be that if data are dichotomized by the (arbitrary) selection of a cutpoint (putting, for example, negative and indifferent together in a single category), then the marginal model coming out of this is a logistic regression model. This is, after all, just a way of recording a binary response, and even though it would hurt any statistician to throw away information in this way, it is done all the time on more invisible levels. Another natural requirement is that the parameters of interest (with the constant term as an obvious exception) should not depend on how the cutpoint is selected. It is easy to show that these two requirements are met by one and only one class of models for ordered responses, namely the models that can and Nelder (1989). Thus, we have here the absurd situation that the potentially canonical (but unfortunately nonexisting) answer to a simple and canonical question results in a collection of very useful methods. The overdispersion models exist as perfectly respectable operational objects, but not as mathematical objects. My personal opinion [Tjur (1998)] is that the simplest way of giving these models a concrete interpretation goes via approximation by nonlinear models for normal data and a small adjustment of the usual estimation method for these models.
But neither this, nor the concept of quasi-likelihood, answers the fundamental question whether there is a way of modifying the conditions (1) and (2) above in such a way that a meaningful theory of generalized linear models with overdispersion comes out as the unique answer. It is tempting to ask, in the present context, whether it is a necessity at all that these models “exist” in the usual sense. Is it so, perhaps, that after a century or two people will find this question irrelevant, just as we find old discussions about existence of the number + irrelevant? If this is the case, a new attitude to statistical models is certainly required.
[110] MCCULLAGH, P. (1980). Regression models for ordinal data (with discussion). J. Roy. Statist. Soc. Ser. B 42 109-142. JSTOR: · Zbl 0483.62056
[111] MCCULLAGH, P. and NELDER, J. A. (1989). Generalized Linear Models, 2nd ed. Chapman and Hall, London. · Zbl 0744.62098
[112] NELDER, J. A. and WEDDERBURN, R. W. M. (1972). Generalized linear models. J. Roy. Statist. Soc. Ser. A 135 370-384.
[113] TJUR, T. (1998). Nonlinear regression, quasi likelihood, and overdispersion in generalized linear models. Amer. Statist. 52 222-227. JSTOR: · doi:10.2307/2685928
[114] has recently been obtained by Wichura (2001). Fraser and Reid ask whether category theory can do more than provide a framework. My experience here is similar to Huber’s, namely that category theory is well suited for this purpose but, as a branch of logic, that is all we can expect from it. Regarding the coefficient of variation, I agree that there are applications in which this is a useful and natural parameter or statistic, just as there are (a few) applications in which the correlation coefficient is useful. The groups used in this paper are such that the origin is either fixed or completely arbitrary. In either case there is no room for hedging. In practice, things are rarely so clear cut. In order to justify the coefficient of variation, it seems to me that the applications must be such that the scale of measurement has a reasonably well-defined origin relevant to the problem. The Cauchy model with the real fractional linear group was originally used as an example to highlight certain inferential problems. I do not believe I have encountered an application in which it would be easy to make a convincing case for the relevance of this group. Nevertheless, I think it is helpful to study such examples for the light they may shed on foundational matters. The fact that the median is not a natural subparameter is an insight that casts serious doubt on the relevance of the group in “conventional” applications. To turn the argument around, the fact that the Cauchy model is closed under real fractional linear transformation is not, in itself, an adequate reason to choose that group as the base category. In that sense, I agree with a primary thesis of Fraser’s Structure of Inference that the group supersedes the probability model. Tjur’s remarks capture the spirit of what I am attempting to do. In the cumulative logit model, it is clear intuitively what is meant by the statement that the parameter of interest should not depend on how the cutpoints are selected.
As is often the case, what is intuitively clear is not so easy to express in mathematical terms. It does not mean that the maximum-likelihood estimate is unaffected by this choice. For that reason, although Tjur’s second condition on overdispersion models has a certain appeal, I do not think it carries the same force as the first. His description of natural subparameters in regression is a model of clarity.
[115] given the values on the contour (Matheron, 1971). Both processes are also conformal, but the similarity ends there. The set of conformal processes is also closed under addition of independent processes. Thus, the sum of white noise and W is conformal but not Markov. Beyond convolutions of white noise and W, it appears most unlikely that there exists another conformal process with Gaussian increments. Whittle’s (1954) family of stationary Gaussian processes has the Markov property [Chilès and Delfiner (1999)] but the family is not closed under conformal maps nor under convolution.
[116] CHILÈS, J.-P. and DELFINER, P. (1999). Geostatistics. Wiley, New York. · Zbl 0922.62098
[117] FEYNMAN, R. P., LEIGHTON, R. B. and SANDS, M. (1964). The Feynman Lectures on Physics. Addison-Wesley, Reading, MA.
[118] FRASER, D. A. S. (1968b). The Structure of Inference. Wiley, New York. · Zbl 0164.48703
[119] HELLAND, I. S. (1999a). Quantum mechanics from symmetry and statistical modelling. Internat. J. Theoret. Phys. 38 1851-1881. · Zbl 0953.81003 · doi:10.1023/A:1026676913271
[120] KINGMAN, J. F. C. (1972). On random sequences with spherical symmetry. Biometrika 59 492-494. JSTOR: · Zbl 0238.60025 · doi:10.1093/biomet/59.2.492
[121] MACCULLAGH, J. (1839). An essay towards the dynamical theory of crystalline reflexion and refraction. Trans. Roy. Irish Academy 21 17-50.
[122] MATHERON, G. (1971). The theory of regionalized variables and its applications. Cahiers du Centre de Morphologie Mathématique de Fontainebleau 5.
[123] WHITTLE, P. (1954). On stationary processes in the plane. Biometrika 41 434-449. JSTOR: · Zbl 0058.35601 · doi:10.1093/biomet/41.3-4.434
[124] WICHURA, M. (2001). Some de Finetti type theorems.
This reference list is based on information provided by the publisher or from digital mathematics libraries. Its items are heuristically matched to zbMATH identifiers and may contain data conversion errors. In some cases that data have been complemented/enhanced by data from zbMATH Open. This attempts to reflect the references listed in the original paper as accurately as possible without claiming completeness or a perfect matching.