Flexible Tweedie regression models for continuous data. (English) Zbl 07192054

Summary: Tweedie regression models (TRMs) provide a flexible family of distributions for non-negative right-skewed data and can handle continuous data with probability mass at zero. Estimation and inference for TRMs based on the maximum likelihood (ML) method are complicated by the presence of an infinite sum in the probability function and by non-trivial restrictions on the power parameter space. In this paper, we propose two approaches for fitting TRMs, namely quasi-likelihood (QML) and pseudo-likelihood (PML). We discuss their asymptotic properties and perform simulation studies to compare our methods with the ML method. We show that the QML method provides asymptotically efficient estimation of the regression parameters. The simulation studies show that the QML and PML approaches yield estimates, standard errors and coverage rates similar to those of the ML method. Furthermore, because the QML and PML methods require only second-moment assumptions, we can extend the TRMs to the class of quasi-TRMs in Wedderburn's style. This extension eliminates the non-trivial restriction on the power parameter space and thus provides a flexible regression model for continuous data. We provide an R implementation and illustrate the application of TRMs using three data sets.
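To make the second-moment idea concrete, the following is a minimal sketch of a Wedderburn-style quasi-likelihood fit. It is not the authors' R implementation. It assumes a log link, a power variance function Var(Y) = phi * mu^p with the power p held fixed (the paper also estimates p), an intercept in the first column of the design matrix, and a Pearson-type estimate of the dispersion phi. The names `solve` and `fit_quasi_tweedie` are hypothetical helpers introduced here for illustration.

```python
import math


def solve(A, b):
    """Solve the linear system A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[piv] = M[piv], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(M[r][k] * x[k] for k in range(r + 1, n))
        x[r] = (M[r][n] - s) / M[r][r]
    return x


def fit_quasi_tweedie(X, y, p=1.5, n_iter=100, tol=1e-10):
    """Fisher scoring for beta under E[Y] = exp(x'beta), Var[Y] = phi * mu**p.

    Only these two moments are used, so no density (and no infinite sum)
    is evaluated. Assumption: column 0 of X is an intercept and p is fixed.
    """
    n, q = len(y), len(X[0])
    beta = [0.0] * q
    beta[0] = math.log(sum(y) / n)  # start at the overall mean
    for _ in range(n_iter):
        mu = [math.exp(sum(xj * bj for xj, bj in zip(xi, beta))) for xi in X]
        # Log link: d mu / d eta = mu, so the working weight is mu^2 / mu^p.
        w = [m ** (2.0 - p) for m in mu]
        z = [(yi - m) / m for yi, m in zip(y, mu)]  # working residual
        A = [[sum(w[i] * X[i][a] * X[i][b] for i in range(n)) for b in range(q)]
             for a in range(q)]
        g = [sum(w[i] * X[i][a] * z[i] for i in range(n)) for a in range(q)]
        step = solve(A, g)
        beta = [bj + sj for bj, sj in zip(beta, step)]
        if max(abs(s) for s in step) < tol:
            break
    mu = [math.exp(sum(xj * bj for xj, bj in zip(xi, beta))) for xi in X]
    # Pearson-type moment estimate of the dispersion parameter.
    phi = sum((yi - m) ** 2 / m ** p for yi, m in zip(y, mu)) / (n - q)
    return beta, phi


# Toy data with an exact zero: intercept plus one group indicator.
X = [[1.0, 0.0], [1.0, 0.0], [1.0, 0.0], [1.0, 1.0], [1.0, 1.0], [1.0, 1.0]]
y = [0.0, 1.0, 2.0, 3.0, 5.0, 10.0]
beta_hat, phi_hat = fit_quasi_tweedie(X, y, p=1.5)
print(beta_hat, phi_hat)
```

Because the fit relies only on the mean and variance functions, the same routine runs for values of p that do not correspond to a Tweedie distribution, which is the sense in which a quasi-model lifts the restriction on the power parameter space.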

MSC:

62-XX Statistics