Gamma shape mixtures for heavy-tailed distributions. (English) Zbl 1400.62292

Summary: An important question in health services research is the estimation of the proportion of medical expenditures that exceed a given threshold. Typically, medical expenditures present highly skewed, heavy-tailed distributions, for which (a) simple variable transformations are insufficient to achieve a tractable low-dimensional parametric form and (b) nonparametric methods are not efficient in estimating exceedance probabilities for large thresholds. Motivated by this context, in this paper we propose a general Bayesian approach for the estimation of tail probabilities of heavy-tailed distributions, based on a mixture of gamma distributions in which the mixing occurs over the shape parameter. This family provides a flexible and novel approach for modeling heavy-tailed distributions; it is computationally efficient, and it requires specifying a prior distribution for only a single parameter. By carrying out simulation studies, we compare our approach with commonly used methods, such as the log-normal model and nonparametric alternatives. We find that the mixture-gamma model significantly improves predictive performance in estimating tail probabilities, compared to these alternatives. We also apply our method to the Medicare Current Beneficiary Survey (MCBS), for which we estimate the probability of exceeding a given hospitalization cost for smoking-attributable diseases. We have implemented the method in the open source GSM package, available from the Comprehensive R Archive Network.
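To make the idea of mixing over the shape parameter concrete, the following is a minimal Python sketch of a tail probability under a gamma shape mixture. It is not the GSM package's code or interface. It assumes, beyond what the summary states, that the shapes are the integers 1, ..., J and that all components share one rate parameter; the function names, weights, rate, and threshold are hypothetical and chosen only for illustration. With integer shapes, each component's survival function has the closed Erlang form exp(-rate*t) * sum_{k<j} (rate*t)^k / k!.

```python
import math


def erlang_sf(t, shape, rate):
    """P(Y > t) for a gamma variable with integer shape and the given rate."""
    if t <= 0:
        return 1.0
    x = rate * t
    term = math.exp(-x)  # k = 0 term of the finite sum
    total = term
    for k in range(1, shape):
        term *= x / k  # builds x**k / k! * exp(-x) incrementally
        total += term
    return total


def gsm_tail_prob(t, weights, rate):
    """P(Y > t) under the mixture sum_j weights[j-1] * Gamma(shape=j, rate).

    Assumption (illustrative only): shapes are 1..len(weights) and all
    components share the same rate.
    """
    if abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("mixture weights must sum to 1")
    return sum(w * erlang_sf(t, j, rate)
               for j, w in enumerate(weights, start=1))


# Hypothetical values: three components, costs measured in dollars.
weights = [0.5, 0.3, 0.2]
rate = 0.001
print(gsm_tail_prob(5000.0, weights, rate))
```

In a Bayesian fit, the weights and the rate would be posterior draws rather than fixed numbers, and averaging `gsm_tail_prob` over those draws would give a posterior estimate of the exceedance probability.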


62P10 Applications of statistics to biology and medical sciences; meta analysis


SDaA; mclust; robustbase; CRAN; GSM

