On the measure and the estimation of evenness and diversity. (English) Zbl 1284.62047

Summary: Modelling word or species frequency count data through zero-truncated Poisson mixture models allows one to interpret the model's mixing distribution as the distribution of the word or species frequencies of the vocabulary or population. As a consequence, estimates of the mixing density can be used as a fingerprint of the style of an author in his texts or of an ecosystem in its samples. Definitions of a measure of evenness and of a measure of diversity within a vocabulary or population are given, and the novelty of these definitions is explained. It is then proposed that the evenness and the diversity of a vocabulary or population be approximated through the expectation of these measures under the word or species frequency distribution. This leads to assessing the lack of diversity through measures of the variability of the mixing frequency distribution estimates described above.
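
As an illustration of the model class named in the summary, the following minimal sketch evaluates a zero-truncated Poisson mixture probability, P(X = r | X ≥ 1), for a finite (discrete) mixing distribution. The function name, the rates and the weights are hypothetical choices made only for this example; the summary does not state which mixing density the paper uses, and this is not the authors' implementation.

```python
import math

def zt_poisson_mixture_pmf(r, rates, weights):
    """P(X = r | X >= 1) for a finite Poisson mixture (illustrative only)."""
    if r < 1:
        return 0.0
    # Untruncated mixture probability of observing a count of r.
    p_r = sum(w * math.exp(-lam) * lam**r / math.factorial(r)
              for lam, w in zip(rates, weights))
    # Probability that a word or species is not observed at all (count 0).
    p_0 = sum(w * math.exp(-lam) for lam, w in zip(rates, weights))
    # Zero truncation: condition on the count being at least 1.
    return p_r / (1.0 - p_0)

# Hypothetical mixing distribution: three Poisson rates with their weights.
rates = [0.5, 2.0, 8.0]
weights = [0.6, 0.3, 0.1]

# The truncated probabilities should sum to 1 over r = 1, 2, ...
total = sum(zt_poisson_mixture_pmf(r, rates, weights) for r in range(1, 100))
```

Here the weights play the role of the mixing distribution that the summary interprets as the distribution of word or species frequencies. A continuous mixing density would replace the sums by integrals.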

MSC:

62-07 Data analysis (statistics) (MSC2010)
62P10 Applications of statistics to biology and medical sciences; meta analysis
92D40 Ecology

Software:

EstimateS

References:

[1] Baayen, H., Word Frequency Distributions (2001), Kluwer: Kluwer Dordrecht · Zbl 0989.68146
[2] Basharin, G. P., On a statistical estimate for the entropy of a sequence of independent random variables, Theory of Probability and its Applications, 4, 333-336 (1959)
[3] Blyth, C. B., Note on estimating information, The Annals of Mathematical Statistics, 30, 71-79 (1959) · Zbl 0225.62050
[4] Bunge, J.; Fitzpatrick, M., Estimating the number of species: a review, Journal of the American Statistical Association, 88, 364-373 (1993)
[5] Chao, A., Nonparametric estimation of the number of classes in a population, Scandinavian Journal of Statistics, 11, 265-270 (1984)
[6] Chao, A.; Shen, T. J., Nonparametric estimation of Shannon’s index of diversity when there are unseen species in sample, Environmental and Ecological Statistics, 10, 429-443 (2003)
[8] DeGroot, M. H., Uncertainty, information and sequential experiments, The Annals of Mathematical Statistics, 33, 404-419 (1962) · Zbl 0151.22803
[9] Dunn, P. K.; Smyth, G. K., Evaluation of Tweedie exponential dispersion model densities by Fourier inversion, Statistics and Computing, 18, 73-86 (2008)
[10] Gerber, H. U., From the generalized gamma to the generalized negative binomial distribution, Insurance: Mathematics and Economics, 10, 303-309 (1991) · Zbl 0743.62014
[11] Ginebra, J., On the measure of the information in a statistical experiment, Bayesian Analysis, 2, 167-212 (2007) · Zbl 1331.62056
[12] Giron, J.; Ginebra, J.; Riba, A., Bayesian analysis of a multinomial sequence and homogeneity of literary style, The American Statistician, 32, 61-74 (2005) · Zbl 1121.62476
[13] Good, I. J., The population frequencies of species and the estimation of population parameters, Biometrika, 40, 237-264 (1953) · Zbl 0051.37103
[14] Good, I. J., Comment to “diversity as a concept and its measurement”, Journal of the American Statistical Association, 77, 561-563 (1982)
[15] Hill, M. O., Diversity and evenness: a unifying notation and its consequences, Ecology, 54, 427-432 (1973)
[16] Holmes, D. I., The analysis of literary style: a review, Journal of the Royal Statistical Society. Series A, 148, 328-341 (1985)
[17] Holmes, D. I., A stylometric analysis of Mormon scripture and related texts, Journal of the Royal Statistical Society. Series A, 155, 91-120 (1992)
[18] Holmes, D. I.; Forsyth, R. S., The Federalist revisited. New directions in authorship attribution, Literary and Linguistic Computing, 10, 111-127 (1995)
[19] Hougaard, P.; Lee, M.-L. T.; Whitmore, G. A., Analysis of overdispersed count data by mixtures of Poisson variables and Poisson processes, Biometrics, 53, 1225-1238 (1997) · Zbl 0911.62101
[20] Hurlbert, S. H., The nonconcept of species diversity: a critique and alternative parameters, Ecology, 52, 577-586 (1971)
[21] Izsak, J.; Papp, L., A link between ecological diversity indices and measures of biodiversity, Ecological Modelling, 130, 151-156 (2000)
[22] Jorgensen, B., Statistical Properties of the Generalized Inverse Gaussian Distribution (1982), Wiley: Wiley New York · Zbl 0486.62022
[23] Jorgensen, B., The Theory of Dispersion Models (1997), Chapman & Hall: Chapman & Hall London · Zbl 0928.62052
[24] Kempton, R. A., The structure of species abundance and measurement of diversity, Biometrics, 35, 307-321 (1979)
[25] Kempton, R. A.; Wedderburn, R. W.M., A comparison of three measures of species diversity, Biometrics, 34, 25-37 (1978)
[26] Lyons, N. I.; Hutcheson, K., Estimation of Simpson’s diversity when counts follow a Poisson distribution, Biometrics, 42, 171-176 (1986)
[27] Magurran, A. E., Measuring Biological Diversity (2004), Blackwell: Blackwell New York
[28] Marshall, A. W.; Olkin, I., Inequalities: Theory of Majorization and its Applications (1979), Academic Press: Academic Press New York · Zbl 0437.26007
[29] Muller, A.; Stoyan, D., Comparison Methods for Stochastic Models and Risk (2002), Wiley: Wiley New York
[30] Patil, G. P.; Taillie, C., Diversity as a concept and its measurement (with discussion), Journal of the American Statistical Association, 77, 548-567 (1982) · Zbl 0511.62113
[31] Pielou, E. C., Ecological Diversity (1975), Wiley: Wiley New York
[32] Pollatschek, M.; Radday, Y. T., Vocabulary richness and concentration in Hebrew biblical literature, Association for Literary and Linguistic Computing Bulletin, 8, 217-231 (1981)
[33] Puig, X.; Ginebra, J.; Perez-Casany, M., Extended truncated inverse Gaussian-Poisson model, Statistical Modelling, 9, 131-151 (2009) · Zbl 07257699
[34] Rao, C. R., Diversity and dissimilarity coefficients: a unified approach, Theoretical Population Biology, 21, 24-43 (1982) · Zbl 0516.92021
[35] Rao, C. R., Rao’s axiomatization of diversity measures, (Johnson, N. L.; Kotz, S.; Read, C. B., Encyclopedia of Statistics, vol. 7 (1984), Wiley: Wiley New York), 614-617
[36] Rényi, A., On measures of entropy and information, (Neyman, J., Proceedings of the Fourth Berkeley Symposium on Mathematical Statistics and Probability (1961), University of California Press: University of California Press Berkeley), 547-561
[37] Riba, A.; Ginebra, J., Diversity of vocabulary and homogeneity of literary style, Journal of Applied Statistics, 33, 729-741 (2006) · Zbl 1118.62393
[38] Ricotta, C., Through the jungle of biological diversity, Acta Biotheoretica, 53, 29-38 (2005)
[39] Ricotta, C.; Szeidl, L., Towards a unifying approach to diversity measures: bridging the gap between Shannon entropy and Rao’s quadratic index, Theoretical Population Biology, 70, 237-243 (2006) · Zbl 1112.92067
[40] Routledge, R. D., Diversity indices: which ones are admissible?, Journal of Theoretical Biology, 76, 503-515 (1979)
[41] Sen, A., Poverty, inequality and unemployment: some conceptual issues in measurement, Sankhya C, 36, 67-82 (1974)
[42] Shaked, M.; Shanthikumar, J. G., Stochastic Orders and their Applications (1994), Academic Press: Academic Press Boston · Zbl 0806.62009
[43] Sichel, H. S., On a distribution law for word frequencies, Journal of the American Statistical Association, 70, 542-547 (1975)
[44] Sichel, H. S., Asymptotic efficiencies of three methods of estimation for the inverse Gaussian-Poisson distribution, Biometrika, 69, 467-472 (1982)
[45] Sichel, H. S., Word frequency distributions and type-token characteristics, Mathematical Scientist, 11, 45-72 (1986) · Zbl 0629.62110
[46] Sichel, H. S., Parameter estimation for a word frequency distribution based on occupancy theory, Communications in Statistics. Theory and Methods, 15, 935-949 (1986) · Zbl 0608.62149
[47] Sichel, H. S., Modelling species-abundance frequencies and species-individual functions with the generalized inverse Gaussian-Poisson distribution, South African Statistical Journal, 31, 13-37 (1997) · Zbl 0888.62115
[48] Smith, W.; Grassle, J. F., Sampling properties of a family of diversity measures, Biometrics, 33, 283-292 (1977) · Zbl 0357.92026
[49] Solomon, D. L., A comparative approach to species diversity, (Grassle, J. F.; Patil, G. P.; Smith, W.; Taillie, C., Ecological Diversity in Theory and Practice (1979), International Cooperative Publishing House: International Cooperative Publishing House Maryland), 29-35
[50] Solow, A.; Polasky, S.; Broadus, J., On the measurement of biological diversity, Journal of Environmental Economics and Management, 24, 60-68 (1993)
[51] Tong, Y. L., Some distribution properties of the sample species-diversity indices and their applications, Biometrics, 39, 999-1008 (1983) · Zbl 0536.92028
[53] Yule, G. U., The Statistical Study of Literary Vocabulary (1944), Cambridge University Press: Cambridge University Press London