
Optimal estimators of principal points for minimizing expected mean squared distance. (English) Zbl 1326.62123

Summary: \(k\)-principal points of a random variable are \(k\) points that minimize the mean squared distance (MSD) between the random variable and the nearest of the \(k\) points. This paper focuses on finding optimal estimators of principal points in terms of the expected mean squared distance (EMSD) between the random variable and the nearest principal point estimator. These estimators are compared with nonparametric and maximum likelihood estimators. It turns out that a minimum EMSD estimator of \(k\)-principal points of univariate normal distributions is determined by the \(k\)-principal points of the \(t\)-distribution with \(n+1\) degrees of freedom, where \(n\) is the sample size. Extensions of the results to location-scale families, multivariate distributions, and principal surfaces are also discussed.
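The definition at the start of the summary can be illustrated numerically. The sketch below is not the estimator proposed in the paper. It only computes the population \(k\)-principal points of the standard normal distribution by a self-consistency (Lloyd-type) iteration, in which each point is repeatedly replaced by the conditional mean of the cell of values nearest to it. The function names, the starting values, and the iteration count are assumptions made for this example.

```python
import math

def phi(x):
    """Standard normal density; taken as 0 at +/- infinity."""
    if math.isinf(x):
        return 0.0
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

def Phi(x):
    """Standard normal distribution function."""
    if math.isinf(x):
        return 1.0 if x > 0 else 0.0
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def cells(pts):
    """Cell boundaries: midpoints between neighbouring (sorted) points."""
    mids = [(a + b) / 2.0 for a, b in zip(pts, pts[1:])]
    return [-math.inf] + mids + [math.inf]

def normal_principal_points(k, iters=500):
    """Self-consistency iteration for N(0, 1) (illustrative sketch)."""
    # Hypothetical starting values: k equally spaced points in [-1, 1].
    pts = [0.0] if k == 1 else [-1.0 + 2.0 * i / (k - 1) for i in range(k)]
    for _ in range(iters):
        cuts = cells(pts)
        # E[X | a < X < b] = (phi(a) - phi(b)) / (Phi(b) - Phi(a))
        pts = [(phi(a) - phi(b)) / (Phi(b) - Phi(a))
               for a, b in zip(cuts, cuts[1:])]
    return pts

def msd(pts):
    """MSD of N(0, 1) to the nearest point.

    Uses the identity MSD = 1 - sum_j p_j * y_j**2, which is valid only
    when each y_j is the conditional mean of its own cell.
    """
    cuts = cells(pts)
    probs = [Phi(b) - Phi(a) for a, b in zip(cuts, cuts[1:])]
    return 1.0 - sum(p * y * y for p, y in zip(probs, pts))

if __name__ == "__main__":
    for k in (1, 2, 3):
        pts = normal_principal_points(k)
        print(k, [round(y, 4) for y in pts], round(msd(pts), 4))
```

For \(k=2\) the iteration returns \(\pm\sqrt{2/\pi}\), the known 2-principal points of the standard normal distribution, with MSD \(1-2/\pi\). The same routine could in principle be adapted to a \(t\)-distribution by swapping in its density and distribution function, but how those points enter the minimum EMSD estimator is specified in the paper itself and is not reproduced here.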

MSC:

62H12 Estimation in multivariate analysis
62E17 Approximations to statistical distributions (nonasymptotic)
62H05 Characterization and structure theory for multivariate probability distributions; copulas
62H30 Classification and discrimination; cluster analysis (statistical aspects)
62H99 Multivariate analysis

Software:

Flury; AS 136; R

References:

[1] Antle, C. E.; Bain, L. J., A property of maximum likelihood estimators of location and scale parameters, SIAM Rev., 11, 251-253 (1969) · Zbl 0176.48502
[2] Bali, J. L.; Boente, G., Principal points and elliptical distributions from the multivariate setting to the functional case, Statist. Probab. Lett., 79, 1858-1865 (2009) · Zbl 1169.62326
[3] Bartoletti, S.; Flury, B.; Nel, D. G., Allometric extension, Biometrics, 55, 1210-1214 (1999) · Zbl 1059.62618
[4] Cox, D. R., A note on grouping, J. Amer. Statist. Assoc., 52, 543-547 (1957) · Zbl 0088.35402
[5] Dalenius, T., The problem of optimum stratification, Skandinavisk Aktuarietidskrift, 33, 203-213 (1950) · Zbl 0041.46302
[6] Flury, B., Principal points, Biometrika, 77, 33-41 (1990) · Zbl 0691.62053
[7] Flury, B., Estimation of principal points, Appl. Stat., 42, 139-151 (1993) · Zbl 0825.62524
[8] Flury, B., A First Course in Multivariate Statistics (1997), Springer: New York · Zbl 0879.62052
[9] Gersho, A.; Gray, R., Vector Quantization and Signal Compression (1992), Kluwer Academic Publishers: Boston · Zbl 0782.94001
[10] Graf, L.; Luschgy, H., Foundations of Quantization for Probability Distributions (2000), Springer: Berlin · Zbl 0951.60003
[11] Gray, R. M.; Neuhoff, D. L., Quantization, IEEE Trans. Inform. Theory, 44, 2325-2383 (1998) · Zbl 1016.94016
[12] Gu, X. N.; Mathew, T., Some characterizations of symmetric two-principal points, J. Statist. Plann. Inference, 98, 29-37 (2001) · Zbl 0977.62058
[13] Hartigan, J. A., Clustering Algorithms (1975), Wiley: New York · Zbl 0321.62069
[14] Hartigan, J. A.; Wong, M. A., A \(K\)-means clustering algorithm, Appl. Stat., 28, 100-108 (1979) · Zbl 0447.62062
[15] Hastie, T.; Stuetzle, W., Principal curves, J. Amer. Statist. Assoc., 84, 502-516 (1989) · Zbl 0679.62048
[16] Klingenberg, C. P.; Froese, R., A multivariate comparison of allometric growth patterns, Syst. Biol., 40, 410-419 (1991)
[17] Kurata, H., On principal points for location mixtures of spherically symmetric distributions, J. Statist. Plann. Inference, 138, 3405-3418 (2008) · Zbl 1145.62044
[18] Kurata, H.; Hoshino, T.; Fujikoshi, Y., Allometric extension model for conditional distributions, J. Multivariate Anal., 99, 1985-1998 (2008) · Zbl 1169.62331
[19] Kurata, H.; Qiu, D., Linear subspace spanned by principal points of a mixture of spherically symmetric distributions, Comm. Statist. Theory Methods, 40, 2737-2750 (2011) · Zbl 1271.62111
[20] Matsuura, S.; Kurata, H., A principal subspace theorem for 2-principal points of general location mixtures of spherically symmetric distributions, Statist. Probab. Lett., 80, 1863-1869 (2010) · Zbl 1202.62070
[21] Matsuura, S.; Kurata, H., Principal points of a multivariate mixture distribution, J. Multivariate Anal., 102, 213-224 (2011) · Zbl 1328.62077
[22] Matsuura, S.; Kurata, H., Definition and properties of \(m\)-dimensional \(n\)-principal points, Comm. Statist. Theory Methods, 42, 267-282 (2013) · Zbl 1298.62101
[23] Matsuura, S.; Kurata, H., Principal points for an allometric extension model, Statist. Papers, 55, 853-870 (2014) · Zbl 1297.62036
[24] Mease, D.; Nair, V. N., Unique optimal partitions of distributions and connections to hazard rates and stochastic ordering, Statistica Sinica, 16, 1299-1312 (2006) · Zbl 1111.62015
[25] Mease, D.; Nair, V. N.; Sudjianto, A., Selective assembly in manufacturing: Statistical issues and optimal binning strategies, Technometrics, 46, 165-175 (2004)
[26] Petkova, E.; Tarpey, T., Partitioning of functional data for understanding heterogeneity in psychiatric conditions, Stat. Interface, 2, 413-424 (2009) · Zbl 1245.91087
[27] Pollard, D., Strong consistency of \(K\)-means clustering, Ann. Statist., 9, 135-140 (1981) · Zbl 0451.62048
[28] Pötzelberger, K.; Felsenstein, K., An asymptotic result on principal points for univariate distributions, Optimization, 28, 397-406 (1994) · Zbl 0813.62012
[30] Rowe, S., An algorithm for computing principal points with respect to a loss function in the unidimensional case, Stat. Comput., 6, 187-190 (1996)
[31] Ruwet, C.; Haesbroeck, G., Classification performance resulting from a 2-means, J. Statist. Plann. Inference, 143, 408-418 (2013) · Zbl 1254.62075
[32] Shimizu, N.; Mizuta, M., Functional clustering and functional principal points, (Apolloni, B.; Howlett, R. J.; Jain, L. C., KES 2007/WIRN 2007, Part II, LNAI, vol. 4693 (2007), Springer-Verlag), 501-508
[33] Shimizu, N.; Mizuta, M., Functional principal points and functional cluster analysis, (Jain, L. C.; Sato-Ilic, M.; Virvou, M.; Tsihrintzis, G. A.; Balas, V. E.; Abeynayake, C., Computational Intelligence Paradigms, Innovative Applications, Studies in Computational Intelligence, vol. 137 (2008), Springer-Verlag), 149-165
[34] Stampfer, E.; Stadlober, E., Methods for estimating principal points, Comm. Statist. Ser. B. Simulation Comput., 31, 261-277 (2002) · Zbl 1081.62538
[35] Tarpey, T., Two principal points of symmetric, strongly unimodal distributions, Statist. Probab. Lett., 20, 253-257 (1994) · Zbl 0799.62019
[36] Tarpey, T., Principal points and self-consistent points of symmetric multivariate distributions, J. Multivariate Anal., 53, 39-51 (1995) · Zbl 0820.62047
[37] Tarpey, T., Self-consistent patterns for symmetric multivariate distributions, J. Classification, 15, 57-79 (1998) · Zbl 0902.62057
[38] Tarpey, T., Self-consistency algorithms, J. Comput. Graph. Stat., 8, 889-905 (1999)
[39] Tarpey, T., Estimating principal points of univariate distributions, J. Appl. Stat., 24, 499-512 (1997)
[40] Tarpey, T., A parametric \(k\)-means algorithm, Comput. Statist., 22, 71-89 (2007) · Zbl 1221.62042
[41] Tarpey, T.; Flury, B., Self-consistency: A fundamental concept in statistics, Statist. Science, 11, 229-243 (1996) · Zbl 0955.62540
[42] Tarpey, T.; Ivey, C. T., Allometric extension for multivariate regression models, J. Data Sci., 4, 479-495 (2006)
[43] Tarpey, T.; Kinateder, K. K. J., Clustering functional data, J. Classification, 20, 93-114 (2003) · Zbl 1112.62327
[44] Tarpey, T.; Li, L.; Flury, B., Principal points and self-consistent points of elliptical distributions, Ann. Statist., 23, 103-112 (1995) · Zbl 0822.62042
[45] Tarpey, T.; Petkova, E., Principal point classification: Applications to differentiating drug and placebo responses in longitudinal studies, J. Statist. Plann. Inference, 140, 539-550 (2010) · Zbl 1177.62084
[46] Tarpey, T.; Petkova, E.; Lu, Y.; Govindarajulu, U., Optimal partitioning for linear mixed effects models: Applications to identifying placebo responders, J. Amer. Statist. Assoc., 105, 968-977 (2010) · Zbl 1390.62136
[47] Tarpey, T.; Petkova, E.; Ogden, R. T., Profiling placebo responders by self-consistent partitions of functional data, J. Amer. Statist. Assoc., 98, 850-858 (2003)
[48] Trushkin, A., Sufficient conditions for uniqueness of a locally optimal quantizer for a class of convex error weighting functions, IEEE Trans. Inf. Theory, 28, 187-198 (1982) · Zbl 0476.94012
[49] Yamamoto, W.; Shinozaki, N., On uniqueness of two principal points for univariate location mixtures, Statist. Probab. Lett., 46, 33-42 (2000) · Zbl 0976.62009
[50] Yamamoto, W.; Shinozaki, N., Two principal points for multivariate location mixtures of spherically symmetric distributions, J. Japan Statist. Soc., 30, 53-63 (2000) · Zbl 0963.62046
[51] Yamashita, H.; Matsuura, S.; Suzuki, H., Estimation of principal points for a multivariate binary distribution using a log-linear model, Communications in Statistics - Simulation and Computation (2015), (in press)
[52] Yamashita, H.; Suzuki, H., Heuristic approximation methods for principal points for binary distributions, J. Japan Ind. Manage. Assoc., 65, 131-141 (2014)
[53] Zoppè, A., Principal points of univariate continuous distributions, Stat. Comput., 5, 127-132 (1995)