Principal covariates clusterwise regression (PCCR): accounting for multicollinearity and population heterogeneity in hierarchically organized data.

*(English)* Zbl 1360.62537

Summary: In the behavioral sciences, many research questions pertain to a regression problem in that one wants to predict a criterion on the basis of a number of predictors. Although in many cases ordinary least squares regression will suffice, sometimes the prediction problem is more challenging, for three reasons. First, multiple highly collinear predictors can be available, making it difficult to grasp their mutual relations as well as their relations to the criterion. In that case, it may be very useful to reduce the predictors to a few summary variables, on which one regresses the criterion and which at the same time yield insight into the predictor structure. Second, the population under study may consist of a few unknown subgroups that are characterized by different regression models. Third, the obtained data are often hierarchically structured, with, for instance, observations being nested within persons, or participants within groups or countries. Although some methods have been developed that partially meet these challenges (i.e., principal covariates regression (PCovR), clusterwise regression (CR), and structural equation models), none of these methods adequately deals with all of them simultaneously. To fill this gap, we propose the principal covariates clusterwise regression (PCCR) method, which combines the key ideas behind PCovR [S. de Jong and H. A. L. Kiers, “Principal covariates regression”, Chemom. Intell. Lab. Syst. 14, No. 1–3, 155–164 (1992; doi:10.1016/0169-7439(92)80100-i)] and CR [H. Späth, Computing 22, 367–373 (1979; Zbl 0387.65028)]. The PCCR method is validated by means of a simulation study and by applying it to cross-cultural data regarding satisfaction with life.
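To make the clusterwise-regression idea in the summary concrete, the following is a minimal Python sketch, not the PCCR algorithm of the paper. It covers only the second and third challenges: whole higher-level units (groups) are partitioned into clusters, and each cluster gets its own least squares regression. The function names, the random start, and the simple alternating scheme are assumptions made for illustration; the component-reduction step of PCovR is deliberately left out.

```python
import numpy as np


def fit_per_cluster(Xd, y, groups, group_ids, assign, n_clusters):
    """Fit one ordinary least squares model per cluster of groups."""
    betas = np.zeros((n_clusters, Xd.shape[1]))
    for k in range(n_clusters):
        # Observations whose group currently belongs to cluster k.
        mask = np.isin(groups, group_ids[assign == k])
        if mask.any():  # an empty cluster keeps all-zero coefficients
            betas[k] = np.linalg.lstsq(Xd[mask], y[mask], rcond=None)[0]
    return betas


def clusterwise_regression(X, y, groups, n_clusters, n_iter=100, seed=0):
    """Alternate between per-cluster OLS fits and reassigning whole groups.

    Illustrative sketch only: a single random start, so the result may be
    a local optimum.
    """
    rng = np.random.default_rng(seed)
    Xd = np.column_stack([np.ones(len(y)), X])  # prepend an intercept column
    group_ids = np.unique(groups)
    # Random initial partition of the groups (hypothetical choice).
    assign = rng.integers(n_clusters, size=len(group_ids))
    for _ in range(n_iter):
        betas = fit_per_cluster(Xd, y, groups, group_ids, assign, n_clusters)
        # Squared error of every observation under every cluster's model.
        sq_err = (y[:, None] - Xd @ betas.T) ** 2
        # Each group moves to the cluster whose model fits it best.
        new_assign = np.array(
            [sq_err[groups == g].sum(axis=0).argmin() for g in group_ids]
        )
        if np.array_equal(new_assign, assign):
            break
        assign = new_assign
    # Refit so the returned coefficients match the returned partition.
    betas = fit_per_cluster(Xd, y, groups, group_ids, assign, n_clusters)
    return assign, betas
```

Calling `clusterwise_regression(X, y, groups, n_clusters=2)` returns one cluster label per group and one coefficient row (intercept first) per cluster. In practice such alternating procedures are usually run from many random starts, because a single start can end in a local optimum.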

##### MSC:

62P15 | Applications of statistics to psychology |

62H25 | Factor analysis and principal components; correspondence analysis |

62H30 | Classification and discrimination; cluster analysis (statistical aspects) |

##### Keywords:

clusterwise regression; component analysis; multicollinearity; population heterogeneity; hierarchically organized data

*T. F. Wilderjans* et al., Psychometrika 82, No. 1, 86–111 (2017; Zbl 1360.62537)

##### References:

[1] | Arminger, G; Stein, P, Finite mixtures of covariance structure models with regressors: loglikelihood function, minimum distance estimation, fit indices, and a complex example, Sociological Methods & Research, 26, 148-182, (1997) |

[2] | Brusco, MJ; Cradit, JD, A variable selection heuristic for K-means clustering, Psychometrika, 66, 249-270, (2001) · Zbl 1293.62237 |

[3] | Brusco, MJ; Cradit, JD; Steinley, D; Fox, GL, Cautionary remarks on the use of clusterwise regression, Multivariate Behavioral Research, 43, 29-49, (2008) |

[4] | Brusco, MJ; Cradit, JD; Tashchian, A, Multicriterion clusterwise regression for joint segmentation settings: an application to customer value, Journal of Marketing Research, 40, 225-234, (2003) |

[5] | Ceulemans, E; Kiers, HAL, Discriminating between strong and weak structures in three-mode principal component analysis, British Journal of Mathematical & Statistical Psychology, 62, 601-620, (2009) |

[6] | Ceulemans, E; Kuppens, P; Van Mechelen, I, Capturing the structure of distinct types of individual differences in the situation-specific experience of emotions: the case of anger, European Journal of Personality, 26, 484-495, (2012) |

[7] | Ceulemans, E; Van Mechelen, I, CLASSI: A classification model for the study of sequential processes and individual differences therein, Psychometrika, 73, 107-124, (2008) · Zbl 1143.62092 |

[8] | Ceulemans, E; Van Mechelen, I; Leenen, I, The local minima problem in hierarchical classes analysis: an evaluation of a simulated annealing algorithm and various multistart procedures, Psychometrika, 72, 377-391, (2007) · Zbl 1286.62102 |

[9] | Cohen, J, A power primer, Psychological Bulletin, 112, 155-159, (1992) |

[10] | Coxe, KL; Kotz, S (ed.); Johnson, NL (ed.); Read, CB (ed.), Principal components regression analysis, 181-184, (1986), New York |

[11] | de Jong, S; Kiers, HAL, Principal covariates regression: part I. theory, Chemometrics and Intelligent Laboratory Systems, 14, 155-164, (1992) |

[12] | De Roover, K; Ceulemans, E; Timmerman, ME; Vansteelandt, K; Stouten, J; Onghena, P, Clusterwise simultaneous component analysis for analyzing structural differences in multivariate multiblock data, Psychological Methods, 17, 100-119, (2012) |

[13] | DeSarbo, WS; Cron, WL, A maximum likelihood methodology for clusterwise linear regression, Journal of Classification, 5, 249-282, (1988) · Zbl 0692.62052 |

[14] | DeSarbo, WS; Edwards, EA, Typologies of compulsive buying behavior: A constrained clusterwise regression approach, Journal of Consumer Psychology, 5, 231-262, (1996) |

[15] | DeSarbo, WS; Oliver, RL; Rangaswamy, A, A simulated annealing methodology for clusterwise linear regression, Psychometrika, 54, 707-736, (1989) |

[16] | Hahn, C; Johnson, MD; Herrmann, A; Huber, F, Capturing customer heterogeneity using a finite mixture PLS approach, Schmalenbach Business Review, 54, 243-269, (2002) |

[17] | Hubert, L; Arabie, P, Comparing partitions, Journal of Classification, 2, 193-218, (1985) |

[18] | Kaiser, HF, The varimax criterion for analytic rotation in factor analysis, Psychometrika, 23, 187-200, (1958) · Zbl 0095.33603 |

[19] | Kiers, H. A. L. (1989). Three-way methods for the analysis of qualitative and quantitative two-way data. Leiden: DSWO Press. |

[20] | Kiers, HAL; Smilde, A, A comparison of various methods for multivariate regression with highly collinear variables, Statistical Methods & Applications, 16, 193-228, (2007) · Zbl 1405.62096 |

[21] | Kiers, HAL; ten Berge, JMF, Minimization of a class of matrix trace functions by means of refined majorization, Psychometrika, 57, 371-382, (1992) · Zbl 0782.62067 |

[22] | Korth, B; Tucker, LR, The distribution of chance congruence coefficients from simulated data, Psychometrika, 40, 361-372, (1975) · Zbl 0311.92027 |

[23] | Kroonenberg, P. M. (2008). Applied multiway data analysis. Hoboken, NJ: Wiley. · Zbl 1160.62002 |

[24] | Kroonenberg, P. M. (1983). Three-mode principal component analysis: Theory and applications. Leiden: DSWO Press. |

[25] | Kuppens, P; Ceulemans, E; Timmerman, ME; Diener, E; Kim-Prieto, C, Universal intracultural and intercultural dimensions of the recalled frequency of emotional experience, Journal of Cross-Cultural Psychology, 37, 491-515, (2006) |

[26] | Leisch, F, FlexMix: A general framework for finite mixture models and latent class regression in R, Journal of Statistical Software, 11, 1-18, (2004) |

[27] | Rao, CR, The use and interpretation of principal component analysis in applied research, Sankhyā: The Indian Journal of Statistics, Series A, 26, 329-358, (1964) · Zbl 0137.37207 |

[28] | Sarstedt, M; Ringle, CM, Treating unobserved heterogeneity in PLS path modeling: a comparison of FIMIX-PLS with different data analysis strategies, Journal of Applied Statistics, 37, 1299-1318, (2010) |

[29] | Schott, J. R. (2005). Matrix analysis for statistics (2nd ed.). Hoboken, NJ: Wiley. · Zbl 1076.15002 |

[30] | Späth, H, Algorithm 39: clusterwise linear regression, Computing, 22, 367-373, (1979) · Zbl 0387.65028 |

[31] | Späth, H, Correction to algorithm 39: clusterwise linear regression, Computing, 26, 275-275, (1981) · Zbl 0444.65020 |

[32] | Steinley, D, Local optima in K-means clustering: what you don’t know may hurt you, Psychological Methods, 8, 294-304, (2003) |

[33] | Steinley, D, Properties of the Hubert-Arabie adjusted Rand index, Psychological Methods, 9, 386-396, (2004) |

[34] | Stormshak, E. A., Bierman, K. L., Bruschi, C., Dodge, K. A., & Coie, J. D., The Conduct Problems Prevention Research Group. (1999). The relation between behavior problems and peer preference in different classroom contexts. Child Development, 70(1), 169-182. |

[35] | ten Berge, JMF, Orthogonal procrustes rotation for two or more matrices, Psychometrika, 42, 267-276, (1977) · Zbl 0362.92020 |

[36] | Tucker, L. R. (1951). A method for synthesis of factor analysis studies. Personnel Research Section Report #984. Washington, DC: Department of the Army. |

[37] | van den Berg, RA; Hoefsloot, HCJ; Westerhuis, JA; Smilde, AK; van der Werf, MJ, Centering, scaling and transformations: improving the biological information content of metabolomics data, BMC Genomics, 7, 142-157, (2006) |

[38] | van den Berg, RA; Van Mechelen, I; Wilderjans, TF; Van Deun, K; Kiers, HAL; Smilde, AK, Integrating functional genomics data using maximum likelihood based simultaneous component analysis, BMC Bioinformatics, 10, 340, (2009) |

[39] | Van Deun, K; Smilde, AK; van der Werf, MJ; Kiers, HAL; Van Mechelen, I, A structured overview of simultaneous component based data integration, BMC Bioinformatics, 10, 246, (2009) |

[40] | Vervloet, M; Van Deun, K; Van den Noortgate, W; Ceulemans, E, On the selection of the weighting parameter value in principal covariates regression, Chemometrics and Intelligent Laboratory Systems, 123, 36-43, (2013) |

[41] | Wedel, M; DeSarbo, WS, A mixture likelihood approach for generalized linear models, Journal of Classification, 12, 21-55, (1995) · Zbl 0825.62611 |

[42] | Wilderjans, TF; Ceulemans, E, Clusterwise parafac to identify heterogeneity in three-way data, Chemometrics and Intelligent Laboratory Systems, 129, 87-97, (2013) |

[43] | Wilderjans, TF; Ceulemans, E; Kuppens, P, Clusterwise HICLAS: A generic modeling strategy to trace similarities and differences in multi-block binary data, Behavior Research Methods, 44, 532-545, (2012) |

[44] | Wilderjans, TF; Ceulemans, E; Van Mechelen, I, Simultaneous analysis of coupled data blocks differing in size: A comparison of two weighting schemes, Computational Statistics and Data Analysis, 53, 1086-1098, (2009) · Zbl 05687828 |

[45] | Wilderjans, TF; Ceulemans, E; Van Mechelen, I; van den Berg, RA, Simultaneous analysis of coupled data matrices subject to different amounts of noise, British Journal of Mathematical and Statistical Psychology, 64, 277-290, (2011) |

[46] | Wold, H; Krishnaiah, PR (ed.), Estimation of principal component and related methods by iterative least squares, 391-420, (1966), New York |

This reference list is based on information provided by the publisher or from digital mathematics libraries. Its items are heuristically matched to zbMATH identifiers and may contain data conversion errors. It attempts to reflect the references listed in the original paper as accurately as possible without claiming the completeness or perfect precision of the matching.