Seemingly unrelated clusterwise linear regression.

*(English)* Zbl 07252404

Summary: Linear regression models based on finite Gaussian mixtures are a flexible tool for analysing linear dependencies in multivariate data. They are suitable for dealing with correlated response variables when the data come from a heterogeneous population composed of two or more sub-populations, each characterised by a different linear regression model. Several types of finite mixtures of linear regression models have been specified by changing the assumptions on the parameters that differentiate the sub-populations and/or on the vectors of regressors that affect the response variables. The class of models illustrated in this paper, defined by mixtures of seemingly unrelated Gaussian linear regressions, makes them more flexible: with these models, the researcher can use a different vector of regressors for each dependent variable. The proposed class includes parsimonious models obtained by imposing suitable constraints on the variances and covariances of the response variables in the sub-populations. Details about model identification and maximum likelihood estimation are given. The usefulness of these models is shown through the analysis of a real dataset. Regularity conditions for the model class are illustrated, and a proof is provided that, when these conditions are met, the maximum likelihood estimator under the examined models is consistent. In addition, the finite-sample behaviour of this estimator is evaluated numerically through the analysis of simulated datasets.
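As a reading aid, the model class described in the summary can be sketched as a mixture density. The notation below is assumed for illustration and is not taken from this record: \(K\) sub-populations with weights \(\pi_k\), \(D\) response variables, a response-specific regressor vector \(\mathbf{x}_{id}\), and component-specific intercepts \(\lambda_{kd}\), coefficients \(\boldsymbol{\beta}_{kd}\) and covariance matrices \(\boldsymbol{\Sigma}_k\).

```latex
% Sketch of a K-component mixture of seemingly unrelated Gaussian regressions.
% All symbols are illustrative assumptions, not the paper's own notation.
\[
  f(\mathbf{y}_i \mid \mathbf{x}_{i1},\dots,\mathbf{x}_{iD};\boldsymbol{\theta})
  = \sum_{k=1}^{K} \pi_k \,
    \phi_D\!\left(\mathbf{y}_i;\ \boldsymbol{\mu}_{ik},\ \boldsymbol{\Sigma}_k\right),
  \qquad \pi_k > 0,\quad \sum_{k=1}^{K} \pi_k = 1,
\]
\[
  \boldsymbol{\mu}_{ik}
  = \left(\lambda_{k1} + \mathbf{x}_{i1}^{\top}\boldsymbol{\beta}_{k1},\ \dots,\
          \lambda_{kD} + \mathbf{x}_{iD}^{\top}\boldsymbol{\beta}_{kD}\right)^{\top}.
\]
% \phi_D denotes the D-variate Gaussian density.
```

In this sketch, setting \(\mathbf{x}_{i1} = \dots = \mathbf{x}_{iD}\) gives a mixture of multivariate regressions in which every response shares the same regressors, while constraints on \(\boldsymbol{\Sigma}_k\) would correspond to the parsimonious variants mentioned in the summary.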

##### MSC:

62J05 | Linear regression; mixed models |

62H12 | Estimation in multivariate analysis |

62F12 | Asymptotic properties of parametric estimators |


\textit{G. Galimberti} and \textit{G. Soffritti}, Adv. Data Anal. Classif., ADAC 14, No. 2, 235--260 (2020; Zbl 07252404)


##### References:

[1] | Aitkin, M.; Francis, B.; Hinde, J.; Darnell, R., Statistical modelling in R (2009), New York: Oxford University Press, New York · Zbl 1211.62003 |

[2] | Aitkin, M.; Tunnicliffe Wilson, G., Mixture models, outliers, and the EM algorithm, Technometrics, 22, 325-331 (1980) · Zbl 0466.62034 |

[3] | Baird, IG; Quastel, N., Dolphin-safe tuna from California to Thailand: localisms in environmental certification of global commodity networks, Ann Assoc Am Geogr, 101, 337-355 (2011) |

[4] | Bartolucci, F.; Scaccia, L., The use of mixtures for dealing with non-normal regression errors, Comput Stat Data Anal, 48, 821-834 (2005) · Zbl 1429.62284 |

[5] | Cadavez, VAP; Henningsen, A., The use of seemingly unrelated regression (SUR) to predict the carcass composition of lambs, Meat Sci, 92, 548-553 (2012) |

[6] | Celeux, G.; Govaert, G., Gaussian parsimonious clustering models, Pattern Recognit, 28, 781-793 (1995) |

[7] | Chevalier, JA; Kashyap, AK; Rossi, PE, Why don’t prices rise during periods of peak demand? Evidence from scanner data, Am Econ Rev, 93, 15-37 (2003) |

[8] | Dang, UJ; McNicholas, PD; Morlini, I.; Minerva, T.; Vichi, M., Families of parsimonious finite mixtures of regression models, Advances in statistical models for data analysis, 73-84 (2015), Cham: Springer, Cham |

[9] | Day, NE, Estimating the components of a mixture of normal distributions, Biometrika, 56, 463-474 (1969) · Zbl 0183.48106 |

[10] | Dempster, AP; Laird, NM; Rubin, DB, Maximum likelihood for incomplete data via the EM algorithm, J R Stat Soc B, 39, 1-22 (1977) |

[11] | De Sarbo, WS; Cron, WL, A maximum likelihood methodology for clusterwise linear regression, J Classif, 5, 249-282 (1988) · Zbl 0692.62052 |

[12] | De Veaux, RD, Mixtures of linear regressions, Comput Stat Data Anal, 8, 227-245 (1989) · Zbl 0726.62109 |

[13] | Ding, C., Using regression mixture analysis in educational research, Pract Assess Res Eval, 11, 1-11 (2006) |

[14] | Donnelly, WA, The regional demand for petrol in Australia, Econ Rec, 58, 317-327 (1982) |

[15] | Dyer, WJ; Pleck, J.; McBride, B., Using mixture regression to identify varying effects: a demonstration with paternal incarceration, J Marriage Fam, 74, 1129-1148 (2012) |

[16] | Elhenawy, M.; Rakha, H.; Chen, H.; Helfert, M.; Klein, C.; Donnellan, B.; Gusikhin, O., An automatic traffic congestion identification algorithm based on mixture of linear regressions, Smart cities, green technologies, and intelligent transport systems, 242-256 (2017), Cham: Springer, Cham |

[17] | Fraley, C.; Raftery, AE, Model-based clustering, discriminant analysis and density estimation, J Am Stat Assoc, 97, 611-631 (2002) · Zbl 1073.62545 |

[18] | Frühwirth-Schnatter, S., Finite mixture and Markov switching models (2006), New York: Springer, New York · Zbl 1108.62002 |

[19] | Galimberti, G.; Scardovi, E.; Soffritti, G., Using mixtures in seemingly unrelated linear regression models with non-normal errors, Stat Comput, 26, 1025-1038 (2016) · Zbl 06652993 |

[20] | Giles, S.; Hampton, P., Regional production relationships during the industrialization of New Zealand, 1935-1948, Reg Sci, 24, 519-533 (1984) |

[21] | Grün, B.; Leisch, F., FlexMix version 2: finite mixtures with concomitant variables and varying and constant parameters, J Stat Softw, 28, 4, 1-35 (2008) |

[22] | Hennig, C., Identifiability of models for clusterwise linear regression, J Classif, 17, 273-296 (2000) · Zbl 1017.62058 |

[23] | Henningsen, A.; Hamann, JD, systemfit: a package for estimating systems of simultaneous equations in R, J Stat Softw, 23, 4, 1-40 (2007) |

[24] | Hosmer, DW, Maximum likelihood estimates of the parameters of a mixture of two regression lines, Commun Stat Theory Methods, 3, 995-1006 (1974) · Zbl 0294.62085 |

[25] | Ingrassia, S.; Rocci, R., Degeneracy of the EM algorithm for the MLE of multivariate Gaussian mixtures and dynamic constraints, Comput Stat Data Anal, 55, 1715-1725 (2011) · Zbl 1328.65030 |

[26] | Jones, PN; McLachlan, GJ, Fitting finite mixture models in a regression context, Aust J Stat, 34, 233-240 (1992) |

[27] | Keshavarzi, S.; Ayatollahi, SMT; Zare, N.; Pakfetrat, M., Application of seemingly unrelated regression in medical data with intermittently observed time-dependent covariates, Comput Math Methods Med, 2012, 821643 (2012) · Zbl 1303.92013 |

[28] | Kiefer, J.; Wolfowitz, J., Consistency of the maximum likelihood estimator in the presence of infinitely many nuisance parameters, Ann Math Stat, 27, 887-906 (1956) · Zbl 0073.14701 |

[29] | Lehmann, EL, Elements of large-sample theory (1999), New York: Springer, New York · Zbl 0914.62001 |

[30] | Magnus, JR; Neudecker, H., Matrix differential calculus with applications in statistics and econometrics (1988), New York: Wiley, New York · Zbl 0651.15001 |

[31] | Maugis, C.; Celeux, G.; Martin-Magniette, M-L, Variable selection for clustering with Gaussian mixture models, Biometrics, 65, 701-709 (2009) · Zbl 1172.62021 |

[32] | McDonald, SE; Shin, S.; Corona, R., Children exposed to intimate partner violence: identifying differential effects of family environment on children’s trauma and psychopathology symptoms through regression mixture models, Child Abus Negl, 58, 1-11 (2016) |

[33] | McLachlan, GJ; Peel, D., Finite mixture models (2000), New York: Wiley, New York |

[34] | Newey, WK; McFadden, D.; Griliches, Z.; Engle, R.; Intriligator, MD; McFadden, D., Large sample estimation and hypothesis testing, Handbook of econometrics, 2111-2245 (1994), Amsterdam: Elsevier, Amsterdam |

[35] | Pinheiro, J.; Bates, D.; DebRoy, S.; Sarkar, D.; R Core Team, nlme: linear and nonlinear mixed effects models. R package version 3.1-131 (2017) |

[36] | Quandt, RE; Ramsey, JB, Estimating mixtures of normal distributions and switching regressions, J Am Stat Assoc, 73, 730-738 (1978) · Zbl 0401.62024 |

[37] | R Core Team, R: a language and environment for statistical computing (2019), Vienna: R Foundation for Statistical Computing. http://www.R-project.org |

[38] | Rocci, R.; Gattone, SA; Di Mari, R., A data driven equivariant approach to constrained Gaussian mixture modeling, Adv Data Anal Classif, 12, 235-260 (2018) · Zbl 1414.62269 |

[39] | Rossi, PE, bayesm: Bayesian inference for marketing/micro-econometrics. R package version 2.2-5 (2012). http://CRAN.R-project.org/package=bayesm |

[40] | Rossi, PE; Allenby, GM; McCulloch, R., Bayesian statistics and marketing (2005), Chichester: Wiley, Chichester |

[41] | Schwarz, G., Estimating the dimension of a model, Ann Stat, 6, 461-464 (1978) · Zbl 0379.62005 |

[42] | Scrucca, L.; Fop, M.; Murphy, TB; Raftery, AE, mclust5: clustering, classification and density estimation using Gaussian finite mixture models, R J, 8, 1, 205-223 (2017) |

[43] | Soffritti, G.; Galimberti, G., Multivariate linear regression with non-normal errors: a solution based on mixture models, Stat Comput, 21, 523-536 (2011) · Zbl 1221.62106 |

[44] | Srivastava, VK; Giles, DEA, Seemingly unrelated regression equations models (1987), New York: Marcel Dekker, New York |

[45] | Tashman, A.; Frey, RJ, Modeling risk in arbitrage strategies using finite mixtures, Quant Finance, 9, 495-503 (2009) · Zbl 1278.91153 |

[46] | Turner, TR, Estimating the propagation rate of a viral infection of potato plants via mixtures of regressions, Appl Stat, 49, 371-384 (2000) · Zbl 0971.62076 |

[47] | Van Horn, ML; Jaki, T.; Masyn, K., Evaluating differential effects using regression interactions and regression mixture models, Educ Psychol Meas, 75, 677-714 (2015) |

[48] | White, EN; Hewings, GJD, Space-time employment modelling: some results using seemingly unrelated regression estimators, J Reg Sci, 22, 283-302 (1982) |

[49] | Yao, W., Label switching and its solutions for frequentist mixture models, J Stat Comput Simul, 85, 1000-1012 (2015) · Zbl 1457.62030 |

[50] | Zellner, A., An efficient method of estimating seemingly unrelated regression equations and tests for aggregation bias, J Am Stat Assoc, 57, 348-368 (1962) · Zbl 0113.34902 |
