zbMATH — the first resource for mathematics

A hybrid algorithm with cluster analysis in modelling high dimensional data. (English) Zbl 1383.62179
Summary: Multivariate data modelling aims to predict unknown function values through an established mathematical model. This requires constructing an analytical structure from a given set of high dimensional data points and their corresponding function values. The level of multivariance directly affects the modelling process: as the number of independent variables increases, standard numerical methods become incapable of obtaining the sought analytical structure. This work aims to overcome the difficulties of high multivariance and to improve the modelling quality through two main steps: data clustering and data partitioning. The data clustering step divides the whole problem domain into several clusters using the k-means clustering algorithm. The data partitioning step applies the Enhanced Multivariance Product Representation (EMPR) method to partition the high dimensional data set of each cluster. The analytical structure is obtained from the partitioned data of each cluster and can be used to predict unknown function values.
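The two-step workflow in the summary (cluster the domain, then build one model per cluster) can be illustrated with a minimal sketch. It is not the authors' implementation: the k-means routine is a plain Lloyd-style iteration, and the per-cluster model is a simple additive stand-in (a constant plus independently fitted univariate linear terms), not EMPR itself. All function names (`kmeans`, `fit_cluster_models`, `predict`) and parameters are hypothetical and chosen only for illustration.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest(point, centroids, candidates):
    """Index (among candidates) of the centroid closest to point."""
    return min(candidates, key=lambda c: sq_dist(point, centroids[c]))


def kmeans(points, k, iters=100, seed=0):
    """Lloyd-style k-means; returns (centroids, labels)."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    labels = None
    for _ in range(iters):
        # Assignment step: attach every point to its nearest centroid.
        new_labels = [nearest(p, centroids, range(k)) for p in points]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return centroids, labels


def fit_cluster_models(points, values, labels, k):
    """Per-cluster additive stand-in model (NOT EMPR): mean value plus
    one independently fitted linear term per variable."""
    models = {}
    for c in range(k):
        xs = [p for p, lab in zip(points, labels) if lab == c]
        fs = [f for f, lab in zip(values, labels) if lab == c]
        if not xs:
            continue  # empty cluster: no model
        n, dim = len(xs), len(xs[0])
        f0 = sum(fs) / n
        means = [sum(x[i] for x in xs) / n for i in range(dim)]
        slopes = []
        for i in range(dim):
            var = sum((x[i] - means[i]) ** 2 for x in xs)
            cov = sum((x[i] - means[i]) * (f - f0) for x, f in zip(xs, fs))
            slopes.append(cov / var if var > 0 else 0.0)
        models[c] = (f0, means, slopes)
    return models


def predict(x, centroids, models):
    """Evaluate the model of the cluster whose centroid is nearest to x."""
    c = nearest(x, centroids, list(models))
    f0, means, slopes = models[c]
    return f0 + sum(s * (xi - m) for s, xi, m in zip(slopes, x, means))


if __name__ == "__main__":
    # Two well-separated groups of 3-D points with a linear target function.
    pts = [(a + 0.1 * i, a + 0.1 * j, a)
           for a in (0.0, 10.0) for i in range(3) for j in range(3)]
    vals = [1 + 2 * x0 - x1 + 0.5 * x2 for x0, x1, x2 in pts]
    cents, labs = kmeans(pts, 2)
    mods = fit_cluster_models(pts, vals, labs, 2)
    print(predict((0.1, 0.2, 0.0), cents, mods))
```

The prediction step routes a new point to its nearest centroid and evaluates only that cluster's model, which is the design idea the summary describes; replacing `fit_cluster_models` with an actual EMPR partitioning would leave the rest of the pipeline unchanged.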
MSC:
62H30 Classification and discrimination; cluster analysis (statistical aspects)
Software:
MuPAD; WEKA; AS 136; PERL
References:
[1] Burden, R. L.; Faires, J. D., Numerical Analysis, (2001), Brooks/Cole CA
[2] Ceselli, A.; Colombo, F.; Cordone, R., Balanced compact clustering for efficient range queries in metric spaces, Discrete Appl. Math., 169, 43-67, (2014) · Zbl 1358.68090
[3] Deitel, H. M.; Deitel, P. J.; Nieto, T. R.; McPhie, D. C., How To Program Perl, (2001), Prentice Hall New Jersey
[4] Forgy, E., Cluster analysis of multivariate data: efficiency versus interpretability of classifications, Biometrics, 21, 768-769, (1965)
[5] Hall, M.; Frank, E.; Holmes, G.; Pfahringer, B.; Reutemann, P.; Witten, I. H., The WEKA data mining software: an update, ACM SIGKDD Explor. Newslett., 11, 10-18, (2009)
[6] Hartigan, J. A.; Wong, M. A., Algorithm AS 136: A k-means clustering algorithm, J. R. Stat. Soc. Ser. C. Appl. Stat., 28, 100-108, (1979) · Zbl 0447.62062
[7] Kalna, G.; Vass, J. K.; Higham, D. J., Multidimensional partitioning and bi-partitioning: analysis and application to gene expression data sets, Int. J. Comput. Math., 85, 475-485, (2008) · Zbl 1138.65029
[8] Kettleborough, G.; Rayward-Smith, V. J., Optimising sum-of-squares measures for clustering multisets defined over a metric space, Discrete Appl. Math., 161, 2499-2513, (2013) · Zbl 1296.68070
[9] Lloyd, S. P., Least squares quantization in PCM, IEEE Trans. Inform. Theory, 28, 129-137, (1982) · Zbl 0504.94015
[10] MacQueen, J. B., Some methods for classification and analysis of multivariate observations, (Proceedings of 5th Berkeley Symposium on Mathematical Statistics and Probability, (1967), University of California Press), 281-297
[11] Oevel, W.; Postel, F.; Wehmeier, S.; Gerhard, J., The MuPAD Tutorial, (2000), Springer New York
[12] Özay, E. K.; Demiralp, M., Reductive enhanced multivariance product representation for multi-way arrays, J. Math. Chem., 52, 2546-2558, (2014) · Zbl 1331.41042
[13] Tunga, M. A., A new approach for multivariate data modelling in orthogonal geometry, Int. J. Comput. Math., 92, 2011-2021, (2015) · Zbl 1326.41007
[14] Tunga, B.; Demiralp, M., The influence of the support functions on the quality of enhanced multivariance product representation, J. Math. Chem., 48, 827-840, (2010) · Zbl 1293.62138
[15] Tunga, M. A.; Demiralp, M., A novel method for multivariate data modelling: piecewise generalized EMPR, J. Math. Chem., 51, 2654-2667, (2013) · Zbl 1310.92004
[16] Tuna, S.; Tunga, B., A novel piecewise multivariate function approximation method via universal matrix representation, J. Math. Chem., 51, 1784-1801, (2013) · Zbl 1312.41043
[17] Witten, I. H.; Frank, E.; Hall, M. A., Data Mining: Practical Machine Learning Tools and Techniques, (2005), Morgan Kaufmann San Francisco
[18] Wu, Zongmin; Xiong, Zhengchao, Multivariate quasi-interpolation in \(L^p(\mathbb{R}^d)\) with radial basis functions for scattered data, Int. J. Comput. Math., 87, 583-590, (2010) · Zbl 1184.65021
This reference list is based on information provided by the publisher or from digital mathematics libraries. Its items are heuristically matched to zbMATH identifiers and may contain data conversion errors. It attempts to reflect the references listed in the original paper as accurately as possible without claiming the completeness or perfect precision of the matching.