Using unlabelled data to update classification rules with applications in food authenticity studies. (English) Zbl 1490.62155

Summary: An authentic food is one that is what it purports to be. Food processors and consumers need to be assured that, when they pay for a specific product or ingredient, they receive exactly what they pay for. Classification methods are an important tool in food authenticity studies, where they are used to assign food samples of unknown type to known types. A classification method is developed in which the classification rule is estimated from both the labelled and the unlabelled data, in contrast with many classical methods, which use only the labelled data for estimation. The methodology models the data as arising from a Gaussian mixture model with parsimonious covariance structure, as is done in model-based clustering. A missing-data formulation of the mixture model is used, and the models are fitted with the EM and classification EM (CEM) algorithms. The methods are applied to the analysis of spectra of foodstuffs recorded over the visible and near-infrared wavelength range in food authenticity studies. The performance of model-based discriminant analysis is compared with that of the proposed classification method. The proposed method is shown to yield low misclassification rates: its correct classification rate was observed to be as much as 15% higher than that of model-based discriminant analysis.
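The idea of estimating the rule from labelled and unlabelled data together can be illustrated with a minimal NumPy sketch. It is not the authors' implementation, and all function names are hypothetical. Labelled samples keep fixed 0/1 group memberships, while the memberships of unlabelled samples are treated as missing data: they are replaced by posterior probabilities (EM) or by hard argmax assignments (CEM). The "parsimonious covariance structure" is simplified here to just two options, a single covariance shared by all groups or a separate full covariance per group (roughly mclust's EEE and VVV models). Starting from the labelled-data-only fit and adding a small ridge to each covariance are assumptions made for numerical convenience.

```python
import numpy as np


def gaussian_logpdf(x, mean, cov):
    """Log density of N(mean, cov) at each row of x, via a Cholesky factor."""
    d = x.shape[1]
    chol = np.linalg.cholesky(cov)
    z = np.linalg.solve(chol, (x - mean).T)       # ||z||^2 = Mahalanobis distance
    log_det = 2.0 * np.sum(np.log(np.diag(chol)))
    return -0.5 * (d * np.log(2 * np.pi) + log_det + np.sum(z ** 2, axis=0))


def log_joint(x, pi, means, covs):
    """n x K matrix of log(pi_k) + log phi(x_i | mu_k, Sigma_k)."""
    return np.column_stack([np.log(pi[k]) + gaussian_logpdf(x, means[k], covs[k])
                            for k in range(len(pi))])


def m_step(x, resp, common_cov):
    """Weighted estimates of mixing proportions, means and covariances."""
    n, d = x.shape
    nk = resp.sum(axis=0)                          # effective group sizes
    means = (resp.T @ x) / nk[:, None]
    covs = np.array([(resp[:, k, None] * (x - means[k])).T @ (x - means[k]) / nk[k]
                     for k in range(len(nk))])
    if common_cov:
        # Pool across groups: one covariance matrix shared by all groups.
        covs = np.repeat((np.tensordot(nk, covs, axes=1) / n)[None], len(nk), axis=0)
    return nk / n, means, covs + 1e-6 * np.eye(d)  # small ridge for stability


def semi_supervised_em(x_lab, y_lab, x_unl, n_groups, common_cov=True,
                       classification=False, max_iter=200, tol=1e-10):
    """Fit a Gaussian mixture using labelled and unlabelled rows together.

    Labelled rows keep fixed 0/1 memberships. Unlabelled rows receive
    posterior memberships (EM) or hard argmax memberships (CEM when
    classification=True). Returns (pi, means, covs, predicted labels of x_unl).
    """
    x = np.vstack([x_lab, x_unl])
    n_lab = len(x_lab)
    resp = np.zeros((len(x), n_groups))
    resp[np.arange(n_lab), y_lab] = 1.0
    # Start from the labelled-data-only fit (ordinary discriminant analysis).
    pi, means, covs = m_step(x_lab, resp[:n_lab], common_cov)
    prev_ll = -np.inf
    for _ in range(max_iter):
        # E-step: posterior group memberships for the unlabelled rows only.
        lj = log_joint(x_unl, pi, means, covs)
        post = np.exp(lj - np.logaddexp.reduce(lj, axis=1)[:, None])
        if classification:
            # C-step: replace posteriors by hard 0/1 assignments.
            post = np.eye(n_groups)[post.argmax(axis=1)]
        resp[n_lab:] = post
        # M-step on all rows, labelled and unlabelled.
        pi, means, covs = m_step(x, resp, common_cov)
        # Observed-data log-likelihood: labelled rows use their known group,
        # unlabelled rows sum over groups.
        ll = (log_joint(x_lab, pi, means, covs)[np.arange(n_lab), y_lab].sum()
              + np.logaddexp.reduce(log_joint(x_unl, pi, means, covs), axis=1).sum())
        if abs(ll - prev_ll) <= tol * abs(ll):
            break
        prev_ll = ll
    pred = log_joint(x_unl, pi, means, covs).argmax(axis=1)
    return pi, means, covs, pred
```

Every group must have at least one labelled sample, so that the initial fit and the group ordering are defined. Setting `classification=True` gives the CEM variant, which maximises a classification likelihood rather than the observed-data likelihood, so its fitted parameters can differ slightly from those of EM.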


62H30 Classification and discrimination; cluster analysis (statistical aspects)
62P30 Applications of statistics in engineering and industry; control charts


Software: mclust; wavethresh
