
Weakly supervised clustering: learning fine-grained signals from coarse labels. (English) Zbl 1397.62234

Summary: Consider a classification problem where we do not have access to labels for individual training examples but only observe average labels over subpopulations. We give practical examples of this setup and show how such a classification task can be usefully analyzed as a weakly supervised clustering problem. We propose three approaches to solving the weakly supervised clustering problem, including a latent variable model that performs well in our experiments. We illustrate our methods on an analysis of aggregated election data and on an industry data set that was the original motivation for this research.
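The coarse-label setup described in the summary can be made concrete with a small sketch. The following is a minimal illustration only, not the paper's latent variable model: a logistic model is fit so that its average predicted probability within each bag of examples matches that bag's observed label proportion. All names, the synthetic data, and the optimization settings below are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic individual-level data: 2-d features, true logistic weights.
w_true = np.array([2.0, -1.0])
X = rng.normal(size=(600, 2))
p_true = 1.0 / (1.0 + np.exp(-X @ w_true))
y = rng.random(600) < p_true  # individual labels (never observed by the fit)

# Coarse supervision: 30 bags of 20 instances; only label proportions seen.
bags = np.arange(600).reshape(30, 20)
bag_props = np.array([y[b].mean() for b in bags])

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Gradient descent on sum_b (mean_i sigmoid(x_i . w) - prop_b)^2,
# matching predicted bag-level averages to observed proportions.
w = np.zeros(2)
lr = 0.5
for _ in range(2000):
    grad = np.zeros(2)
    for b, prop in zip(bags, bag_props):
        p = sigmoid(X[b] @ w)
        resid = p.mean() - prop
        # d/dw of mean_i sigmoid(x_i . w) is mean_i p_i (1 - p_i) x_i
        grad += 2 * resid * (p * (1 - p)) @ X[b] / len(b)
    w -= lr * grad

# The recovered weight direction should roughly align with the true one.
cos = w @ w_true / (np.linalg.norm(w) * np.linalg.norm(w_true))
```

Only the direction of `w` is well identified here; the bag-proportion loss constrains the scale of the weights only weakly, which is one reason richer latent variable formulations are attractive in this setting.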

MSC:

62H30 Classification and discrimination; cluster analysis (statistical aspects)
62P30 Applications of statistics in engineering and industry; control charts

Software:

PRMLT; mclust; bootstrap

References:

[1] Agresti, A. (2002). Categorical Data Analysis, 2nd ed. Wiley, New York. · Zbl 1018.62002
[2] Bishop, C. M. (1995). Training with noise is equivalent to Tikhonov regularization. Neural Comput. 7 108-116.
[3] Bishop, C. M. (2006). Pattern Recognition and Machine Learning. Springer, New York. · Zbl 1107.68072
[4] Blei, D. M., Ng, A. Y. and Jordan, M. I. (2003). Latent Dirichlet allocation. J. Mach. Learn. Res. 3 993-1022. · Zbl 1112.68379 · doi:10.1162/jmlr.2003.3.4-5.993
[5] Bucklin, R. E. and Sismeiro, C. (2009). Click here for Internet insight: Advances in clickstream data analysis in marketing. J. Interact. Market. 23 35-48.
[6] Copas, J. B. (1988). Binary regression models for contaminated data. J. Roy. Statist. Soc. Ser. B 50 225-265.
[7] Dempster, A. P., Laird, N. M. and Rubin, D. B. (1977). Maximum likelihood from incomplete data via the EM algorithm. J. Roy. Statist. Soc. Ser. B 39 1-38. · Zbl 0364.62022
[8] Efron, B. (1983). Estimating the error rate of a prediction rule: Improvement on cross-validation. J. Amer. Statist. Assoc. 78 316-331. · Zbl 0543.62079 · doi:10.2307/2288636
[9] Efron, B. and Tibshirani, R. (1993). An Introduction to the Bootstrap. CRC Press, Boca Raton, FL. · Zbl 0835.62038
[10] Fraley, C., Raftery, A. E., Murphy, T. B. and Scrucca, L. (2012). MCLUST version 4 for R: Normal mixture modeling for model-based clustering, classification, and density estimation. Technical report.
[11] Gordon, A. D. (1999). Classification. Chapman & Hall, London. · Zbl 0929.62068
[12] Hofmann, T. (2001). Unsupervised learning by probabilistic latent semantic analysis. Mach. Learn. 42 177-196. · Zbl 0970.68130 · doi:10.1023/A:1007617005950
[13] Hunter, D. R. and Lange, K. (2004). A tutorial on MM algorithms. Amer. Statist. 58 30-37. · doi:10.1198/0003130042836
[14] Kück, H. and de Freitas, N. (2005). Learning about individuals from group statistics. In Proceedings of the 21st Conference on Uncertainty in Artificial Intelligence 332-339. AUAI Press, Arlington, VA.
[15] Lange, K., Hunter, D. R. and Yang, I. (2000). Optimization transfer using surrogate objective functions. J. Comput. Graph. Statist. 9 1-59.
[16] Levy, S. (2011). In the Plex: How Google Thinks, Works, and Shapes Our Lives. Simon and Schuster, New York.
[17] Magder, L. S. and Hughes, J. P. (1997). Logistic regression when the outcome is measured with uncertainty. Amer. J. Epidemiol. 146 195-203.
[18] Politis, D. N., Romano, J. P. and Wolf, M. (1999). Subsampling. Springer, New York. · Zbl 0931.62035
[19] Quadrianto, N., Smola, A. J., Caetano, T. S. and Le, Q. V. (2009). Estimating labels from label proportions. J. Mach. Learn. Res. 10 2349-2374. · Zbl 1235.68182
[20] Rueping, S. (2010). SVM classifier estimation from group probabilities. In Proceedings of the 27th International Conference on Machine Learning 911-918.
[21] Sculley, D., Malkin, R. G., Basu, S. and Bayardo, R. J. (2009). Predicting bounce rates in sponsored search advertisements. In Proceedings of the 15th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining 1325-1334. ACM, New York.
[22] Surdeanu, M., Tibshirani, J., Nallapati, R. and Manning, C. D. (2012). Multi-instance multi-label learning for relation extraction. In Proceedings of the 2012 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning 455-465. Association for Computational Linguistics, Stroudsburg, PA.
[23] Täckström, O. and McDonald, R. (2011a). Discovering fine-grained sentiment with latent variable structured prediction models. In Advances in Information Retrieval 368-374. Springer, Berlin.
[24] Täckström, O. and McDonald, R. (2011b). Semi-supervised latent variable models for sentence-level sentiment analysis. In Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies, Short Papers, Vol. 2, 569-574. Association for Computational Linguistics, Stroudsburg, PA.
[25] Toutanova, K. and Johnson, M. (2007). A Bayesian LDA-based model for semi-supervised part-of-speech tagging. In Advances in Neural Information Processing Systems 1521-1528. Curran Associates, Red Hook, NY.
[26] van der Maaten, L., Chen, M., Tyree, S. and Weinberger, K. Q. (2013). Learning with marginalized corrupted features. In Proceedings of the 30th International Conference on Machine Learning 410-418.
[27] Wager, S., Wang, S. and Liang, P. (2013). Dropout training as adaptive regularization. In Advances in Neural Information Processing Systems. Curran Associates, Red Hook, NY.
[28] Xing, E. P., Jordan, M. I., Russell, S. and Ng, A. (2002). Distance metric learning with application to clustering with side-information. In Advances in Neural Information Processing Systems 505-512. Curran Associates, Red Hook, NY.
[29] Xu, G., Yang, S.-H. and Li, H. (2009). Named entity mining from click-through data using weakly supervised latent Dirichlet allocation. In Proceedings of the 15th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining 1365-1374. ACM, New York.
[30] Yasui, Y., Pepe, M., Hsu, L., Adam, B.-L. and Feng, Z. (2004). Partially supervised learning using an EM-boosting algorithm. Biometrics 60 199-206. · Zbl 1130.62389 · doi:10.1111/j.0006-341X.2004.00156.x