×

Efficient histogram dictionary learning for text/image modeling and classification. (English) Zbl 1411.68111

Summary: In dealing with text or image data, it is quite effective to represent them as histograms. In modeling histograms, although recent Bayesian topic models such as latent Dirichlet allocation and its variants are shown to be successful, they often suffer from computational overhead for inference of a large number of hidden variables. In this paper we consider a different modeling strategy of forming a dictionary of base histograms whose convex combination yields a histogram of observable text/image document. The dictionary entries are learned from data, which establishes direct/indirect association between specific topics/keywords and the base histograms. From a learned dictionary, the coding of an observed histogram can provide succinct and salient information useful for classification. One of our main contributions is that we propose a very efficient dictionary learning algorithm based on the recent Nesterov’s smooth optimization technique in conjunction with analytic solution methods for quadratic minimization sub-problems. Not alone the faster theoretical convergence rate, also in real time, our algorithm is 20–30 times faster than general-purpose optimizers such as interior-point methods. In classification/annotation tasks on several text/image datasets, our approach exhibits comparable or often superior performance to existing Bayesian models, while significantly faster than their variational inference.

MSC:

68T05 Learning and adaptive systems in artificial intelligence
62H30 Classification and discrimination; cluster analysis (statistical aspects)
PDFBibTeX XMLCite
Full Text: DOI

References:

[1] Aharon M, Elad M, Bruckstein AM (2005) K-svd and its non-negative variant for dictionary design. In: Proceedings of the SPIE conference wavelets, pp 327-339
[2] Asuncion A, Newman D (2007) UCI machine learning repository
[3] Bach F, Mairal J, Ponce J (2012) Task-driven dictionary learning. IEEE Trans Pattern Anal Mach Intell 34(4):791-804 · doi:10.1109/TPAMI.2011.156
[4] Barla A, Odone F, Verri A (2003) Histogram intersection kernel for image classification. In: International conference on image processing · Zbl 1039.68595
[5] Bayón L, Grau JM, Suárez PM (2002) A new formulation of the equivalent thermal in optimization of hydrothermal systems. Math Prob Eng 8(3):181-196 · Zbl 1130.80304 · doi:10.1080/10241230215284
[6] Beck A, Teboulle M (2009) A fast iterative shrinkage-thresholding algorithm for linear inverse problems. SIAM J Imaging Sci 2(1):183-202 · Zbl 1175.94009 · doi:10.1137/080716542
[7] Blei D, Jordan M (2003) Modeling annotated data. In: ACM SIGIR conference
[8] Blei D, McAuliffe J (2007) Supervised topic models. In: Neural information processing systems
[9] Blei D, Ng A, Jordan M (2003) Latent dirichlet allocation. J Mach Learn Res 3:993-1022 · Zbl 1112.68379
[10] Bolovinou A, Pratikakis I, Perantonis S (2012) Bag of spatio-visual words for context inference in scene classification. Pattern Recognit. doi:10.1016/j.patcog.2012.07.024
[11] Bosch A, Zisserman A, Munoz X (2006) Scene classification via pLSA. In: European conference on computer vision
[12] Boyd S, Vandenberghe L (2004) Convex optimization. Cambridge University Press, Cambridge · Zbl 1058.90049 · doi:10.1017/CBO9780511804441
[13] Chen SS, Donoho DL, Saunders MA (1998) Atomic decomposition by basis pursuit. SIAM J Sci Comput 20(1):33-61 · Zbl 0919.94002 · doi:10.1137/S1064827596304010
[14] Coates A, Lee H, Ng AY (2011) An analysis of single layer networks in unsupervised feature learning. In: International conference on Artificial Intelligence and Statistics (AISTATS)
[15] Coleman TF, Li Y (1996) A reflective Newton method for minimizing a quadratic function subject to bounds on some of the variables. SIAM J Optim 6(4):1040-1058 · Zbl 0861.65053 · doi:10.1137/S1052623494240456
[16] CVX Research Inc. (2012) CVX: Matlab software for disciplined convex programming, version 2.0 beta. http://cvxr.com/cvx
[17] Efron B, Hastie T, Johnstone I, Tibshirani R (2004) Least angle regression. Ann Stat 32(2):407-499 · Zbl 1091.62054 · doi:10.1214/009053604000000067
[18] Fei-Fei L, Perona P (2005) A Bayesian hierarchical model for learning natural scene categories. In: IEEE international conference on computer vision and pattern recognition
[19] Gill PE, Murray W, Wright MH (1981) Pract Optim. Academic Press, London
[20] Grant M, Boyd S (2008) Graph implementations for nonsmooth convex programs., Recent Advances in Learning and ControlSpringer, London, pp 95-110 · Zbl 1205.90223
[21] Ho ND, Dooren PV (2008) Non-negative matrix factorization with fixed row and column sums. Linear Algebra Appl 429(5-6):1020-1025 · Zbl 1148.15010 · doi:10.1016/j.laa.2007.02.026
[22] Hofmann T (1999) Probabilistic latent semantic analysis. In: Proceedings of uncertainty in artificial intelligence · Zbl 0970.68130
[23] Hoyer PO (2004) Non-negative matrix factorization with sparseness constraints. J Mach Learn Res 5:1457-1469 · Zbl 1222.68218
[24] Kiros R, Szepesvári C (2012) Deep representations and codes for image auto-annotation. In: Advances in Neural Information Processing Systems (NIPS)
[25] Kreutz-Delgado K, Murray JF, Rao BD, Engan K, Lee TW, Sejnowski TJ (2003) Dictionary learning algorithms for sparse representation. Neural Comput 15(2):349-396 · Zbl 1047.68101 · doi:10.1162/089976603762552951
[26] Li LJ, Fei-Fei L (2007) What, where and who? classifying event by scene and object recognition. In: IEEE International Conference on Computer Vision
[27] Lowe DG (2004) Distinctive image features from scale-invariant keypoints. Int J Computer Vision 60(2):91-110 · doi:10.1023/B:VISI.0000029664.99615.94
[28] Mairal J, Bach F, Ponce J, Sapiro G (2009) Online dictionary learning for sparse coding. In: International conference on machine learning · Zbl 1242.62087
[29] Nesterov Y (2005) Smooth minimization of non-smooth functions. Math Prog 103(1):127-152 · Zbl 1079.90102 · doi:10.1007/s10107-004-0552-5
[30] Osborne MR, Presnell B, Turlach B (2000) A new approach to variable selection in least squares problems. IMA J Numer Anal 20(3):389-403 · Zbl 0962.65036 · doi:10.1093/imanum/20.3.389
[31] Pele O, Werman M (2010) The quadratic-chi histogram distance family. In: European conference on computer vision
[32] Perkins S, Theiler J (2003) Online feature selection using grafting. In: International conference on machine learning (ICML) · Zbl 1102.68578
[33] Platt, J.; Smola, AJ (ed.); Barlett, P. (ed.); Schölkopf, B. (ed.); Schuurmans, D. (ed.), Probabilistic outputs for support vector machines and comparisons to regularized likelihood methods (1999), Cambridge
[34] Polyak BT (1987) Introduction to optimization. Optimization Software Inc., New York · Zbl 0708.90083
[35] Rosset S (2004) Tracking curved regularized optimization solution paths. In: In Advances in Neural Information Processing Systems. MIT Press
[36] Rubinstein R, Bruckstein A, Elad M (2010) Dictionaries for sparse representation modeling. Proc IEEE 98(6):1045-1057 · doi:10.1109/JPROC.2010.2040551
[37] Rubner Y, Tomasi C, Guibas L (2000) The earth mover’s distance as a metric for image retrieval. Int J Computer Vision 40(2):99-121 · Zbl 1012.68705 · doi:10.1023/A:1026543900054
[38] Russell B, Torralba A, Murphy K, Freeman WT (2008) LabelMe: a database and web-based tool for image annotation. Int J Comput Vision 77(1-3):157-173 · doi:10.1007/s11263-007-0090-8
[39] Sindhwani V, Ghoting A (2012) Large-scale distributed non-negative sparse coding and sparse dictionary learning. In: International conference on knowledge discovery and data mining
[40] Thurau C, Kersting K, Bauckhage C (2009) Convex non-negative matrix factorization in the wild. In: International conference on data mining · Zbl 1235.62002
[41] Tibshirani R (1994) Regression shrinkage and selection via the lasso. J R Stat Soc B 58:267-288 · Zbl 0850.62538
[42] Tosic I, Frossard P (2011) Dictionary learning. IEEE Signal Proc Mag 28(2):27-38 · doi:10.1109/MSP.2010.939537
[43] Wang C, Blei DM, Fei-Fei L (2009) Simultaneous image classification and annotation. In: IEEE international conference on computer vision and pattern recognition
[44] Wang Y, Jia Y, Hu C, Turk M (2004) Fisher non-negative matrix factorization for learning local features. In: Asian conference on computer vision
[45] Yang AY, Zhou Z, Ganesh A, Sastry SS, Ma Y (2013) Fast \[l_1\] l1-minimization algorithms for robust face recognition. IEEE Trans Image Proc 22(8):3234-3246 · doi:10.1109/TIP.2013.2262292
This reference list is based on information provided by the publisher or from digital mathematics libraries. Its items are heuristically matched to zbMATH identifiers and may contain data conversion errors. In some cases that data have been complemented/enhanced by data from zbMATH Open. This attempts to reflect the references listed in the original paper as accurately as possible without claiming completeness or a perfect matching.