Algorithms for nonnegative matrix factorization with the Kullback-Leibler divergence. (English) Zbl 1472.65054

Summary: Nonnegative matrix factorization (NMF) is a standard linear dimensionality reduction technique for nonnegative data sets. In order to measure the discrepancy between the input data and the low-rank approximation, the Kullback-Leibler (KL) divergence is one of the most widely used objective function for NMF. It corresponds to the maximum likehood estimator when the underlying statistics of the observed data sample follows a Poisson distribution, and KL NMF is particularly meaningful for count data sets, such as documents. In this paper, we first collect important properties of the KL objective function that are essential to study the convergence of KL NMF algorithms. Second, together with reviewing existing algorithms for solving KL NMF, we propose three new algorithms that guarantee the non-increasingness of the objective function. We also provide a global convergence guarantee for one of our proposed algorithms. Finally, we conduct extensive numerical experiments to provide a comprehensive picture of the performances of the KL NMF algorithms.


65F99 Numerical linear algebra
15A23 Factorization of matrices
Full Text: DOI arXiv


[1] Ang, AMS; Gillis, N., Accelerating nonnegative matrix factorization algorithms using extrapolation, Neural Comput., 31, 2, 417-439 (2019)
[2] Attouch, H.; Bolte, J.; Redont, P.; Soubeyran, A., Proximal alternating minimization and projection methods for nonconvex problems: an approach based on the Kurdyka-Łojasiewicz inequality, Math. Oper. Res., 35, 2, 438-457 (2010) · Zbl 1214.65036
[3] Bauschke, HH; Bolte, J.; Teboulle, M., A descent lemma beyond Lipschitz gradient continuity: first-order methods revisited and applications, Math. Oper. Res., 42, 2, 330-348 (2017) · Zbl 1364.90251
[4] Bioucas-Dias, J.M., Plaza, A., Dobigeon, N., Parente, M., Du, Q., Gader, P., Chanussot, J.: Hyperspectral unmixing overview: geometrical, statistical, and sparse regression-based approaches. IEEE J. Select. Top. Appl. Earth Obs. Remote Sens. 5(2), 354-379 (2012)
[5] Bolte, J.; Sabach, S.; Teboulle, M., Proximal alternating linearized minimization for nonconvex and nonsmooth problems, Math. Program., 146, 1, 459-494 (2014) · Zbl 1297.90125
[6] Carbonetto, P., Luo, K., Dey, K., Hsiao, J., Stephens, M.: fastTopics: fast algorithms for fitting topic models and non-negative matrix factorizations to count data (2021). R package version 0.4-11. https://github.com/stephenslab/fastTopics
[7] Chambolle, A.; Pock, T., A first-order primal-dual algorithm for convex problems with applications to imaging, J. Math. Imaging Vis., 40, 1, 120-145 (2011) · Zbl 1255.68217
[8] Chi, EC; Kolda, TG, On tensors, sparsity, and nonnegative factorizations, SIAM J. Matrix Anal. Appl., 33, 4, 1272-1299 (2012) · Zbl 1262.15029
[9] Cichocki, A., Zdunek, R., Phan, A.H., Amari, S.: Nonnegative matrix and tensor factorizations: applications to exploratory multi-way data analysis and blind source separation. Wiley (2009)
[10] Dey, KK; Hsiao, CJ; Stephens, M., Visualizing the structure of RNA-SEQ expression data using grade of membership models, PLoS Genet., 13, 3, e1006599 (2017)
[11] Ding, C.; Li, T.; Peng, W., On the equivalence between non-negative matrix factorization and probabilistic latent semantic indexing, Comput. Stat. Data Anal., 52, 8, 3913-3927 (2008) · Zbl 1452.62053
[12] Dolan, ED; Moré, JJ, Benchmarking optimization software with performance profiles, Math. Program., 91, 2, 201-213 (2002) · Zbl 1049.90004
[13] Févotte, C.; Bertin, N.; Durrieu, JL, Nonnegative matrix factorization with the Itakura-Saito divergence: with application to music analysis, Neural Comput., 21, 3, 793-830 (2009) · Zbl 1156.94306
[14] Févotte, C., Cemgil, A.T.: Nonnegative matrix factorizations as probabilistic inference in composite models. In: 2009 17th European Signal Processing Conference, pp. 1913-1917 (2009)
[15] Févotte, C.; Idier, J., Algorithms for nonnegative matrix factorization with the \(\beta \)-divergence, Neural Comput., 23, 9, 2421-2456 (2011) · Zbl 1231.65072
[16] Finesso, L.; Spreij, P., Nonnegative matrix factorization and I-divergence alternating minimization, Linear Algebra Appl., 416, 2, 270-287 (2006) · Zbl 1109.15007
[17] Fu, X.; Huang, K.; Sidiropoulos, ND; Ma, WK, Nonnegative matrix factorization for signal and data analytics: Identifiability, algorithms, and applications, IEEE Signal Process. Mag., 36, 2, 59-80 (2019)
[18] Gillis, N.; Suykens, J.; Signoretto, M.; Argyriou, A., The why and how of nonnegative matrix factorization, Regularization, Optimization, Kernels, and Support Vector Machines, Machine Learning and Pattern Recognition chap 12, 257-291 (2014), Boca Raton, Florida: Chapman & Hall/CRC, Boca Raton, Florida
[19] Gillis, N., Introduction to nonnegative matrix factorization, SIAG/OPT Views News, 25, 1, 7-16 (2017)
[20] Gillis, N., Hien, L.T.K., Leplat, V., Tan, V.Y.: Distributionally robust and multi-objective nonnegative matrix factorization. IEEE Trans. Pattern Anal. Mach. Intell. (2021). doi:10.1109/TPAMI.2021.3058693
[21] Gonzalez, E.F., Zhang, Y.: Accelerating the Lee-Seung algorithm for nonnegative matrix factorization. Technical report, Rice University (2005). https://scholarship.rice.edu/handle/1911/102034
[22] Guillamet, D.; Vitrià, J.; Escrig, MT; Toledo, F.; Golobardes, E., Non-negative matrix factorization for face recognition, Topics in Artificial Intelligence, 336-344 (2002), Berlin: Springer, Berlin · Zbl 1028.68642
[23] Hasinoff, S.W.: Photon, Poisson Noise (2014). http://people.csail.mit.edu/hasinoff/pubs/hasinoff-photon-2011-preprint.pdf
[24] Hien, L.T.K., Gillis, N., Patrinos, P.: Inertial block proximal method for non-convex non-smooth optimization. In: Thirty-seventh International Conference on Machine Learning (ICML 2020) (2020)
[25] Ho, N.D., Dooren, P.V.: Non-negative matrix factorization with fixed row and column sums. Linear Algebra and its Applications 429(5), 1020-1025 (2008). Special Issue devoted to selected papers presented at the 13th Conference of the International Linear Algebra Society · Zbl 1148.15010
[26] Hong, M.; Razaviyayn, M.; Luo, ZQ; Pang, JS, A unified algorithmic framework for block-structured optimization involving big data: with applications in machine learning and signal processing, IEEE Signal Process. Mag., 33, 1, 57-77 (2015)
[27] Hoyer, PO, Non-negative matrix factorization with sparseness constraints, J. Mach. Learn. Res., 5, 1457-1469 (2004) · Zbl 1222.68218
[28] Hsieh, C.J., Dhillon, I.S.: Fast coordinate descent methods with variable selection for non-negative matrix factorization. In: Proceedings of the 17th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 1064-1072 (2011)
[29] Huang, K.; Sidiropoulos, ND; Liavas, AP, A flexible and efficient algorithmic framework for constrained matrix and tensor factorization, IEEE Trans. Signal Process., 64, 19, 5052-5065 (2016) · Zbl 1414.94253
[30] Kim, C., Kim, Y., Klabjan, D.: Scale invariant power iteration (2019). arXiv:1905.09882
[31] Lee, DD; Seung, HS, Learning the parts of objects by nonnegative matrix factorization, Nature, 401, 788-791 (1999) · Zbl 1369.68285
[32] Lee, D.D., Seung, H.S.: Algorithms for non-negative matrix factorization. In: Advances in Neural Information Processing Systems, pp. 556-562 (2001)
[33] Leskovec, J., Rajaraman, A., Ullman, J.D.: Mining of Massive Datasets, 2nd edn. Cambridge University Press (2014). http://mmds.org
[34] Li, L., Lebanon, G., Park, H.: Fast Bregman divergence NMF using Taylor expansion and coordinate descent. In: Proceedings of the 18th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD-12), pp. 307-315. Association for Computing Machinery, New York, NY, USA (2012)
[35] Lin, CJ, On the convergence of multiplicative update algorithms for nonnegative matrix factorization, IEEE Trans. Neural Netw., 18, 6, 1589-1596 (2007)
[36] Lin, X.; Boutros, PC, Optimization and expansion of non-negative matrix factorization, BMC Bioinform., 21, 1, 1-10 (2020)
[37] Lu, H.; Freund, RM; Nesterov, Y., Relatively smooth convex optimization by first-order methods, and applications, SIAM J. Optim., 28, 333-354 (2018) · Zbl 1392.90090
[38] Lucy, LB, An iterative technique for the rectification of observed distributions, Astron. J., 79, 745 (1974)
[39] Ma, W.; Bioucas-Dias, JM; Chan, T.; Gillis, N.; Gader, P.; Plaza, AJ; Ambikapathi, A.; Chi, C., A signal processing perspective on hyperspectral unmixing: insights from remote sensing, IEEE Signal Process. Mag., 31, 1, 67-81 (2014)
[40] Nesterov, Y.: Lectures on Convex Optimization. Springer (2018) · Zbl 1427.90003
[41] Razaviyayn, M.; Hong, M.; Luo, Z., A unified convergence analysis of block successive minimization methods for nonsmooth optimization, SIAM J. Optim., 23, 2, 1126-1153 (2013) · Zbl 1273.90123
[42] Richardson, WH, Bayesian-based iterative method of image restoration, JoSA, 62, 1, 55-59 (1972)
[43] Shahnaz, F., Berry, M.W., Pauca, V., Plemmons, R.J.: Document clustering using nonnegative matrix factorization. Inf. Process. Manag. 42(2), 373-386 (2006). http://www.sciencedirect.com/science/article/pii/S0306457304001542 · Zbl 1087.68104
[44] Sun, D.L., Févotte, C.: Alternating direction method of multipliers for non-negative matrix factorization with the beta-divergence. In: 2014 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 6201-6205 (2014)
[45] Takahashi, N.; Hibi, R., Global convergence of modified multiplicative updates for nonnegative matrix factorization, Comput. Optim. Appl., 57, 2, 417-440 (2014) · Zbl 1305.65135
[46] Teboulle, M.; Vaisbourd, Y., Novel proximal gradient methods for nonnegative matrix factorization with sparsity constraints, SIAM J. Imaging Sci., 13, 1, 381-421 (2020) · Zbl 1442.90154
[47] Tran-Dinh, Q.; Kyrillidis, A.; Cevher, V., Composite self-concordant minimization, J. Mach. Learn. Res., 16, 12, 371-416 (2015) · Zbl 1337.68231
[48] Tseng, P.: On accelerated proximal gradient methods for convex-concave optimization. Tech. rep. (2008)
[49] Xu, Y.; Yin, W., A block coordinate descent method for regularized multiconvex optimization with applications to nonnegative tensor factorization and completion, SIAM J. Imaging Sci., 6, 3, 1758-1789 (2013) · Zbl 1280.49042
[50] Yanez, F., Bach, F.: Primal-dual algorithms for non-negative matrix factorization with the Kullback-Leibler divergence. In: 2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 2257-2261 (2017)
[51] Zhong, S.; Ghosh, J., Generative model-based document clustering: a comparative study, Knowl. Inf. Syst., 8, 3, 374-384 (2005)
This reference list is based on information provided by the publisher or from digital mathematics libraries. Its items are heuristically matched to zbMATH identifiers and may contain data conversion errors. It attempts to reflect the references listed in the original paper as accurately as possible without claiming the completeness or perfect precision of the matching.