SHOPPER: a probabilistic model of consumer choice with substitutes and complements. (English) Zbl 1443.62218

Summary: We develop SHOPPER, a sequential probabilistic model of shopping data. SHOPPER uses interpretable components to model the forces that drive how a customer chooses products; in particular, we designed SHOPPER to capture how items interact with other items. We develop an efficient posterior inference algorithm to estimate these forces from large-scale data, and we analyze a large dataset from a major chain grocery store. We are interested in answering counterfactual queries about changes in prices. We find that SHOPPER provides accurate predictions even under price interventions, and that it helps identify complementary and substitutable pairs of products.
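The summary describes shopping as a sequence of choices driven by interpretable forces, including item-item interactions and prices. The following is a minimal Python sketch of one such sequential choice step. It is not the paper's exact model. It assumes a much-simplified utility for each candidate item: a popularity term, minus a price-sensitivity term on log price, plus an interaction term between the item's vector `rho` and the average vector `alpha` of items already in the basket. The next item is then drawn through a softmax over the items not yet chosen. All names (`next_item_probs`, `pair_score`, `alpha`, `rho`) and all numbers are hypothetical. The sketch omits per-customer preferences and any other components the full model may include. `pair_score` is an assumed proxy for complement versus substitute strength, not necessarily the metric used in the paper.

```python
import math
from typing import Dict, List, Sequence

Vector = Sequence[float]


def dot(u: Vector, v: Vector) -> float:
    return sum(a * b for a, b in zip(u, v))


def basket_context(basket: List[str], alpha: Dict[str, Vector]) -> List[float]:
    """Average alpha vector of the items already in the basket (zeros if empty)."""
    dim = len(next(iter(alpha.values())))
    if not basket:
        return [0.0] * dim
    return [sum(alpha[j][k] for j in basket) / len(basket) for k in range(dim)]


def next_item_probs(basket, alpha, rho, popularity, price_sens, price):
    """Softmax probability that each item not yet in the basket is chosen next.

    Assumed utility: popularity - price_sens * log(price) + rho_c . mean(alpha of basket).
    """
    ctx = basket_context(basket, alpha)
    utils = {
        c: popularity[c] - price_sens * math.log(price[c]) + dot(rho[c], ctx)
        for c in alpha
        if c not in basket
    }
    top = max(utils.values())  # subtract the max for numerical stability
    z = sum(math.exp(u - top) for u in utils.values())
    return {c: math.exp(u - top) / z for c, u in utils.items()}


def pair_score(a: str, b: str, alpha, rho) -> float:
    """Hypothetical symmetric proxy: > 0 complement-like, < 0 substitute-like."""
    return dot(rho[a], alpha[b]) + dot(rho[b], alpha[a])


# Hypothetical toy parameters: alpha_c says how item c shifts later choices,
# rho_c says how item c responds to items already in the basket.
alpha = {"pasta": [1.0, 0.0], "sauce": [0.0, 1.0], "rice": [0.0, -1.0]}
rho = {"pasta": [0.0, 1.0], "sauce": [2.0, 0.0], "rice": [-2.0, 0.0]}
popularity = {c: 0.0 for c in alpha}
price = {c: 1.0 for c in alpha}

probs = next_item_probs(["pasta"], alpha, rho, popularity, 1.0, price)
print(round(probs["sauce"], 3))  # 0.982

# Counterfactual price intervention: raise the price of sauce to e and recompute.
higher = dict(price, sauce=math.e)
probs_high = next_item_probs(["pasta"], alpha, rho, popularity, 1.0, higher)
print(round(probs_high["sauce"], 3))  # 0.953

print(pair_score("pasta", "sauce", alpha, rho))  # 3.0
print(pair_score("pasta", "rice", alpha, rho))   # -3.0
```

Raising one item's price and recomputing the choice probabilities is a toy version of the counterfactual price query the summary mentions. In this setting it lowers the chance that sauce is chosen next. The sign of the interaction proxy separates the complement-like pair (pasta, sauce) from the substitute-like pair (pasta, rice).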


62L10 Sequential statistical analysis
62P20 Applications of statistics to economics


word2vec; GloVe; t-SNE


[1] Abernethy, J., Bach, F., Evgeniou, T. and Vert, J. P. (2009). A new approach to collaborative filtering: Operator estimation with spectral regularization. J. Mach. Learn. Res. 10 803-826. · Zbl 1235.68122
[2] Arora, S., Li, Y., Liang, Y. and Ma, T. (2016). RAND-WALK: A latent variable model approach to word embeddings. Transact. Assoc. Comput. Linguist. 4.
[3] Athey, S. and Stern, S. (1998). An empirical framework for testing theories about complimentarity in organizational design. Technical report, National Bureau of Economic Research, Cambridge, MA.
[4] Bamler, R. and Mandt, S. (2017). Dynamic word embeddings via skip-gram filtering. In International Conference on Machine Learning.
[5] Barkan, O. (2016). Bayesian neural word embedding. Preprint. Available at arXiv:1603.06571.
[6] Barkan, O. and Koenigstein, N. (2016). Item2Vec: Neural item embedding for collaborative filtering. In IEEE International Workshop on Machine Learning for Signal Processing.
[7] Bengio, Y., Ducharme, R., Vincent, P. and Janvin, C. (2003). A neural probabilistic language model. J. Mach. Learn. Res. 3 1137-1155. · Zbl 1061.68157
[8] Bengio, Y., Schwenk, H., Senécal, J. S., Morin, F. and Gauvain, J. L. (2006). Neural probabilistic language models. In Innovations in Machine Learning. Springer, Berlin.
[9] Berry, S. (2014). Structural models of complementary choices. Mark. Lett. 25 245-256.
[10] Blei, D. M., Kucukelbir, A. and McAuliffe, J. D. (2017). Variational inference: A review for statisticians. J. Amer. Statist. Assoc. 112 859-877.
[11] Blum, J. R. (1954). Approximation methods which converge with probability one. Ann. Math. Stat. 25 382-386. · Zbl 0055.37806
[12] Bottou, L., Curtis, F. E. and Nocedal, J. (2018). Optimization methods for large-scale machine learning. SIAM Rev. 60 223-311. · Zbl 1397.65085
[13] Browning, M. and Meghir, C. (1991). The effects of male and female labor supply on commodity demands. Econometrica 59 925-951.
[14] Canny, J. (2004). GaP: A factor model for discrete data. In Proceedings of the 27th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval.
[15] Cattell, R. B. (1952). Factor Analysis: An Introduction and Manual for the Psychologist and Social Scientist. Harper, New York.
[16] Che, H., Chen, X. and Chen, Y. (2012). Investigating effects of out-of-stock on consumer stockkeeping unit choice. J. Mark. Res. 49 502-513.
[17] Chintagunta, P. K. (1994). Heterogeneous logit model implications for brand positioning. J. Mark. Res. 304-311.
[18] Chintagunta, P. K. and Nair, H. S. (2011). Structural workshop paper—discrete-choice models of consumer demand in marketing. Mark. Sci. 30 977-996.
[19] Deaton, A. and Muellbauer, J. (1980). An almost ideal demand system. Am. Econ. Rev. 70 312-326.
[20] Donnelly, R., Ruiz, F. J. R., Blei, D. M. and Athey, S. (2019). Counterfactual inference for consumer choice across many product categories. Available at arXiv:1906.02635.
[21] Doshi-Velez, F., Miller, K. T., Van Gael, J. and Teh, Y. W. (2009). Variational inference for the Indian buffet process. In Proceedings of the International Conference on Artificial Intelligence and Statistics 12.
[22] Elrod, T. (1988). Choice map: Inferring a product-market map from panel data. Mark. Sci. 7 21-40.
[23] Elrod, T. and Keane, M. P. (1995). A factor-analytic probit model for representing the market structure in panel data. J. Mark. Res. 32 1-16.
[24] Firth, J. R. (1957). A synopsis of linguistic theory 1930-1955. In Studies in Linguistic Analysis (Special Volume of the Philological Society) 1952-1959.
[25] Gentzkow, M. (2007). Valuing new goods in a model with complementarity: Online newspapers. Am. Econ. Rev. 97 713-744.
[26] Gopalan, P., Hofman, J. and Blei, D. M. (2015). Scalable recommendation with hierarchical Poisson factorization. In Uncertainty in Artificial Intelligence 326-335. AUAI Press, Arlington, VA.
[27] Gopalan, P., Ruiz, F. J. R., Ranganath, R. and Blei, D. M. (2014). Bayesian nonparametric Poisson factorization for recommendation systems. In Artificial Intelligence and Statistics.
[28] Görür, D., Jäkel, F. and Rasmussen, C. E. (2006). A choice model with infinitely many latent features. In International Conference on Machine Learning.
[29] Harris, Z. S. (1954). Distributional structure. Word 10 146-162.
[30] Hoffman, M. D., Blei, D. M., Wang, C. and Paisley, J. (2013). Stochastic variational inference. J. Mach. Learn. Res. 14 1303-1347. · Zbl 1317.68163
[31] Hotz, V. J. and Miller, R. A. (1993). Conditional choice probabilities and the estimation of dynamic models. Rev. Econ. Stud. 60 497-529. · Zbl 0788.90007
[32] Hu, Y., Koren, Y. and Volinsky, C. (2008). Collaborative filtering for implicit feedback datasets. In IEEE International Conference on Data Mining.
[33] Jordan, M. I., Ghahramani, Z., Jaakkola, T. S. and Saul, L. K. (1999). An introduction to variational methods for graphical models. Mach. Learn. 37 183-233. · Zbl 0945.68164
[34] Keane, M. P. et al. (2013). Panel data discrete choice models of consumer demand. Prepared for The Oxford Handbooks: Panel Data.
[35] Kingma, D. P. and Welling, M. (2014). Auto-encoding variational Bayes. In International Conference on Learning Representations.
[36] Levy, O. and Goldberg, Y. (2014). Neural word embedding as implicit matrix factorization. In Advances in Neural Information Processing Systems.
[37] Liang, D., Altosaar, J., Charlin, L. and Blei, D. M. (2015). Factorization meets the item embedding: Regularizing matrix factorization with item co-occurrence. In ACM Conference on Recommender Systems.
[38] Ma, H., Liu, C., King, I. and Lyu, M. R. (2011). Probabilistic factor models for web site recommendation. In Proceedings of the 34th International ACM SIGIR Conference on Research and Development in Information Retrieval.
[39] Mikolov, T., Yih, W. T. and Zweig, G. (2013). Linguistic regularities in continuous space word representations. In Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies.
[40] Mikolov, T., Sutskever, I., Chen, K., Corrado, G. S. and Dean, J. (2013a). Distributed representations of words and phrases and their compositionality. In Advances in Neural Information Processing Systems.
[41] Mikolov, T., Chen, K., Corrado, G. S. and Dean, J. (2013b). Efficient estimation of word representations in vector space. In International Conference on Learning Representations.
[42] Mnih, A. and Hinton, G. E. (2007). Three new graphical models for statistical language modelling. In International Conference on Machine Learning.
[43] Mnih, A. and Kavukcuoglu, K. (2013). Learning word embeddings efficiently with noise-contrastive estimation. In Advances in Neural Information Processing Systems.
[44] Mnih, A. and Teh, Y. W. (2012). A fast and simple algorithm for training neural probabilistic language models. In International Conference on Machine Learning.
[45] Naesseth, C., Ruiz, F. J. R., Linderman, S. and Blei, D. M. (2017). Reparameterization gradients through acceptance-rejection methods. In Artificial Intelligence and Statistics.
[46] Ng, A. Y. and Russell, S. J. (2000). Algorithms for inverse reinforcement learning. In International Conference on Machine Learning.
[47] Pennington, J., Socher, R. and Manning, C. D. (2014). GloVe: Global vectors for word representation. In Conference on Empirical Methods in Natural Language Processing.
[48] Rezende, D. J., Mohamed, S. and Wierstra, D. (2014). Stochastic backpropagation and approximate inference in deep generative models. In International Conference on Machine Learning.
[49] Robbins, H. and Monro, S. (1951). A stochastic approximation method. Ann. Math. Stat. 22 400-407. · Zbl 0054.05901
[50] Rudolph, M., Ruiz, F. J. R., Mandt, S. and Blei, D. M. (2016). Exponential family embeddings. In Advances in Neural Information Processing Systems.
[51] Ruiz, F. J. R., Athey, S. and Blei, D. M. (2020). Supplement to “SHOPPER: A probabilistic model of consumer choice with substitutes and complements.” https://doi.org/10.1214/19-AOAS1265SUPP.
[52] Ruiz, F. J. R., Titsias, M. K. and Blei, D. M. (2016). The generalized reparameterization gradient. In Advances in Neural Information Processing Systems.
[53] Ruiz, F. J. R., Titsias, M. K., Dieng, A. B. and Blei, D. M. (2018). Augment and reduce: Stochastic inference for large categorical distributions. In International Conference on Machine Learning.
[54] Russell, S. J. (1998). Learning agents for uncertain environments. In Annual Conference on Computational Learning Theory.
[55] Semenova, V., Goldman, M., Chernozhukov, V. and Taddy, M. (2018). Orthogonal ML for demand estimation: High dimensional causal inference in dynamic panels. Available at arXiv:1712.09988.
[56] Song, I. and Chintagunta, P. K. (2007). A discrete-continuous model for multicategory purchase behavior of households. J. Mark. Res. 44 595-612.
[57] Stern, D. H., Herbrich, R. and Graepel, T. (2009). Matchbox: Large scale Bayesian recommendations. In 18th International World Wide Web Conference.
[58] Titsias, M. K. (2016). One-vs-each approximation to softmax for scalable estimation of probabilities. In Advances in Neural Information Processing Systems.
[59] Titsias, M. K. and Lázaro-Gredilla, M. (2014). Doubly stochastic variational Bayes for non-conjugate inference. In International Conference on Machine Learning.
[60] Train, K. E., McFadden, D. L. and Ben-Akiva, M. (1987). The demand for local telephone service: A fully discrete model of residential calling patterns and service choices. Rand J. Econ. 109-123.
[61] van der Maaten, L. J. P. and Hinton, G. E. (2008). Visualizing high-dimensional data using t-SNE. J. Mach. Learn. Res. 9 2579-2605. · Zbl 1225.68219
[62] Vilnis, L. and McCallum, A. (2015). Word representations via Gaussian embedding. In International Conference on Learning Representations.
[63] Wainwright, M. J. and Jordan, M. I. (2008). Graphical models, exponential families, and variational inference. Found. Trends Mach. Learn. 1 1-305. · Zbl 1193.62107
[64] Wan, M., Wang, D., Goldman, M., Taddy, M., Rao, J., Liu, J., Lymberopoulos, D. and McAuley, J. (2017). Modeling consumer preferences and price sensitivities from large-scale grocery shopping transaction logs. In International World Wide Web Conference.
[65] Wang, C. and Blei, D. M. (2011). Collaborative topic modeling for recommending scientific articles. In Proceedings of the 17th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining.
[66] Wolpin, K. I. (1984). An estimable dynamic stochastic model of fertility and child mortality. J. Polit. Econ. 92 852-874.