
Dynamic pricing and learning with finite inventories. (English) Zbl 1329.91045

Summary: We study a dynamic pricing problem with finite inventory and parametric uncertainty on the demand distribution. Products are sold during selling seasons of finite length, and inventory that is unsold at the end of a selling season perishes. The goal of the seller is to determine a pricing strategy that maximizes the expected revenue. Inference on the unknown parameters is made by maximum-likelihood estimation.
We show that this problem satisfies an endogenous learning property: the unknown parameters are learned on the fly if the chosen selling prices are sufficiently close to the optimal ones. We show that a small modification to the certainty equivalent pricing strategy, which always chooses the optimal price with respect to the current parameter estimates, satisfies \(\text{Regret}(T) = O(\log^{2}(T))\), where \(\text{Regret}(T)\) measures the expected cumulative revenue loss relative to a clairvoyant who knows the demand distribution. We complement this upper bound by exhibiting an instance for which the regret of any pricing policy is \(\Omega(\log T)\).
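The certainty equivalent idea described in the summary can be sketched in a few lines of Python. This is only an illustration under assumptions that are not taken from the paper: demand is at most one unit per period, the purchase probability is logistic in the price, prices come from a finite grid, and all names and parameter values (`demand_prob`, `solve_season`, `evaluate`, `beta0`, `beta1`) are hypothetical. The sketch computes the optimal prices for a given parameter vector by backward induction within one selling season, and then measures the per-season revenue loss incurred when prices are computed from estimates instead of the true parameters. It does not implement maximum-likelihood estimation or the paper's modification of the strategy.

```python
import math


def demand_prob(price, beta0, beta1):
    """Assumed logistic purchase probability at a given price (illustrative)."""
    return 1.0 / (1.0 + math.exp(-(beta0 + beta1 * price)))


def solve_season(beta0, beta1, inventory, season_length, prices):
    """Backward induction over (periods left, units left).

    Returns (value, policy): value[t][c] is the expected revenue with t
    periods left and c units in stock under the given parameters, and
    policy[t][c] is the maximizing price from the grid (None if c == 0).
    """
    value = [[0.0] * (inventory + 1) for _ in range(season_length + 1)]
    policy = [[None] * (inventory + 1) for _ in range(season_length + 1)]
    for t in range(1, season_length + 1):
        for c in range(1, inventory + 1):
            best_price, best_val = None, -math.inf
            for p in prices:
                q = demand_prob(p, beta0, beta1)
                # Sell one unit with probability q, otherwise keep the stock.
                val = q * (p + value[t - 1][c - 1]) + (1.0 - q) * value[t - 1][c]
                if val > best_val:
                    best_price, best_val = p, val
            value[t][c] = best_val
            policy[t][c] = best_price
    return value, policy


def evaluate(policy, beta0, beta1, inventory, season_length):
    """Expected season revenue of a fixed policy under the given parameters."""
    value = [[0.0] * (inventory + 1) for _ in range(season_length + 1)]
    for t in range(1, season_length + 1):
        for c in range(1, inventory + 1):
            p = policy[t][c]
            q = demand_prob(p, beta0, beta1)
            value[t][c] = q * (p + value[t - 1][c - 1]) + (1.0 - q) * value[t - 1][c]
    return value[season_length][inventory]


def season_loss(true_params, est_params, inventory, season_length, prices):
    """Revenue loss of pricing with estimates instead of the true parameters."""
    opt_value, _ = solve_season(*true_params, inventory, season_length, prices)
    _, ce_policy = solve_season(*est_params, inventory, season_length, prices)
    achieved = evaluate(ce_policy, *true_params, inventory, season_length)
    return opt_value[season_length][inventory] - achieved


if __name__ == "__main__":
    grid = [float(p) for p in range(1, 11)]   # hypothetical price grid
    truth = (2.0, -0.5)                       # hypothetical true parameters
    estimate = (1.0, -0.3)                    # hypothetical current estimates
    print(season_loss(truth, estimate, inventory=3, season_length=10, prices=grid))
```

Summing `season_loss` over successive selling seasons, with the estimates updated after each season, would give a quantity in the spirit of \(\text{Regret}(T)\); the loss is zero whenever the estimates lead to the same prices as the true parameters.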

MSC:

91B24 Microeconomic theory (price theory and economic markets)
90B05 Inventory, storage, reservoirs
62P20 Applications of statistics to economics
