Adaptive policies for stochastic systems under a randomized discounted cost criterion. (English) Zbl 1201.93130

Summary: The paper deals with a class of discrete-time stochastic control processes under a discounted optimality criterion with random discount rate, and possibly unbounded costs. The state process \(\{x_t\}\) and the discount process \(\{\alpha_t\}\) evolve according to the coupled difference equations \(x_{t+1}= F(x_t,\alpha_t,a_t,\xi_t)\), \(\alpha_{t+1}= G(\alpha_t,\eta_t)\), where the state and discount disturbance processes \(\{\xi_t\}\) and \(\{\eta_t\}\) are sequences of i.i.d. random variables with unknown distributions \(\theta^\xi\) and \(\theta^\eta\), respectively. Assuming observability of the process \(\{(\xi_t,\eta_t)\}\), we use the empirical estimator of its distribution to construct an asymptotically discounted optimal policy.
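The two ingredients named in the summary, the coupled recursion and the empirical estimator built from the observed pairs \((\xi_t,\eta_t)\), can be illustrated with a minimal Python sketch. Everything concrete here is an assumption made for illustration and is not taken from the paper: the maps `F` and `G`, the constant policy, the finite disturbance sets, and the helper names `empirical_distribution` and `simulate` are all hypothetical. The sketch does not construct the paper's asymptotically optimal policy.

```python
import random
from collections import Counter


def empirical_distribution(samples):
    """Empirical measure of the samples: mass 1/n on each observed point."""
    n = len(samples)
    counts = Counter(samples)
    return {value: count / n for value, count in counts.items()}


def simulate(F, G, policy, x0, alpha0, disturbances):
    """Iterate x_{t+1} = F(x_t, alpha_t, a_t, xi_t), alpha_{t+1} = G(alpha_t, eta_t)."""
    x, alpha = x0, alpha0
    path = [(x, alpha)]
    for xi, eta in disturbances:
        a = policy(x, alpha)  # action chosen from the current state and discount
        x, alpha = F(x, alpha, a, xi), G(alpha, eta)
        path.append((x, alpha))
    return path


# Hypothetical dynamics (assumptions, not from the paper):
# an inventory-like state map and a discount rate driven directly by eta.
def F(x, alpha, a, xi):
    return max(x + a - xi, 0)


def G(alpha, eta):
    return eta


# Observed i.i.d. disturbance pairs; the sampling law is unknown to the controller.
rng = random.Random(0)
observed = [(rng.choice([0, 1, 2]), rng.choice([0.05, 0.1])) for _ in range(1000)]

theta_hat = empirical_distribution(observed)  # estimate of the joint law of (xi, eta)
path = simulate(F, G, lambda x, alpha: 1, x0=0, alpha0=0.1, disturbances=observed)
```

In an adaptive scheme of the kind the summary describes, an estimate such as `theta_hat` would be recomputed as more pairs are observed and used in place of the unknown distribution when choosing actions.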


93E20 Optimal stochastic control
93E10 Estimation and detection in stochastic control theory
90C40 Markov and semi-Markov decision processes
93C55 Discrete-time control/observation systems