## Adaptive policies for stochastic systems under a randomized discounted cost criterion. (English) Zbl 1201.93130

Summary: The paper deals with a class of discrete-time stochastic control processes under a discounted optimality criterion with a random discount rate and possibly unbounded costs. The state process $$\{x_t\}$$ and the discount process $$\{\alpha_t\}$$ evolve according to the coupled difference equations $$x_{t+1}= F(x_t,\alpha_t,a_t,\xi_t)$$, $$\alpha_{t+1}= G(\alpha_t,\eta_t)$$, where the state and discount disturbance processes $$\{\xi_t\}$$ and $$\{\eta_t\}$$ are sequences of i.i.d. random variables with unknown distributions $$\theta^\xi$$ and $$\theta^\eta$$, respectively. Assuming observability of the process $$\{(\xi_t,\eta_t)\}$$, we use the empirical estimator of its distribution to construct an asymptotically discounted optimal policy.
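To make the setup concrete, the following is a minimal simulation sketch of the coupled recursions and of the empirical estimator built from the observed disturbances. It is not the paper's construction: the particular forms of `F` and `G`, the noise distributions, the treatment of $$\alpha_t$$ as a value in $$[0,1]$$, and the names `simulate` and `empirical_mean` are all assumptions chosen only for illustration.

```python
import random

def F(x, alpha, a, xi):
    # Hypothetical state dynamics (assumption, not from the paper):
    # a linear system with additive disturbance xi.
    return 0.5 * x + a + xi

def G(alpha, eta):
    # Hypothetical discount dynamics (assumption): a convex combination,
    # so alpha stays in [0, 1] whenever alpha and eta are in [0, 1].
    return 0.9 * alpha + 0.1 * eta

def simulate(T, policy, x0=0.0, alpha0=0.5, seed=0):
    """Run x_{t+1} = F(x_t, alpha_t, a_t, xi_t), alpha_{t+1} = G(alpha_t, eta_t)
    for T steps and record the observed disturbance pairs (xi_t, eta_t)."""
    rng = random.Random(seed)
    x, alpha = x0, alpha0
    samples = []
    for _ in range(T):
        a = policy(x, alpha)        # control chosen from the current pair
        xi = rng.gauss(0.0, 1.0)    # assumed distribution, unknown to the controller
        eta = rng.random()          # assumed distribution, unknown to the controller
        samples.append((xi, eta))   # disturbances are assumed observable
        x = F(x, alpha, a, xi)
        alpha = G(alpha, eta)
    return x, alpha, samples

def empirical_mean(samples, h):
    """Integrate h against the empirical distribution, which places
    mass 1/len(samples) on each observed pair (xi, eta)."""
    return sum(h(xi, eta) for xi, eta in samples) / len(samples)
```

For example, `simulate(1000, lambda x, alpha: 0.0)` returns the final state, the final discount value, and the list of observed pairs; `empirical_mean(samples, h)` then approximates the expectation of `h` under the unknown joint distribution, which is the kind of quantity an adaptive policy would substitute for the true expectation.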

### MSC:

- 93E20 Optimal stochastic control
- 93E10 Estimation and detection in stochastic control theory
- 90C40 Markov and semi-Markov decision processes
- 93C55 Discrete-time control/observation systems