
\(\epsilon\)-optimal discretized linear reward-penalty learning automata. (English) Zbl 0657.68062

We consider variable structure stochastic automata (VSSA), which interact with an environment and which dynamically learn the optimal action that the environment offers. Like all VSSA, the automata are fully defined by a set of action probability updating rules [V. I. Varshavskij and I. P. Voroncova, Autom. Remote Control 24, 327-333 (1963), translation from Avtom. Telemekh. 24, 353-360 (1963; Zbl 0122.375); K. S. Narendra and M. A. L. Thathachar, IEEE Trans. Syst. Man Cybern. SMC-4, 323-334 (1974; Zbl 0279.68067)]. However, to minimize the requirements on the random number generator used to implement the VSSA, and to increase the speed of convergence of the automaton, we consider the case in which the probability functions can assume only a finite number of values. These values discretize the probability space [0,1], and the resulting automata are therefore called discretized learning automata. The discretized automata are linear because the subintervals of [0,1] are of equal length.
We shall prove the following results: a) two-action discretized linear reward-penalty automata are ergodic and \(\epsilon\)-optimal in all environments whose minimum penalty probability is less than 0.5; b) there exist discretized two-action linear reward-penalty automata that are ergodic and \(\epsilon\)-optimal in all random environments; and c) discretized two-action linear reward-penalty automata with artificially created absorbing barriers are \(\epsilon\)-optimal in all random environments. Apart from these theoretical results, simulation results are presented that illustrate the properties of the automata discussed. The rate of convergence of all these automata and some open problems are also presented.
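To make the discretization idea concrete, the following is a minimal sketch, not the authors' code, of a two-action discretized linear reward-penalty automaton interacting with a stationary random environment. The resolution parameter, the step size 1/N, the penalty probabilities c0 and c1, and the exact update rule (shift the action probability one grid step toward the chosen action on reward and away from it on penalty) are illustrative assumptions consistent with the scheme described above, not details taken from the paper.

```python
# Sketch of a two-action discretized linear reward-penalty automaton.
# The action probability p = P(action 0) is restricted to the grid
# {0, 1/N, 2/N, ..., 1}, i.e. the N equal subintervals of [0, 1].

import random


class DiscretizedLRP:
    """Two-action automaton with a discretized, linearly spaced probability grid."""

    def __init__(self, resolution=100):
        self.n = resolution          # number of equal subintervals of [0, 1]
        self.k = resolution // 2     # start at p = k/N = 0.5

    @property
    def p(self):
        return self.k / self.n

    def choose(self):
        # Select action 0 with probability p, otherwise action 1.
        return 0 if random.random() < self.p else 1

    def update(self, action, penalized):
        # Discretized linear reward-penalty update: move p one grid step
        # toward the chosen action on reward, away from it on penalty.
        step = 1 if action == 0 else -1
        if penalized:
            step = -step
        self.k = min(self.n, max(0, self.k + step))


def simulate(c0=0.2, c1=0.6, steps=20000, resolution=100):
    """Environment penalizes action i with probability c_i; here the minimum
    penalty probability is below 0.5, matching the condition in result (a)."""
    automaton = DiscretizedLRP(resolution)
    penalties = (c0, c1)
    for _ in range(steps):
        a = automaton.choose()
        automaton.update(a, penalized=random.random() < penalties[a])
    return automaton.p


if __name__ == "__main__":
    # The final probability of the better action (action 0) should be near 1.
    print("P(action 0) after training:", simulate())
```

Running the sketch with the assumed penalty probabilities typically drives P(action 0) close to 1, illustrating the kind of convergence behaviour the simulation results in the paper are said to exhibit.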

MSC:

68T05 Learning and adaptive systems in artificial intelligence
68Q45 Formal languages and automata