Online learning with Markov sampling.(English)Zbl 1170.68022

Summary: This paper attempts to give an extension of learning theory to a setting where the assumption of i.i.d. data is weakened by keeping the independence but abandoning the identical restriction. We hypothesize that a sequence of examples $$(x_t, y_t)$$ in $$X\times Y$$ for $$t = 1, 2, 3,\dots$$ is drawn from a probability distribution $$\rho_t$$ on $$X\times Y$$.
The marginal probabilities on $$X$$ are supposed to converge to a limit probability on $$X$$. Two main examples for this time process are discussed. The first is a stochastic one which in the special case of a finite space $$X$$ is defined by a stochastic matrix and more generally by a stochastic kernel. The second is determined by an underlying discrete dynamical system on the space $$X$$. Our theoretical treatment requires that this dynamics be hyperbolic (or “Axiom A”) which still permits a class of chaotic systems (with Sinai-Ruelle-Bowen attractors). Even in the case of a limit Dirac point probability, one needs the measure theory to be defined using Hölder spaces.
Many implications of our work remain unexplored. These include, for example, the relation to Hidden Markov Models, as well as Markov Chain Monte Carlo methods. It seems reasonable that further work should consider the push forward of the process from $$X\times Y$$ by some kind of observable function to a data space.

MSC:

 68Q32 Computational learning theory 37D15 Morse-Smale systems 41A25 Rate of convergence, degree of approximation 60B11 Probability theory on linear topological spaces 60J10 Markov chains (discrete-time Markov processes on discrete state spaces)
Full Text:

References:

 [1] DOI: 10.1090/S0002-9947-1950-0051437-7 [2] DOI: 10.1109/72.501719 [3] DOI: 10.1090/S0273-0979-01-00923-5 · Zbl 0983.68162 [4] DOI: 10.1017/CBO9780511618796 · Zbl 1274.41001 [5] DOI: 10.1007/s10208-004-0134-1 · Zbl 1083.68106 [6] Devroye L., A Probabilistic Theory of Pattern Recognition (1997) [7] DOI: 10.1023/A:1018946025316 · Zbl 0939.68098 [8] Feller W., An Introduction to Probability Theory and Its Applications (1971) · Zbl 0219.60003 [9] DOI: 10.1109/TSP.2004.830991 · Zbl 1369.68281 [10] DOI: 10.1007/978-94-010-0612-5 [11] Lax P. D., Functional Analysis (2002) · Zbl 1009.47001 [12] DOI: 10.1162/neco.1996.8.4.819 · Zbl 05477968 [13] DOI: 10.1214/aop/1176988477 · Zbl 0836.60015 [14] DOI: 10.1109/5.18626 [15] Robinson C., Dynamical Systems: Stability, Symbolic Dynamics, and Chaos (1999) · Zbl 0914.58021 [16] DOI: 10.1007/978-3-642-65970-6 [17] DOI: 10.1090/S0002-9904-1967-11798-1 · Zbl 0202.55202 [18] DOI: 10.1007/s10208-004-0160-z · Zbl 1119.68098 [19] DOI: 10.1142/S0219530503000089 · Zbl 1079.68089 [20] DOI: 10.1090/S0273-0979-04-01025-0 · Zbl 1107.94007 [21] DOI: 10.1016/j.acha.2005.03.001 · Zbl 1107.94008 [22] DOI: 10.1007/s00365-006-0659-y · Zbl 1127.68088 [23] Sun H. W., Adv. Comput. Math. [24] S. Vempala, Combinatorial and Computational Geometry, Math. Sci. Res. Inst. Publ. 52, eds. J. E. Goodman, J. Pach and E. Welzl (Cambridge Univ. Press, Cambridge, 2005) pp. 577–616. [25] Yao Y., IEEE Trans. Inform. Theory [26] DOI: 10.1016/j.acha.2006.12.001 · Zbl 1124.68099 [27] DOI: 10.1109/TIT.2006.883632 · Zbl 1323.68450 [28] DOI: 10.1023/A:1019762724717 · Zbl 1124.37307 [29] DOI: 10.1109/TIT.2003.813564 · Zbl 1290.62033
This reference list is based on information provided by the publisher or from digital mathematics libraries. Its items are heuristically matched to zbMATH identifiers and may contain data conversion errors. It attempts to reflect the references listed in the original paper as accurately as possible without claiming the completeness or perfect precision of the matching.