[1] | Evgeniou, T.; Pontil, M.; Poggio, T.: Regularization networks and support vector machines, Advances in computational mathematics 10, 51-80 (1999) |

[2] | Blanchard, G.; Bousquet, O.; Massart, P.: Statistical performance of support vector machines, Annals of statistics 36, 489-531 (2008) |

[3] | Bartlett, P. L.; Jordan, M. I.; McAuliffe, J. D.: Convexity, classification, and risk bounds, Journal of the American statistical association 101, 138-156 (2006) |

[4] | Chen, D. R.; Wu, Q.; Ying, Y.; Zhou, D. X.: Support vector machine soft margin classifiers: error analysis, Journal of machine learning research 5, 1143-1175 (2004) |

[5] | Steinwart, I.; Scovel, C.: Fast rates for support vector machines using Gaussian kernels, Annals of statistics 35, 575-607 (2007) |

[6] | Wu, Q.; Ying, Y.; Zhou, D. X.: Multi-kernel regularized classifiers, Journal of complexity 23, 108-134 (2007) |

[7] | Zhang, T.: Statistical behavior and consistency of classification methods based on convex risk minimization, Annals of statistics 32, 56-85 (2004) |

[8] | Smale, S.; Zhou, D. X.: Online learning with Markov sampling, Analysis and applications 7, 87-113 (2009) |

[9] | Xiao, Q. W.; Pan, Z. W.: Learning from non-identical sampling for classification, Advances in computational mathematics 33, 97-112 (2010) |

[10] | Yu, B.: Rates of convergence for empirical processes of stationary mixing sequences, The annals of probability 22, 94-116 (1994) |

[11] | Bradley, R. C.: Basic properties of strong mixing conditions. A survey and some open questions, Probability surveys 2, 107-144 (2005) |

[12] | Vidyasagar, M.: Learning and generalization: with applications to neural networks, (2003) |

[13] | Modha, D. S.; Masry, E.: Minimum complexity regression estimation with weakly dependent observations, IEEE transactions on information theory 42, 2133-2145 (1996) |

[14] | Mohri, M.; Rostamizadeh, A.: Rademacher complexity bounds for non-i.i.d. processes, Advances in neural information processing systems 21, 1097-1104 (2009) |

[15] | Mohri, M.; Rostamizadeh, A.: Stability bounds for stationary $\phi $-mixing and $\beta $-mixing processes, Journal of machine learning research 11, 789-814 (2010) |

[16] | Pan, Z. W.; Xiao, Q. W.: Least-squares regularized regression with non-i.i.d. sampling, Journal of statistical planning and inference 139, 3579-3587 (2009) |

[17] | Steinwart, I.; Hush, D.; Scovel, C.: Learning from dependent observations, Journal of multivariate analysis 100, 175-194 (2008) |

[18] | Sun, H.; Wu, Q.: Regularized least squares regression with dependent samples, Advances in computational mathematics 32, 175-189 (2010) |

[19] | Xu, Y. L.; Chen, D. R.: Learning rates of regularized regression for exponentially strongly mixing sequence, Journal of statistical planning and inference 138, 2180-2189 (2008) |

[20] | Smale, S.; Zhou, D. X.: Learning theory estimates via integral operators and their approximations, Constructive approximation 26, 153-172 (2007) |

[21] | Cucker, F.; Zhou, D. X.: Learning theory: an approximation theory viewpoint, (2007) |

[22] | Zhou, D. X.: The covering number in learning theory, Journal of complexity 18, 739-767 (2002) |

[23] | Xiang, D. H.: Classification with Gaussians and convex loss II: improving error bounds by noise conditions, Science China mathematics 53, 1-7 (2010) |

[24] | van der Vaart, A. W.; Wellner, J. A.: Weak convergence and empirical processes, (1996) |

[25] | Zhou, D. X.: Capacity of reproducing kernel spaces in learning theory, IEEE transactions on information theory 49, 1743-1752 (2003) |

[26] | Bousquet, O.: A Bennett concentration inequality and its application to suprema of empirical processes, Comptes rendus mathématique 334, 495-500 (2002) |

[27] | Mendelson, S.: Improving the sample complexity using global data, IEEE transactions on information theory 48, 1977-1991 (2002) |