Equations of states in singular statistical estimation.

*(English)* Zbl 1396.68106

Summary: Learning machines with hierarchical structures or hidden variables are singular statistical models: they are nonidentifiable and their Fisher information matrices are singular. In singular statistical models, the Bayes posterior distribution does not converge to a normal distribution, and the maximum likelihood estimator does not satisfy asymptotic normality. This is the main reason it has been difficult to predict their generalization performance from trained states. In this paper, we study four errors, (1) the Bayes generalization error, (2) the Bayes training error, (3) the Gibbs generalization error, and (4) the Gibbs training error, and prove that universal mathematical relations hold among these errors. The formulas proved in this paper are equations of states in statistical estimation because they hold for any true distribution, any parametric model, and any a priori distribution. We also show that the Bayes and Gibbs generalization errors can be estimated from the Bayes and Gibbs training errors, and we propose widely applicable information criteria that apply to both regular and singular statistical models.
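To make the "widely applicable information criterion" (WAIC) concrete: it estimates the Bayes generalization error from training quantities alone, as the Bayes training loss plus a functional-variance correction computed from posterior samples. The following is a minimal sketch, not code from the paper, assuming NumPy and using a toy conjugate normal model (unknown mean, known variance) so the posterior can be sampled exactly; all variable names are illustrative.

```python
import numpy as np

# Hedged sketch: WAIC for a toy regular model, x_i ~ N(w, 1) with a
# conjugate prior w ~ N(0, 100).  The posterior over w is available in
# closed form, so we draw exact posterior samples instead of running MCMC.

rng = np.random.default_rng(0)
n = 50
x = rng.normal(loc=0.5, scale=1.0, size=n)        # observed sample

prior_var = 100.0
post_var = 1.0 / (n + 1.0 / prior_var)            # conjugate posterior variance
post_mean = post_var * x.sum()

S = 4000                                          # number of posterior draws
w = rng.normal(post_mean, np.sqrt(post_var), size=S)

# Pointwise log-likelihoods log p(x_i | w_s), shape (S, n)
logp = -0.5 * np.log(2.0 * np.pi) - 0.5 * (x[None, :] - w[:, None]) ** 2

# Bayes training loss: -(1/n) sum_i log E_w[ p(x_i | w) ],
# with a numerically stable log-mean-exp over the posterior draws
m = logp.max(axis=0)
lppd = m + np.log(np.exp(logp - m).mean(axis=0))
bayes_train = -lppd.mean()

# Functional variance term: (1/n) sum_i Var_w[ log p(x_i | w) ]
func_var = logp.var(axis=0).sum() / n

waic = bayes_train + func_var                     # estimate of Bayes generalization error
print(f"training loss {bayes_train:.3f}, WAIC {waic:.3f}")
```

The point of the criterion is that both terms are computable from the trained posterior, with no held-out data; unlike AIC, the construction remains valid when the Fisher information matrix is singular.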

##### MSC:

68T05 | Learning and adaptive systems in artificial intelligence |


##### References:

[1] | Amari, S.; Park, H.; Ozeki, T., Singularities affect dynamics of learning in neuromanifolds, Neural computation, 18, 5, 1007-1065, (2006) · Zbl 1092.68636 |

[2] | Aoyagi, M.; Watanabe, S., Stochastic complexities of reduced rank regression in Bayesian estimation, Neural networks, 18, 7, 924-933, (2005) · Zbl 1077.68749 |

[3] | Aoyagi, M.; Watanabe, S., Resolution of singularities and generalization error with Bayesian estimation for layered neural network, IEICE transactions, J88-D-II, 10, 2112-2124, (2005) |

[4] | Atiyah, M.F., Resolution of singularities and division of distributions, Communications on pure and applied mathematics, 23, 145-150, (1970) · Zbl 0188.19405 |

[5] | Hagiwara, K., On the problem in model selection of neural network regression in overrealizable scenario, Neural computation, 14, 8, 1979-2002, (2002) · Zbl 1010.68115 |

[6] | Hartigan, J.A., A failure of likelihood asymptotics for normal mixtures, Proc. of Berkeley conf. in honor of Jerzy Neyman and Jack Kiefer, 2, 807-810, (1985) · Zbl 1373.62070 |

[7] | Hayasaka, T.; Kitahara, M.; Usui, S., On the asymptotic distribution of the least-squares estimators in unidentifiable models, Neural computation, 16, 1, 99-114, (2004) · Zbl 1084.68098 |

[8] | Hironaka, H., Resolution of singularities of an algebraic variety over a field of characteristic zero, Annals of mathematics, 79, 109-326, (1964) · Zbl 0122.38603 |

[9] | Kashiwara, M., B-functions and holonomic systems, Inventiones mathematicae, 38, 33-53, (1976) · Zbl 0354.35082 |

[10] | Kollár, J., Lectures on resolution of singularities, (2007), Princeton University Press Princeton · Zbl 1113.14013 |

[11] | Nagata, K.; Watanabe, S., Exchange Monte Carlo sampling from Bayesian posterior for singular learning machines, IEEE transactions on neural networks, 19, 7, 1253-1266, (2008) |

[12] | Van der Vaart, A.W.; Wellner, J.A., Weak convergence and empirical processes, (1996), Springer · Zbl 0862.60002 |

[13] | Watanabe, S. (1995). Generalized Bayesian framework for neural networks with singular Fisher information matrices. In Proc. of international symposium on nonlinear theory and its applications (pp. 207-210) |

[14] | Watanabe, S., Algebraic analysis for singular statistical estimation, (), 39-50 · Zbl 0949.68126 |

[15] | Watanabe, S., Algebraic information geometry for learning machines with singularities, Advances in neural information processing systems, 329-336, (2000) |

[16] | Watanabe, S., Algebraic analysis for nonidentifiable learning machines, Neural computation, 13, 4, 899-933, (2001) · Zbl 0985.68051 |

[17] | Watanabe, S., Algebraic geometrical methods for hierarchical learning machines, Neural networks, 14, 8, 1049-1060, (2001) |

[18] | Watanabe, S., Learning efficiency of redundant neural networks in Bayesian estimation, IEEE transactions on neural networks, 12, 6, 1475-1486, (2001) |

[19] | Watanabe, S.; Amari, S., Learning coefficients of layered models when the true distribution mismatches the singularities, Neural computation, 15, 5, 1013-1033, (2003) · Zbl 1085.68138 |

[20] | Watanabe, S., Algebraic geometry of singular learning machines and symmetry of generalization and training errors, Neurocomputing, 67, 198-213, (2005) |

[21] | Watanabe, S., Algebraic geometry and learning theory, (2006), Morikita Publishing (in Japanese) |

[22] | Watanabe, S. (2007). Generalization and training errors in Bayes and Gibbs estimations in singular learning machines. IEICE Technical Report, NC2007-75 (pp. 25-30) |

[23] | Watanabe, S. (2009). On a relation between a limit theorem in learning theory and singular fluctuation, IEICE Technical Report, NC2008-111 (pp. 45-50) |

[24] | Yamazaki, K.; Watanabe, S., Singularities in mixture models and upper bounds of stochastic complexity, Neural networks, 16, 7, 1029-1038, (2003) · Zbl 1255.68130 |

[25] | Yamazaki, K.; Watanabe, S., Singularities in complete bipartite graph-type Boltzmann machines and upper bounds of stochastic complexities, IEEE transactions on neural networks, 16, 2, 312-324, (2005) |

[26] | Yamazaki, K.; Watanabe, S., Algebraic geometry and stochastic complexity of hidden Markov models, Neurocomputing, 69, 62-84, (2005) |
