×

Generator estimation of Markov jump processes. (English) Zbl 1128.65012

Summary: Estimating the generator of a continuous-time Markov jump process based on incomplete data is a problem which arises in various applications ranging from machine learning to molecular dynamics. Several methods have been devised for this purpose: a quadratic programming approach [cf. D. T. Crommelin and E. Vanden-Eijnden, J. Comp. Phys. 217, No. 2, 782–805 (2006; Zbl 1102.65009)], a resolvent method [cf. T. Müller, Modellierung von Proteinevolution, PhD thesis, Heidelberg (2001)], and various implementations of an expectation-maximization algorithm [cf. S. Asmussen, O. Nerman and M. Olsson, Scand. J. Stat. 23, No. 4, 419–441 (1996; Zbl 0898.62104); I. Holmes and G. M. Rubin, An expectation maximization algorithm for training hidden substitution models, J. Mol. Biol. 317, 753–764 (2002); U. Nodelman, C. R. Shelton and D. Koller, Expectation maximization and complex duration distributions for continuous time Bayesian networks, in: Proceedings of the twenty-first conference on uncertainty in AI (UAI), 421–430 (2005); M. Bladt and M. Sørensen, J. R. Stat. Soc., Ser. B, Stat. Methodol. 67, 395–410 (2005; Zbl 1069.62061)]. Some of these methods, however, seem to be known only in a particular research community, and have later been reinvented in a different context.
The purpose of this paper is to compile a catalogue of existing approaches, to compare the strengths and weaknesses, and to test their performance in a series of numerical examples. These examples include carefully chosen model problems and an application to a time series from molecular dynamics.

MSC:

65C60 Computational problems in statistics (MSC2010)
65C40 Numerical analysis or methods applied to Markov chains
60J22 Computational methods in Markov chains
60J75 Jump processes (MSC2010)
60J35 Transition functions, generators and resolvents
62M05 Markov processes: estimation; hidden Markov models
62M10 Time series, auto-correlation, regression, etc. in statistics (GARCH)

Software:

EMpht; Gromacs
PDFBibTeX XMLCite
Full Text: DOI

References:

[1] Bladt, M.; Sørensen, M., Statistical inference for discretely observed Markov jump processes, J.R. Statist. Soc. B, 67, 395-410 (2005) · Zbl 1069.62061
[2] Crommelin, D. T.; Vanden-Eijnden, E., Fitting timeseries by continuous-time Markov chains: a quadratic programming approach, J. Comp. Phys., 217, 782-805 (2006) · Zbl 1102.65009
[3] T. Müller, Modellierung von Proteinevolution, PhD thesis, Heidelberg, 2001.; T. Müller, Modellierung von Proteinevolution, PhD thesis, Heidelberg, 2001.
[4] A. Hobolth, J.L. Jensen, Statistical inference in evolutionary models of DNA sequences via the EM algorithm, Statistical Applications in Genetics and Molecular Biology 4 (2005) Article 18.; A. Hobolth, J.L. Jensen, Statistical inference in evolutionary models of DNA sequences via the EM algorithm, Statistical Applications in Genetics and Molecular Biology 4 (2005) Article 18. · Zbl 1081.62094
[5] Müller, T.; Spang, R.; Vingron, M., Estimating amino acid substitution models: A comparison of Dayhoff’s estimator, the resolvent approach and a maximum likelihood method, Mol. Biol. Evol., 19, 8-13 (2002)
[6] Dempster, A. P.; Laird, N. M.; Rubin, D. B., Maximum likelihood from incomplete data via the EM algorithm, J.R. Statist. Soc., 39, 1-38 (1977) · Zbl 0364.62022
[7] Neuts, M. F., Algorithmic Probability: a Collection of Problems (1995), Chapman and Hall · Zbl 0870.60002
[8] Brass, A.; Pendleton, B. J.; Chen, Y.; Robson, B., Hybrid Monte Carlo simulations theory and initial comparison with molecular dynamics, Biopolymers, 33, 1307-1315 (1993)
[9] Berendsen, H. J.C.; van der Spoel, D.; van Drunen, R., Gromacs: a message-passing parallel molecular dynamics implementation, Comp. Phys. Comm., 91, 43-56 (1995)
[10] Lindahl, E.; Hess, B.; van der Spoel, D., Gromacs 3.0: a package for molecular simulation and trajectory analysis, J. Mol. Mod., 7 (2001)
[11] Deuflhard, P.; Huisinga, W.; Fischer, A.; Schütte, Ch., Identification of almost invariant aggregates in reversible nearly uncoupled Markov chains, Lin. Alg. Appl., 315, 39-59 (2000) · Zbl 0963.65008
[12] F. Cordes, M. Weber, J. Schmidt-Ehrenberg, Metastable conformations via successive Perron-Cluster Cluster analysis of dihedrals, ZIB-Report 02-40 (2002).; F. Cordes, M. Weber, J. Schmidt-Ehrenberg, Metastable conformations via successive Perron-Cluster Cluster analysis of dihedrals, ZIB-Report 02-40 (2002).
[13] E. Weinan, Vanden-Eijnden, Towards a theory of transition paths, J. Stat. Phys. 123 (2006) 503-523.; E. Weinan, Vanden-Eijnden, Towards a theory of transition paths, J. Stat. Phys. 123 (2006) 503-523. · Zbl 1101.82016
[14] Metzner, P.; Schütte, C.; Vanden-Eijnden, E., Illustration of transition path theory on a collection of simple examples, J. Chem. Phys., 125, 084110 (2006)
[15] P. Metzner, Ch. Schütte, E. Vanden-Eijnden, Transition path theory for Markov jump processes, submitted for publication.; P. Metzner, Ch. Schütte, E. Vanden-Eijnden, Transition path theory for Markov jump processes, submitted for publication.
[16] Israel, R. B.; Rosenthal, J. S.; Wei, J. Z., Finding generators for Markov chains via empirical transition matrices with applications to credit ranking, Mathematical Finance, 11, 245-265 (2001) · Zbl 0987.60086
[17] U. Nodelman, C.R. Shelton, D. Koller, Expectation maximization and complex duration distributions for continuous time Bayesian networks, in: Proceedings of the Twenty-first Conference on Uncertainty in AI (UAI), 2005, pp. 421-430.; U. Nodelman, C.R. Shelton, D. Koller, Expectation maximization and complex duration distributions for continuous time Bayesian networks, in: Proceedings of the Twenty-first Conference on Uncertainty in AI (UAI), 2005, pp. 421-430.
[18] Holmes, I.; Rubin, G. M., An expectation maximization algorithm for training hidden substitution models, J. Mol. Biol., 317, 753-764 (2002)
[19] Asmussen, S.; Nerman, O.; Olsson, M., Fitting phase-type distributions via the EM algorithm, Scand. J. Stat., 23, 419-441 (1996) · Zbl 0898.62104
[20] Arvestad, L.; Bruno, W. J., Estimation of reversible substitution matrices from multiple pairs of sequences, J. Mol. Evol., 45, 696-703 (1997)
This reference list is based on information provided by the publisher or from digital mathematics libraries. Its items are heuristically matched to zbMATH identifiers and may contain data conversion errors. In some cases that data have been complemented/enhanced by data from zbMATH Open. This attempts to reflect the references listed in the original paper as accurately as possible without claiming completeness or a perfect matching.