A provably correct sampler for probabilistic programs.

*(English)* Zbl 1366.68024
Harsha, Prahladh (ed.) et al., 35th IARCS annual conference on foundations of software technology and theoretical computer science, FSTTCS 2015, Bangalore, India, December 16–18, 2015. Proceedings. Wadern: Schloss Dagstuhl – Leibniz Zentrum für Informatik (ISBN 978-3-939897-97-2). LIPIcs – Leibniz International Proceedings in Informatics 45, 475-488 (2015).

Summary: We consider the problem of inferring the implicit distribution specified by a probabilistic program. A popular inference technique for probabilistic programs called Markov Chain Monte Carlo or MCMC sampling involves running the program repeatedly and generating sample values by perturbing values produced in “previous runs”. This simulates a Markov chain whose stationary distribution is the distribution specified by the probabilistic program.
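As an illustration of the perturb-and-rerun idea described above, here is a minimal Metropolis-Hastings sketch in Python. It is not the algorithm of the paper. The toy program (`x ~ Normal(0, 1)` followed by an observation `y_obs ~ Normal(x, 1)`), the names `program_log_weight` and `mcmc`, and the random-walk proposal are all assumptions made for this example.

```python
import math
import random


def normal_logpdf(v, mean, std):
    """Log-density of Normal(mean, std) evaluated at v."""
    return -0.5 * ((v - mean) / std) ** 2 - math.log(std * math.sqrt(2 * math.pi))


def program_log_weight(x, y_obs=1.0):
    """Unnormalized log-density of one run of the toy program (assumed):
    x ~ Normal(0, 1); observe y_obs ~ Normal(x, 1)."""
    return normal_logpdf(x, 0.0, 1.0) + normal_logpdf(y_obs, x, 1.0)


def mcmc(n_runs, step=1.0, seed=0):
    """Random-walk Metropolis-Hastings over the single latent variable x."""
    rng = random.Random(seed)
    x = 0.0  # value of x from the "previous run"
    samples = []
    for _ in range(n_runs):
        # Perturb the previous value (symmetric Gaussian proposal).
        candidate = x + rng.gauss(0.0, step)
        log_ratio = program_log_weight(candidate) - program_log_weight(x)
        # Accept with probability min(1, ratio); otherwise keep the old value.
        if rng.random() < math.exp(min(0.0, log_ratio)):
            x = candidate
        samples.append(x)
    return samples
```

For this particular toy program the posterior over `x` is Normal with mean 0.5 and variance 0.5, so the sample mean of a long chain should settle near 0.5. The sketch has only one sampling point per variable, which is exactly the easy case.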

However, it is non-trivial to implement MCMC sampling for probabilistic programs since each variable could be updated at multiple program points. In such cases, it is unclear which values from the “previous run” should be used to generate samples for the “current run”.
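The difficulty can be made concrete with a small hypothetical program in which the same variable is sampled at different program points depending on a branch. The sketch below is an assumption-laden illustration only: the program, the point labels `p1`, `p2`, `p3`, and the helper `missing_points` are invented here and do not come from the paper.

```python
import random


def run_program(rng):
    """One run of a toy program that assigns x at several program points.
    Returns the final value of x and a trace mapping point label -> sampled value."""
    trace = {}
    x = rng.gauss(0.0, 1.0)       # program point p1
    trace["p1"] = x
    if x > 0:
        x = rng.gauss(1.0, 1.0)   # program point p2 (then-branch)
        trace["p2"] = x
    else:
        x = rng.gauss(-1.0, 1.0)  # program point p3 (else-branch)
        trace["p3"] = x
    return x, trace


def missing_points(prev_trace, curr_trace):
    """Program points reached in the current run that have no value
    recorded in the previous run, so there is nothing to perturb there."""
    return sorted(set(curr_trace) - set(prev_trace))
```

If the previous run took the then-branch and the current run takes the else-branch, `missing_points` reports `p3`: the previous run holds a value for `x`, but not one produced at the point now being executed. A sampler has to decide what to do in that situation, and that decision affects correctness.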

We present an algorithm to solve this problem for the general case and formally prove that the algorithm is correct. Our algorithm handles variables that are updated multiple times along the same path, updated along different paths in a conditional statement, or repeatedly updated inside loops. We have implemented our algorithm in a tool called Infer\(^\checkmark\). We empirically demonstrate that Infer\(^\checkmark\) produces the correct result for various benchmarks, whereas existing tools such as R2 and Stan produce incorrect results on several of these benchmarks.

For the entire collection see [Zbl 1338.68006].

##### MSC:

68N30 | Mathematical aspects of software engineering (specification, verification, metrics, requirements, etc.) |

68Q87 | Probability in computer science (algorithm analysis, random structures, phase transitions, etc.) |