Probabilistic reasoning in intelligent systems: networks of plausible inference.

*(English)* Zbl 0746.68089
Morgan Kaufmann Series in Representation and Reasoning. San Mateo etc.: Morgan Kaufmann Publishers. XIV, 552 p. (1989).

This book is a survey of probabilistic methods for representing uncertainty, with an in-depth explanation of how probabilistic methods are related to other formalisms. It starts with an overview of different formalisms that handle uncertainty, and provides strong arguments in favor of using probabilities. Mathematicians are usually accustomed to understanding probabilities either as a purely mathematical notion (a positive measure \(p\) with \(p(X)=1\) for the whole space \(X\), in the style of Kolmogorov’s 1933 axiomatics), or, in statistical applications, as the limit value of a frequency. When we talk about the probability \(p(E)\) that a certain statement is true, we usually have no sequence of events, only one event \(E\), and in reality it must be either true or false. So we cannot apply the statistical notion of probability to this unique event. Fortunately, mathematical statistics offers a formalism that allows us to talk about the probabilities of hypotheses. This formalism is called Bayesian, after the 18th-century statistician who invented the method of updating our belief in a hypothesis when new evidence is found. Such methods still form the essence of how we treat such “subjective” probabilities.
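
The Bayesian updating step just mentioned can be sketched in a few lines; the numbers below are hypothetical, chosen only to illustrate the rule.

```python
# A minimal sketch of Bayesian updating of a subjective probability:
# a prior belief in a hypothesis H is revised in light of evidence E,
# given the likelihoods of E under H and under not-H (all numbers here
# are hypothetical).

def bayes_update(prior_h, p_e_given_h, p_e_given_not_h):
    """Return p(H | E) by Bayes' rule."""
    p_e = p_e_given_h * prior_h + p_e_given_not_h * (1.0 - prior_h)
    return p_e_given_h * prior_h / p_e

# Evidence four times more likely under H raises a prior p(H) = 0.3 to:
posterior = bayes_update(0.3, 0.8, 0.2)
print(round(posterior, 4))  # 0.6316
```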

Our knowledge usually consists of facts (i.e., atomic statements in some theory) and rules of the type “if \(A_1,\dots,\) and \(A_n\), then \(B\)”. In many cases, we are not 100% sure that these facts and rules are true. Therefore, we must somehow express the degree of our uncertainty. Since the author chooses to use probabilities, he expresses the degree of belief in a fact \(F\) by its (subjective) probability \(p(F)\), and the degree of belief in a rule by a conditional probability \(p(B\mid A_1\&\dots\&A_n)\). In general, knowledge is a set of facts related by rules. Such a structure is called a network. The main part of the book is devoted to analyzing such networks.
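
A two-node toy example (my own illustration, not the book’s notation) shows how a fact and a rule combine: a fact \(A\) with subjective probability \(p(A)\), and a rule “if \(A\) then \(B\)” whose strength is \(p(B\mid A)\). The numbers are hypothetical.

```python
# A toy two-node "network": a fact A with degree of belief p(A), and a
# rule "if A then B" carrying the conditional beliefs p(B | A) and
# p(B | not A). All numbers are hypothetical.

p_a = 0.9                              # degree of belief in fact A
p_b_given = {True: 0.95, False: 0.10}  # p(B | A) and p(B | not A)

# Belief in B follows by summing over the two states of A
# (the law of total probability):
p_b = sum((p_a if a else 1.0 - p_a) * p_b_given[a] for a in (True, False))
print(round(p_b, 3))  # 0.865
```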

Traditional Bayesian formulas describe the simplest case, when we have a single rule. To deal with more general networks, one must generalize Bayesian methods to networks (including networks with loops), be able to compute the probabilities of different queries, and update all the probabilities in the network when additional knowledge becomes known. The author proposes a number of analytical methods for this. For the cases when analytical methods do not work, he proposes stochastic simulation methods.
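
The two flavors of inference can be contrasted on a toy network \(A\to B\) with hypothetical numbers: exact computation of \(p(A\mid B=\text{true})\), and a crude stochastic-simulation (rejection-sampling) estimate of the same quantity. This is only a sketch of the general idea, not of the book’s specific algorithms.

```python
# Exact vs. stochastic inference of p(A | B = true) in a toy network
# A -> B. All numbers are hypothetical.
import random

p_a = 0.3
p_b_given = {True: 0.8, False: 0.2}

# Exact: p(A | B) = p(B | A) p(A) / p(B).
p_b = p_b_given[True] * p_a + p_b_given[False] * (1.0 - p_a)
exact = p_b_given[True] * p_a / p_b

# Stochastic simulation: sample the network, keep samples where B holds.
random.seed(0)
kept = a_true = 0
for _ in range(100_000):
    a = random.random() < p_a
    b = random.random() < p_b_given[a]
    if b:
        kept += 1
        a_true += a
estimate = a_true / kept

print(round(exact, 4))     # 0.6316
print(round(estimate, 2))  # close to the exact value
```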

Chapter 6 is devoted to decision making. Decision making under uncertainty is a topic that has been thoroughly developed in game theory under the name of utility theory. Utility theory is based on a probabilistic representation of uncertainty, and it therefore fits easily into the author’s general picture.
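
The utility-theoretic decision rule underlying the chapter can be sketched as follows: choose the action whose expected utility is maximal. All probabilities and utilities below are hypothetical.

```python
# A minimal sketch of maximum-expected-utility decision making, with
# hypothetical outcome probabilities and utilities.

def expected_utility(probs, utilities):
    return sum(p * u for p, u in zip(probs, utilities))

actions = {
    "act":        expected_utility([0.7, 0.3], [100.0, -50.0]),  # 55.0
    "do_nothing": expected_utility([1.0], [40.0]),               # 40.0
}
best = max(actions, key=actions.get)
print(best)  # act
```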

Chapters 9 and 10 form the part of this book that will be most interesting to those who are knowledgeable about other approaches to representing uncertainty: they describe the relationship between probabilities and other formalisms. For Dempster-Shafer theory, which is itself expressed in terms close to probabilities, such a relationship does not come as a surprise. What is more surprising is that several so-called “non-monotonic” formalisms, which describe statements involving “typically” or “normally”, can also be interpreted in probabilistic terms. Crudely speaking, a statement like “Normally, birds fly” is interpreted as a statement \(P(F(x)\mid B(x))\geq 1-\varepsilon\) about the conditional probability that an object \(x\) flies \((F(x))\) if it is a bird \((B(x))\). Then we let \(\varepsilon\) tend to 0. It turns out that a query \(Q\) is deducible in a non-monotonic logic if and only if the probability of \(Q\) (in the resulting network) tends to 1 as \(\varepsilon\to0\). We said “crudely speaking” because, for some formalisms, one has to restrict oneself to probability distributions for which the entropy attains its maximum.
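
A toy illustration of this \(\varepsilon\)-semantics (a hypothetical chaining example of my own, not the book’s construction): if two defaults each hold with probability at least \(1-\varepsilon\), then chaining them gives a query probability of at least \((1-\varepsilon)^2\), which tends to 1 as \(\varepsilon\to0\).

```python
# Epsilon-semantics sketch: chaining `depth` defaults, each holding with
# probability at least 1 - eps, bounds the query probability below by
# (1 - eps)**depth, which tends to 1 as eps -> 0. This is a hypothetical
# illustration of the limit behavior described in the text.

def chained_lower_bound(eps, depth=2):
    """Lower bound on the query probability after chaining `depth` defaults."""
    return (1.0 - eps) ** depth

for eps in (0.1, 0.01, 0.001):
    print(eps, chained_lower_bound(eps))
# As eps shrinks, the bound approaches 1, matching the statement that a
# non-monotonically deducible query has probability tending to 1.
```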

Reviewer: V.Ya.Kreinovich (El Paso)