×

Hierarchical testing designs for pattern recognition. (English) Zbl 1072.62052

Summary: We explore the theoretical foundations of a “twenty questions” approach to pattern recognition. The object of the analysis is the computational process itself rather than probability distributions (Bayesian inference) or decision boundaries (statistical learning). Our formulation is motivated by applications to scene interpretation in which there are a great many possible explanations for the data, one (“background”) is statistically dominant, and it is imperative to restrict intensive computation to genuinely ambiguous regions.
The focus here is then on pattern filtering: Given a large set \({\mathcal Y}\) of possible patterns or explanations, narrow down the true one \(Y\) to a small (random) subset \(\widehat Y\subset{\mathcal Y}\) of “detected” patterns to be subjected to further, more intense, processing. To this end, we consider a family of hypothesis tests for \(Y\in A\) versus the nonspecific alternatives \(Y\in A^c\). Each test has null type I error and the candidate sets \(A\subset {\mathcal Y}\) are arranged in a hierarchy of nested partitions. These tests are then characterized by scope \((|A|)\), power (or type II error) and algorithmic cost.
We consider sequential testing strategies in which decisions are made iteratively, based on past outcomes, about which test to perform next and when to stop testing. The set \(\widehat Y\) is then taken to be the set of patterns that have not been ruled out by the tests performed. The total cost of a strategy is the sum of the “testing cost” and the “postprocessing cost” (proportional to \(|\widehat Y|)\) and the corresponding optimization problem is analyzed. As might be expected, under mild assumptions good designs for sequential testing strategies exhibit a steady progression from broad scope coupled with low power to high power coupled with dedication to specific explanations. In the assumptions ensuring this property a key role is played by the ratio cost/power. These ideas are illustrated in the context of detecting rectangles amidst clutter.

MSC:

62H30 Classification and discrimination; cluster analysis (statistical aspects)
62L10 Sequential statistical analysis
68T45 Machine vision and scene understanding
68T10 Pattern recognition, speech recognition
62H15 Hypothesis testing in multivariate analysis
90B40 Search theory

References:

[1] Amit, Y. (2002). 2 D Object Detection and Recognition . MIT Press, Cambridge, MA.
[2] Amit, Y. and Geman, D. (1999). A computational model for visual selection. Neural Computation 11 1691–1715.
[3] Amit, Y., Geman, D. and Fan, X. (2004). A coarse-to-fine strategy for multiclass shape detection. IEEE Trans. Pattern Analysis and Machine Intelligence 26 1606–1621.
[4] Bellman, R. (1961). Adaptive Control Processes : A Guided Tour . Princeton Univ. Press. · Zbl 0103.12901
[5] Blackwell, D. and Girschick, M. A. (1954). Theory of Games and Statistical Decisions . Wiley, New York. · Zbl 0056.36303
[6] Blanchard, G. and Geman, D. (2003). Hierarchical testing designs for pattern recognition. Technical Report 2003-07, Département de Mathématiques, Univ. de Paris-Sud. · Zbl 1072.62052
[7] Breiman, L., Friedman, J., Olshen, R. and Stone, C. (1984). Classification and Regression Trees . Wadsworth, Belmont, CA. · Zbl 0541.62042
[8] Chernoff, H. (1972). Sequential Analysis and Optimal Design . SIAM, Philadelphia. · Zbl 0265.62024
[9] Cootes, T. F. and Taylor, C. J. (1996). Locating faces using statistical feature detectors. In Proc. Second International Conference on Automatic Face and Gesture Recognition 204–209. IEEE Press, New York.
[10] DeGroot, M. H. (1970). Optimal Statistical Decisions . McGraw–Hill, New York. · Zbl 0225.62006
[11] Desimone, R., Miller, E. K., Chelazzi, L. and Lueschow, A. (1995). Multiple memory systems in visual cortex. In The Cognitive Neurosciences (M. S. Gazzaniga, ed.) 475–486. MIT Press, Cambridge, MA.
[12] Dietterich, T. (2000). The divide-and-conquer manifesto. In Proc. Eleventh International Conference on Algorithmic Learning Theory. Lecture Notes in Artificial Intelligence 1968 13–26. Springer, New York. · Zbl 1046.68546
[13] Fleuret, F. (2000). Détection hiérarchique de visages par apprentissage statistique. Ph.D. dissertation, Univ. Paris VI, Jussieu.
[14] Fleuret, F. and Geman, D. (2001). Coarse-to-fine face detection. International J. Computer Vision 41 85–107. · Zbl 1012.68713 · doi:10.1023/A:1011113216584
[15] Fritsch, J. and Finke, M. (1998). Applying divide and conquer to large scale pattern recognition tasks. Neural Networks : Tricks of the Trade. Lecture Notes in Comput. Sci. (G. B. Orr and K.-R. Müller, eds.) 1524 315–342. Springer, New York.
[16] Garey, M. R. (1972). Optimal binary identification procedures. SIAM J. Appl. Math. 23 173–186. · Zbl 0229.68037 · doi:10.1137/0123019
[17] Geman, D. and Jedynak, B. (2001). Model-based classification trees. IEEE Trans. Inform. Theory 47 1075–1082. · Zbl 0998.68137 · doi:10.1109/18.915664
[18] Geman, S., Bienenstock, E. and Doursat, R. (1992). Neural networks and the bias/variance dilemma. Neural Computation 4 1–58.
[19] Geman, S., Manbeck, K. and McClure, D. (1995). Coarse-to-fine search and rank-sum statistics in object recognition. Technical report, Div. Applied Mathematics, Brown Univ.
[20] Jung, F. (2001). Reconnaissance d’objets par focalisation et detection de changements. Ph.D. dissertation, Ecole Polytechnique, Paris.
[21] Jung, F. (2002). Detecting new buildings from time-varying aerial stereo pairs. Technical report, IGN.
[22] Osuna, E., Freund, R. and Girosit, F. (1997). Training support vector machines: An application to face detection. In Proc. Computer Vision and Pattern Recognition 130–136. IEEE Press, New York.
[23] Puterman, M. (1994). Markov Decision Processes . Wiley, New York. · Zbl 0829.90134
[24] Rowley, H. A., Baluja, S. and Kanade, T. (1998). Neural network-based face detection. IEEE Trans. Pattern Analysis and Machine Intelligence 20 23–38.
[25] Socolinsky, D. A., Neuheisel, J. D., Priebe, C. E., De Vinney, J. and Marchette, D. (2002). Fast face detection with a boosted cccd classifier. Technical report, Johns Hopkins Univ.
[26] Sung, K.-K. and Poggio, T. (1998). Example-based learning for view-based human face detection. IEEE Trans. Pattern Analysis and Machine Intelligence 20 39–51.
[27] Thorpe, S., Fize, D. and Marlot, C. (1996). Speed of processing in the human visual system. Nature 381 520–522. · Zbl 1133.92307
[28] Trouvé, A. and Yu, Y. (2002). Entropy reduction strategies on tree structured retrieval spaces. In Proc. Colloquium on Mathematics and Computer Science. II. Algorithms , Trees , Combinatorics and Probabilities (B. Chauvin, P. Flajolet, D. Gardy and A. Mokkadem, eds.) 513–525. Birkhäuser, Basel. · Zbl 1034.68032
[29] Viola, P. and Jones, M. J. (2001). Robust real-time face detection. In Proc. Eighth IEEE International Conference on Computer Vision 2 747. IEEE Press, New York.
[30] Yuille, A. L., Hallinan, P. and Cohen, D. S. (1992). Feature extraction from faces using deformable templates. International J. Computer Vision 8 99–111.
This reference list is based on information provided by the publisher or from digital mathematics libraries. Its items are heuristically matched to zbMATH identifiers and may contain data conversion errors. In some cases that data have been complemented/enhanced by data from zbMATH Open. This attempts to reflect the references listed in the original paper as accurately as possible without claiming completeness or a perfect matching.