Prototype selection for interpretable classification.

*(English)*Zbl 1234.62096Summary: Prototype methods seek a minimal subset of samples that can serve as a distillation or condensed view of a data set. As the size of modern data sets grows, being able to present a domain specialist with a short list of “representative” samples chosen from the data set is of increasing interpretative value. While much recent statistical research has been focused on producing sparse-in-the-variables methods, this paper aims at achieving sparsity in the samples. We discuss a method for selecting prototypes in the classification setting (in which the samples fall into known discrete categories). Our method of focus is derived from three basic properties that we believe a good prototype set should satisfy. This intuition is translated into a set cover optimization problem, which we solve approximately using standard approaches. While prototype selection is usually viewed as purely a means toward building an efficient classifier, in this paper we emphasize the inherent value of having a set of prototypical elements. That said, by using the nearest-neighbor rule on the set of prototypes, we can of course discuss our method as a classifier as well.

We demonstrate the interpretative value of producing prototypes on the well-known USPS ZIP code digits data set and show that as a classifier it performs reasonably well. We apply the method to a proteomics data set in which the samples are strings and therefore not naturally embedded in a vector space. Our method is compatible with any dissimilarity measure, making it amenable to situations in which using a non-Euclidean metric is desirable or even necessary.

We demonstrate the interpretative value of producing prototypes on the well-known USPS ZIP code digits data set and show that as a classifier it performs reasonably well. We apply the method to a proteomics data set in which the samples are strings and therefore not naturally embedded in a vector space. Our method is compatible with any dissimilarity measure, making it amenable to situations in which using a non-Euclidean metric is desirable or even necessary.

##### MSC:

62H30 | Classification and discrimination; cluster analysis (statistical aspects) |

90C10 | Integer programming |

90C90 | Applications of mathematical programming |

62P10 | Applications of statistics to biology and medical sciences; meta analysis |

PDF
BibTeX
Cite

\textit{J. Bien} and \textit{R. Tibshirani}, Ann. Appl. Stat. 5, No. 4, 2403--2424 (2011; Zbl 1234.62096)

Full Text:
DOI

##### References:

[1] | Asuncion, A. and Newman, D. J. (2007). UCI Machine Learning Repository. Univ. California, Irvine, School of Information and Computer Sciences. |

[2] | Basu, M. and Ho, T. K. (2006). Data Complexity in Pattern Recognition . Springer, London. · Zbl 1119.68168 |

[3] | Cannon, A. H. and Cowen, L. J. (2004). Approximation algorithms for the class cover problem. Ann. Math. Artif. Intell. 40 215-223. · Zbl 1075.68609 |

[4] | Cano, J. R., Herrera, F. and Lozano, M. (2003). Using evolutionary algorithms as instance selection for data reduction in KDD: An experimental study. IEEE Transactions on Evolutionary Computation 7 561-575. |

[5] | Cano, J. R., Herrera, F. and Lozano, M. (2007). Evolutionary stratified training set selection for extracting classification rules with trade off precision-interpretability. Data and Knowledge Engineering 60 90-108. |

[6] | Ceyhan, E., Priebe, C. E. and Marchette, D. J. (2007). A new family of random graphs for testing spatial segregation. Canad. J. Statist. 35 27-50. · Zbl 1124.05039 |

[7] | Cover, T. M. and Hart, P. (1967). Nearest neighbor pattern classification. Proc. IEEE Trans. Inform. Theory IT-11 21-27. · Zbl 0154.44505 |

[8] | Devijver, P. A. and Kittler, J. (1982). Pattern Recognition: A Statistical Approach . Prentice Hall, Englewood Cliffs, NJ. · Zbl 0542.68071 |

[9] | DeVinney, J. and Wierman, J. C. (2002). A SLLN for a one-dimensional class cover problem. Statist. Probab. Lett. 59 425-435. · Zbl 1018.60031 |

[10] | Fayed, H. A. and Atiya, A. F. (2009). A novel template reduction approach for the K -nearest neighbor method. IEEE Transactions on Neural Networks 20 890-896. |

[11] | Feige, U. (1998). A threshold of ln n for approximating set cover. J. ACM 45 634-652. · Zbl 1065.68573 |

[12] | Friedman, J. H., Hastie, T. and Tibshirani, R. (2010). Regularization paths for generalized linear models via coordinate descent. Journal of Statistical Software 33 1-22. |

[13] | Hart, P. (1968). The condensed nearest-neighbor rule. IEEE Trans. Inform. Theory 14 515-516. |

[14] | Hastie, T. and Simard, P. Y. (1998). Models and metrics for handwritten digit recognition. Statist. Sci. 13 54-65. · Zbl 0966.68186 |

[15] | Hastie, T., Tibshirani, R. and Friedman, J. (2009). The Elements of Statistical Learning : Data Mining, Inference, and Prediction , 2nd ed. Springer, New York. · Zbl 1273.62005 |

[16] | Hussain, Z., Szedmak, S. and Shawe-Taylor, J. (2004). The linear programming set covering machine. Pattern Analysis, Statistical Modelling and Computational Learning . |

[17] | Kohonen, T. (2001). Self-Organizing Maps , 3rd ed. Springer Series in Information Sciences 30 . Springer, Berlin. · Zbl 0957.68097 |

[18] | Könemann, J., Parekh, O. and Segev, D. (2006). A unified approach to approximating partial covering problems. In Algorithms-ESA 2006. Lecture Notes in Computer Science 4168 468-479. Springer, Berlin. · Zbl 1131.90443 |

[19] | Leslie, C. S., Eskin, E., Cohen, A., Weston, J. and Noble, W. S. (2004). Mismatch string kernels for discriminative protein classification. Bioinformatics 20 467-476. |

[20] | Lozano, M., Sotoca, J. M., Sánchez, J. S., Pla, F., Pkalska, E. and Duin, R. P. W. (2006). Experimental study on prototype optimisation algorithms for prototype-based classification in vector spaces. Pattern Recognition 39 1827-1838. · Zbl 1097.68619 |

[21] | Marchand, M. and Shawe-Taylor, J. (2002). The set covering machine. J. Mach. Learn. Res. 3 723-746. · Zbl 1112.68392 |

[22] | Marchiori, E. (2010). Class conditional nearest neighbor for large margin instance selection. IEEE Trans. Pattern Anal. Mach. Intell. 32 364-370. |

[23] | Park, M. Y. and Hastie, T. (2007). L 1 -regularization path algorithm for generalized linear models. J. R. Stat. Soc. Ser. B Stat. Methodol. 69 659-677. |

[24] | Priebe, C. E., DeVinney, J. G., Marchette, D. J. and Socolinsky, D. A. (2003). Classification using class cover catch digraphs. J. Classification 20 3-23. · Zbl 1055.62074 |

[25] | Ripley, B. D. (2005). Pattern Recognition and Neural Networks . Cambridge Univ. Press, New York. · Zbl 0853.62046 |

[26] | Simard, P. Y., Le Cun, Y. A. and Denker, J. S. (1993). Efficient pattern recognition using a new transformation distance. In Advances in Neural Information Processing Systems 50-58. Morgan Kaufmann, San Mateo, CA. |

[27] | Takigawa, I., Kudo, M. and Nakamura, A. (2009). Convex sets as prototypes for classifying patterns. Eng. Appl. Artif. Intell. 22 101-108. |

[28] | Tibshirani, R., Hastie, T., Narasimhan, B. and Chu, G. (2002). Diagnosis of multiple cancer types by shrunken centroids of gene expression. Proc. Natl. Acad. Sci. USA 99 6567-6572. |

[29] | Tipping, M. E. and Schölkopf, B. (2001). A kernel approach for vector quantization with guaranteed distortion bounds. In Artificial Intelligence and Statistics (T. Jaakkola and T. Richardson, eds.) 129-134. Morgan Kaufmann, San Francisco. |

[30] | Vazirani, V. V. (2001). Approximation Algorithms . Springer, Berlin. · Zbl 0999.68546 |

[31] | Wilson, D. R. and Martinez, T. R. (2000). Reduction techniques for instance-based learning algorithms. Machine Learning 38 257-286. · Zbl 0954.68126 |

[32] | Zhu, J., Rosset, S., Hastie, T. and Tibshirani, R. (2004). 1-norm support vector machines. In Advances in Neural Information Processing Systems 16 (S. Thrun, L. Saul and B. Schölkopf, eds.). MIT Press, Cambridge, MA. · Zbl 1222.68213 |

This reference list is based on information provided by the publisher or from digital mathematics libraries. Its items are heuristically matched to zbMATH identifiers and may contain data conversion errors. It attempts to reflect the references listed in the original paper as accurately as possible without claiming the completeness or perfect precision of the matching.