##
**Node harvest.**
*(English)*
Zbl 1220.62084

Summary: When choosing a suitable technique for regression and classification with multivariate predictor variables, one is often faced with a tradeoff between interpretability and high predictive accuracy. To give a classical example, classification and regression trees are easy to understand and interpret. Tree ensembles like random forests provide usually more accurate predictions. Yet tree ensembles are also more difficult to analyze than single trees and are often criticized, perhaps unfairly, as ‘black box’ predictors.

Node harvest is trying to reconcile the two aims of interpretability and predictive accuracy by combining positive aspects of trees and tree ensembles. Results are very sparse and interpretable and predictive accuracy is extremely competitive, especially for low signal-to-noise data. The procedure is simple: an initial set of a few thousand nodes is generated randomly. If a new observation falls into just a single node, its prediction is the mean response of all training observation within this node, identical to a tree-like prediction. A new observation falls typically into several nodes and its prediction is then the weighted average of the mean responses across all these nodes. The only role of node harvest is to ‘pick’ the right nodes from the initial large ensemble of nodes by choosing node weights, which amounts in the proposed algorithm to a quadratic programming problem with linear inequality constraints. The solution is sparse in the sense that only very few nodes are selected with a nonzero weight. This sparsity is not explicitly enforced. Maybe surprisingly, it is not necessary to select a tuning parameter for optimal predictive accuracy. Node harvest can handle mixed data and missing values and is shown to be simple to interpret and competitive in predictive accuracy on a variety of data sets.

Node harvest is trying to reconcile the two aims of interpretability and predictive accuracy by combining positive aspects of trees and tree ensembles. Results are very sparse and interpretable and predictive accuracy is extremely competitive, especially for low signal-to-noise data. The procedure is simple: an initial set of a few thousand nodes is generated randomly. If a new observation falls into just a single node, its prediction is the mean response of all training observation within this node, identical to a tree-like prediction. A new observation falls typically into several nodes and its prediction is then the weighted average of the mean responses across all these nodes. The only role of node harvest is to ‘pick’ the right nodes from the initial large ensemble of nodes by choosing node weights, which amounts in the proposed algorithm to a quadratic programming problem with linear inequality constraints. The solution is sparse in the sense that only very few nodes are selected with a nonzero weight. This sparsity is not explicitly enforced. Maybe surprisingly, it is not necessary to select a tuning parameter for optimal predictive accuracy. Node harvest can handle mixed data and missing values and is shown to be simple to interpret and competitive in predictive accuracy on a variety of data sets.

### MSC:

62H30 | Classification and discrimination; cluster analysis (statistical aspects) |

90C20 | Quadratic programming |

65C60 | Computational problems in statistics (MSC2010) |

PDF
BibTeX
XML
Cite

\textit{N. Meinshausen}, Ann. Appl. Stat. 4, No. 4, 2049--2072 (2010; Zbl 1220.62084)

### References:

[1] | Allen, M. (1999). Do-it-yourself climate prediction. Nature 401 642-642. |

[2] | Amit, Y. and Geman, D. (1997). Shape quantization and recognition with randomized trees. Neural Comput. 9 1545-1588. |

[3] | Asuncion, A. and Newman, D. (2007). UCI Machine Learning Repository . Univ. California, Irvine, CA. |

[4] | Bartlett, P., Jordan, M. and McAuliffe, J. (2003). Convexity, classification, and risk bounds. Technical report, Dept. Statistics, UC Berkeley. · Zbl 1118.62330 |

[5] | Blanchard, G., Schäfer, C., Rozenholc, Y. and Müller, K. (2007). Optimal dyadic decision trees. Mach. Learn. 66 209-241. |

[6] | Breiman, L. (1996a). Bagging predictors. Mach. Learn. 24 123-140. · Zbl 0858.68080 |

[7] | Breiman, L. (1996b). Stacked regressions. Mach. Learn. 24 49-64. · Zbl 0849.68104 |

[8] | Breiman, L. (2001). Random forests. Mach. Learn. 45 5-32. · Zbl 1007.68152 |

[9] | Breiman, L., Friedman, J., Olshen, R. and Stone, C. (1984). Classification and Regression Trees . Wadsworth, Belmont, CA. · Zbl 0541.62042 |

[10] | Chen, S., Donoho, S. and Saunders, M. (2001). Atomic decomposition by basis pursuit. SIAM Rev. 43 129-159. · Zbl 0979.94010 |

[11] | Conlon, E., Liu, X., Lieb, J. and Liu, J. (2003). Integrating regulatory motif discovery and genome-wide expression analysis. Proc. Natl. Acad. Sci. 100 3339-3344. |

[12] | Efron, B., Hastie, T., Johnstone, I. and Tibshirani, R. (2004). Least angle regression. Ann. Statist. 32 407-451. · Zbl 1091.62054 |

[13] | Freund, Y. and Schapire, R. (1996). Experiments with a new boosting algorithm. In Machine Learning: Proceedings of the Thirteenth International Conference 148-156. Morgan Kauffman, San Francisko, CA. |

[14] | Friedman, J., Hastie, T. and Tibshirani, R. (2000). Additive logistic regression: A statistical view of boosting. Ann. Statist. 28 337-407. · Zbl 1106.62323 |

[15] | Friedman, J. and Popescu, B. (2008). Predictive learning via rule ensembles. Ann. Appl. Statist. 2 916-954. · Zbl 1149.62051 |

[16] | Goldfarb, D. and Idnani, A. (1983). A numerically stable dual method for solving strictly convex quadratic programs. Math. Program. 27 1-33. · Zbl 0537.90081 |

[17] | Hastie, T., Friedman, J. and Tibshirani, R. (2001). The Elements of Statistical Learning . Springer, New York. · Zbl 0973.62007 |

[18] | Hastie, T., Tibshirani, R., Botstein, D. and Brown, P. (2001). Supervised harvesting of expression trees. Genome Biol. 2 0003-1. |

[19] | Ishwaran, H., Kogalur, U., Blackstone, E. and Lauer, M. (2006). Random survival forests. Ann. Appl. Statist. 2 841-860. · Zbl 1149.62331 |

[20] | Johns, T., Gregory, J., Ingram, W., Johnson, C., Jones, A., Lowe, J., Mitchell, J., Roberts, D., Sexton, D., Stevenson, D. et al. (2003). Anthropogenic climate change for 1860 to 2100 simulated with the HadCM3 model under updated emissions scenarios. Climate Dynamics 20 583-612. |

[21] | Liaw, A. and Wiener, M. (2002). Classification and regression by random forest. R News 2 18-22. |

[22] | Lin, Y. and Jeon, Y. (2006). Random forests and adaptive nearest neighbors. J. Amer. Statist. Assoc. 101 578-590. · Zbl 1119.62304 |

[23] | Mangasarian, O., Street, W. and Wolberg, W. (1995). Breast cancer diagnosis and prognosis via linear programming. Oper. Res. 43 570-577. · Zbl 0857.90073 |

[24] | Meinshausen, N. (2009). Forest Garrote. Electron. J. Statist. 3 1288-1304. · Zbl 1326.62093 |

[25] | Nash, W., Sellers, T., Talbot, S., Cawthorn, A. and Ford, W. (1994). The population biology of abalone in Tasmania. Technical report, Sea Fisheries Division. |

[26] | Oakley, J. and O’Hagan, A. (2004). Probabilistic sensitivity analysis of complex models: A Bayesian approach. J. Roy. Statist. Soc. Ser. B 66 751-769. · Zbl 1046.62027 |

[27] | R Development Core Team (2005). R: A Language and Environment for Statistical Computing . R Foundation for Statistical Computing, Vienna, Austria. |

[28] | Strobl, C., Boulesteix, A., Zeileis, A. and Hothorn, T. (2007). Bias in random forest variable importance measures: Illustrations, sources and a solution. BMC Bioinformatics 8 25. |

[29] | Tibshirani, R. (1996). Regression shrinkage and selection via the Lasso. J. Roy. Statist. Soc. Ser. B 58 267-288. · Zbl 0850.62538 |

[30] | Wolpert, D. (1992). Stacked generalization. Neural Networks 5 241-259. |

[31] | Yu, B. and Bühlmann, P. (2003). Boosting with the L2 loss: Regression and classification. J. Amer. Statist. Assoc. 98 324-339. · Zbl 1041.62029 |

This reference list is based on information provided by the publisher or from digital mathematics libraries. Its items are heuristically matched to zbMATH identifiers and may contain data conversion errors. It attempts to reflect the references listed in the original paper as accurately as possible without claiming the completeness or perfect precision of the matching.