Sparse projection oblique randomer forests.

*(English)*Zbl 07255135Summary: Decision forests, including Random Forests and Gradient Boosting Trees, have recently demonstrated state-of-the-art performance in a variety of machine learning settings. Decision forests are typically ensembles of axis-aligned decision trees; that is, trees that split only along feature dimensions. In contrast, many recent extensions to decision forests are based on axis-oblique splits. Unfortunately, these extensions forfeit one or more of the favorable properties of decision forests based on axis-aligned splits, such as robustness to many noise dimensions, interpretability, or computational efficiency. We introduce yet another decision forest, called “Sparse Projection Oblique Randomer Forests” (SPORF). SPORF trees recursively split along very sparse random projections. Our method significantly improves accuracy over existing state-of-the-art algorithms on a standard benchmark suite for classification with \(>100\) problems of varying dimension, sample size, and number of classes. To illustrate how SPORF addresses the limitations of both axis-aligned and existing oblique decision forest methods, we conduct extensive simulated experiments. SPORF typically yields improved performance over existing decision forest methods, while mitigating computational efficiency and scalability and maintaining interpretability. Very sparse random projections can be incorporated into gradient boosted trees to obtain potentially similar gains.

##### MSC:

68T05 | Learning and adaptive systems in artificial intelligence |

##### Keywords:

ensemble learning; random forests; decision trees; random projections; classification; regression; feature extraction; sparse learning
PDF
BibTeX
XML
Cite

\textit{T. M. Tomita} et al., J. Mach. Learn. Res. 21, Paper No. 104, 39 p. (2020; Zbl 07255135)

Full Text:
Link

##### References:

[1] | Dimitris Achlioptas. Database-friendly random projections: Johnson-lindenstrauss with binary coins.Journal of Computer and System Sciences, 66(4):671-687, 2003. |

[2] | G´erard Biau, Luc Devroye, and G´abor Lugosi. Consistency of random forests and other averaging classifiers.The Journal of Machine Learning Research, 9:2015-2033, 2008. |

[3] | Ella Bingham and Heikki Mannila. Random projection in dimensionality reduction: applications to image and text data. InInternational Conference on Knowledge Discovery and Data Mining, pages 245-250, 2001. |

[4] | Rico Blaser and Piotr Fryzlewicz. Random rotation ensembles.Journal of Machine Learning Research, 17(4):1-26, 2016. |

[5] | Leo Breiman. Arcing classifiers.The Annals of Statistics, 26(3):801-849, 1998. |

[6] | Leo Breiman. Random forests.Machine Learning, 4(1):5-32, October 2001. |

[7] | Leo Breiman and Adele Cutler.Random forests, 2002. URLhttps://www.stat.berkeley. edu/ breiman/RandomForests/cc_home.htm. |

[8] | James Browne, Tyler Tomita, and Joshua Vogelstein.rerf: Randomer forest, 2018. URL https://cran.r-project.org/web/packages/rerf/. |

[9] | James Browne, Disa Mhembere, Tyler Tomita, Joshua T. Vogelstein, and Randal Burns. Forest packing: fast parallel, decision forests. InSIAM International Conference on Data Mining, pages 46-54. Society for Industrial and Applied Mathematics, 2019. |

[10] | Rich Caruana and Alexandru Niculescu-Mizil. An empirical comparison of supervised learning algorithms. InInternational Conference on Machine Learning, pages 161-168, 2006. |

[11] | Rich Caruana, Nikos Karampatziakis, and Ainur Yessenalina. An empirical evaluation of supervised learning in high dimensions. InInternational Conference on Machine learning, pages 96-103, 2008. |

[12] | Tianqi Chen.xgboost: Extreme gradient boosting, 2018. URLhttps://cran.r-project. org/web/packages/xgboost/. |

[13] | Tianqi Chen and Carlos Guestrin. Xgboost: A scalable tree boosting system. InInternational Conference on Knowledge Discovery and Data Mining, pages 785-794, 2016. |

[14] | Sanjoy Dasgupta and Yoav Freund. Random projection trees and low dimensional manifolds. InACM Symposium on Theory of Computing, pages 537-546, New York, NY, USA, 2008. ACM. |

[15] | Sanjoy Dasgupta and Yoav Freund. Random projection trees for vector quantization.IEEE Transactions on Information Theory, 55(7), 2009. |

[16] | Sanjoy Dasgupta and Kaushik Sinha. Randomized partition trees for exact nearest neighbor search.Conference on Learning Theory, 2013. |

[17] | Luc Devroye, L´aszl´o Gy¨orfi, and G´abor Lugosi.A Probabilistic Theory of Pattern Recognition. Springer Science & Business Media, 1996. |

[18] | Dirk Eddelbuettel.Rcpp: seamless R and C++ integration, 2018. URLhttps://cran. r-project.org/web/packages/Rcpp/index.html. |

[19] | Stefan Falkner, Aaron Klein, and Frank Hutter. Bohb: Robust and efficient hyperparameter optimization at scale. InInternational Conference on Machine Learning, 2018. |

[20] | Xiaoli Z. Fern and Carla E. Brodley. Random projection for high dimensional data clustering: A cluster ensemble approach. InInternational Conference on Machine Learning, pages 186-193, 2003. |

[21] | Manuel Fern´andez-Delgado, Eva Cernadas, Sen´en Barro, and Dinani Amorim. Do we need hundreds of classifiers to solve real world classification problems?Journal of Machine Learning Research, 15(1):3133-3181, 2014. |

[22] | Dmitriy Fradkin and David Madigan. Experiments with random projections for machine learning. InInternational Conference on Knowledge Discovery and Data Mining, pages 517-522, 2003. |

[23] | Jerome H. Friedman. Greedy function approximation: a gradient boosting machine.The Annals of Statistics, pages 1189-1232, 2001. |

[24] | David Heath, Simon Kasif, and Steven Salzberg. Induction of oblique decision trees. In International Joint Conferences on Artificial Intelligence, pages 1002-1007, 1993. |

[25] | Chinmay Hegde, Michael Wakin, and Richard Baraniuk. Random projections for manifold learning. InAdvances in Neural Information Processing Systems, pages 641-648, 2008. |

[26] | Gareth M. James. Variance and bias for general loss functions.Machine Learning, 51(2): 115-135, 2003. |

[27] | Guolin Ke, Qi Meng, Thomas Finley, Taifeng Wang, Wei Chen, Weidong Ma, Qiwei Ye, and Tie-Yan Liu. Lightgbm: A highly efficient gradient boosting decision tree. In I. Guyon, U. V. Luxburg, S. Bengio, H. Wallach, R. Fergus, S. Vishwanathan, and R. Garnett, editors,Advances in Neural Information Processing Systems, pages 3146-3154. Curran Associates, Inc., 2017. |

[28] | Yann Lecun, Corinna Cortes, and Christopher J. C. Burges.The MNIST database of handwritten digits. URLhttp://yann.lecun.com/exdb/mnist/. |

[29] | Donghoon Lee, Ming-Hsuan Yang, and Songhwai Oh. Fast and accurate head pose estimation via random projection forests. InIEEE International Conference on Computer Vision, pages 1958-1966, 2015. |

[30] | Ping Li, Trevor J. Hastie, and Kenneth W. Church. Very sparse random projections. In International Conference on Knowledge Discovery and Data Mining, pages 287-296, 2006. |

[31] | Gilles Louppe.Understanding Random Forests: From Theory to Practice. PhD thesis, University of Li‘ege, 2014. |

[32] | Bjoern H. Menze, B. Michael Kelm, Daniel N. Splitthoff, Ullrich Koethe, and Fred A. Hamprecht. On oblique random forests. In Dimitrios Gunopulos, Thomas Hofmann, Donato Malerba, and Michalis Vazirgiannis, editors,Machine Learning and Knowledge Discovery in Databases, volume 6912 ofLecture Notes in Computer Science, pages 453- 469. Springer, Berlin, Heidelberg, 2011. |

[33] | Alessandro Motta, Meike Schurr, Benedikt Staffler, and Moritz Helmstaedter. Big data in nanoscale connectomics, and the greed for training labels.Current Opinion in Neurobiology, 55:180-187, 2019. |

[34] | Philipp Probst, Anne-Laure Boulesteix, and Bernd Bischl. Tunability: Importance of hyperparameters of machine learning algorithms.Journal of Machine Learning Research, 20(53):1-32, 2019. |

[35] | Tom Rainforth and Frank Wood.Canonical correlation forests.arXiv preprint arXiv:1507.05444, 2015. |

[36] | Juan J. Rodriguez, Ludmila I. Kuncheva, and Carlos J. Alonso. Rotation forest: A new classifier ensemble method.IEEE Transactions on Pattern Analysis and Machine Intelligence, 28(10):1619-1630, 2006. |

[37] | Erwan Scornet, G´erard Biau, and Jean-Philippe Vert. Consistency of random forests.The Annals of Statistics, 43(4):1716-1741, 2015. |

[38] | Tyler M. Tomita, Mauro Maggioni, and Joshua T. Vogelstein. Roflmao: Robust oblique forests with linear matrix operations. InSIAM International Conference on Data Mining, 2017. |

[39] | Gerard V. Trunk. A problem of dimensionality: A simple example.IEEE Transactions on Pattern Analysis and Machine Intelligence, 1(3):306-307, 1979. |

[40] | Roman Vershynin.High Dimensional Probability: An Introduction with Applications in Data Science. 2019. URLhttps://www.math.uci.edu/ rvershyn/papers/HDP-book/ HDP-book.pdf. |

[41] | Marvin N. Wright.ranger: A fast implementation of random forests, 2018. URLhttps: //cran.r-project.org/web/packages/ranger/. |

This reference list is based on information provided by the publisher or from digital mathematics libraries. Its items are heuristically matched to zbMATH identifiers and may contain data conversion errors. It attempts to reflect the references listed in the original paper as accurately as possible without claiming the completeness or perfect precision of the matching.