Machine learning feature selection methods for landslide susceptibility mapping. (English) Zbl 1322.86014

Summary: This paper explores the use of adaptive support vector machines, random forests and AdaBoost for landslide susceptibility mapping in three separate regions of Canton Vaud, Switzerland, based on a set of geological, hydrological and morphological features. The feature selection properties of the three algorithms are studied to analyze the relevance of features in controlling the spatial distribution of landslides. Eliminating irrelevant features gives simpler, lower-dimensional models while keeping classification performance high. An object-based sampling procedure is considered to reduce the spatial autocorrelation of the data and to estimate generalization skill more reliably when the model is applied to predict the occurrence of new, unknown landslides. The accuracy of the models, the relevance of features and the quality of the landslide susceptibility maps were found to be high in the regions characterized by shallow landslides and low in those with deep-seated landslides. Despite providing similar skill, random forests and AdaBoost were found to be more efficient at feature selection than adaptive support vector machines. The results of this study reveal the strengths of the classification algorithms, but also highlight: (1) the need to rely on more than one method for identifying relevant variables; (2) the weakness of the adaptive scaling algorithm when used with landslide data; and (3) the lack of additional features that characterize the spatial distribution of deep-seated landslides.
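
The object-based sampling idea mentioned in the summary can be illustrated with a minimal sketch. The summary does not give the paper's actual procedure, so everything below is an assumption made for illustration: the function name `object_based_split`, the `test_fraction` and `seed` parameters, and the toy object labels are all hypothetical. The sketch only shows the general principle that every sample (for example, a pixel) from the same mapped landslide object is assigned to the same side of a train/test split, so that spatially autocorrelated neighbours cannot appear in both sets.

```python
import random
from collections import defaultdict


def object_based_split(object_ids, test_fraction=0.3, seed=0):
    """Split sample indices so each object falls entirely in train or test.

    object_ids[i] is the (hypothetical) landslide-object label of sample i.
    Returns two sorted lists of sample indices: (train, test).
    """
    # Collect the sample indices that belong to each object.
    by_object = defaultdict(list)
    for idx, obj in enumerate(object_ids):
        by_object[obj].append(idx)

    # Shuffle the objects, not the individual samples.
    objects = sorted(by_object)
    random.Random(seed).shuffle(objects)

    # Hold out whole objects; always keep at least one object for testing.
    n_test = max(1, round(len(objects) * test_fraction))
    test_idx = [i for obj in objects[:n_test] for i in by_object[obj]]
    train_idx = [i for obj in objects[n_test:] for i in by_object[obj]]
    return sorted(train_idx), sorted(test_idx)


# Toy data (assumed): 8 pixels belonging to 4 mapped landslide objects.
pixel_object_ids = ["A", "A", "A", "B", "B", "C", "D", "D"]
train_idx, test_idx = object_based_split(pixel_object_ids, test_fraction=0.25, seed=42)
train_objects = {pixel_object_ids[i] for i in train_idx}
test_objects = {pixel_object_ids[i] for i in test_idx}
```

Because whole objects are held out, the test score reflects performance on landslides the model has not seen at all, which is closer to the prediction task described in the summary than a random pixel-wise split would be.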

MSC:

86A99 Geophysics
68T05 Learning and adaptive systems in artificial intelligence
86A60 Geological problems
62P12 Applications of statistics to environmental and related topics
