Poisson point process models solve the “pseudo-absence problem” for presence-only data in ecology. (English) Zbl 1202.62171

Ann. Appl. Stat. 4, No. 3, 1383-1402 (2010); correction ibid. 4, No. 4, 2203-2204 (2010).
Summary: Presence-only data, point locations where a species has been recorded as being present, are often used in modeling the distribution of a species as a function of a set of explanatory variables, whether to map species occurrence, to understand its association with the environment, or to predict its response to environmental change. Currently, ecologists most commonly analyze presence-only data by adding randomly chosen “pseudo-absences” to the data so that they can be analyzed using logistic regression, an approach that has weaknesses in model specification, in interpretation, and in implementation. To address these issues, we propose Poisson point process modeling of the intensity of presences. We also derive a link between the proposed approach and logistic regression: specifically, we show that as the number of pseudo-absences increases (in a regular or uniform random arrangement), logistic regression slope parameters and their standard errors converge to those of the corresponding Poisson point process model. We discuss the practical implications of these results. In particular, point process modeling offers a framework for choosing the number and location of pseudo-absences, both of which are currently chosen by ad hoc and sometimes ineffective methods in ecology, a point which we illustrate by example.
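
The convergence stated in the summary can be illustrated numerically. The following is a minimal sketch, not the authors' implementation: it assumes a one-dimensional study region [0, 1], a single covariate equal to the location s, and a hypothetical true intensity exp(b0 + b1*s). The function names (design, midpoints, fit_ppm, fit_logistic), the midpoint quadrature rule, and the plain Newton iterations are all choices made for this illustration. It fits the Poisson point process model by maximizing a quadrature approximation to the log-likelihood, then fits logistic regressions with an increasing number of regularly spaced pseudo-absences and compares the slope estimates.

```python
import numpy as np

rng = np.random.default_rng(0)
b0_true, b1_true = np.log(100.0), 2.0  # hypothetical true intensity exp(b0 + b1*s)

# Simulate an inhomogeneous Poisson process on [0, 1] by thinning a
# homogeneous process whose rate is the maximum of the intensity.
lam_max = np.exp(b0_true + b1_true)
cand = rng.uniform(0.0, 1.0, rng.poisson(lam_max))
keep = rng.uniform(size=cand.size) < np.exp(b0_true + b1_true * cand) / lam_max
pres = cand[keep]  # presence locations


def design(s):
    """Design matrix with an intercept and the covariate x(s) = s."""
    return np.column_stack([np.ones_like(s), s])


def midpoints(m):
    """m regularly spaced points (cell midpoints) on [0, 1]."""
    return (np.arange(m) + 0.5) / m


def fit_ppm(pres, n_quad=20000, iters=30):
    """Poisson process MLE; the integral of the intensity is replaced
    by a midpoint-rule sum over n_quad quadrature points."""
    Zp, Zq = design(pres), design(midpoints(n_quad))
    w = 1.0 / n_quad  # quadrature weight of each cell
    beta = np.array([np.log(pres.size), 0.0])
    for _ in range(iters):  # Newton-Raphson on the log-likelihood
        mu = w * np.exp(Zq @ beta)
        grad = Zp.sum(axis=0) - Zq.T @ mu
        hess = (Zq * mu[:, None]).T @ Zq
        beta = beta + np.linalg.solve(hess, grad)
    return beta


def fit_logistic(pres, m, iters=30):
    """Logistic regression of presences (y=1) against m regularly
    spaced pseudo-absences (y=0), fitted by Newton-Raphson."""
    X = design(np.concatenate([pres, midpoints(m)]))
    y = np.concatenate([np.ones(pres.size), np.zeros(m)])
    beta = np.array([np.log(pres.size / m), 0.0])
    for _ in range(iters):
        p = 1.0 / (1.0 + np.exp(-(X @ beta)))
        grad = X.T @ (y - p)
        hess = (X * (p * (1.0 - p))[:, None]).T @ X
        beta = beta + np.linalg.solve(hess, grad)
    return beta


ppm_slope = fit_ppm(pres)[1]
slopes = {m: fit_logistic(pres, m)[1] for m in (100, 1000, 10000, 100000)}
for m, slope in slopes.items():
    print(f"m={m:>6}: logistic slope={slope:.4f}, "
          f"difference from PPM slope={slope - ppm_slope:+.4f}")
```

Only the slopes are compared here, since the summary's convergence claim concerns slope parameters and their standard errors. In this sketch the gap between the logistic and point process slopes should shrink as the number of pseudo-absences grows, though the exact values depend on the simulated sample.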


62P12 Applications of statistics to environmental and related topics
60G55 Point processes (e.g., Poisson, Cox, Hawkes processes)
62J12 Generalized linear models (logistic models)
65C60 Computational problems in statistics (MSC2010)


Software: VEGAS; R; spatstat

