Scalable high-resolution forecasting of sparse spatiotemporal events with kernel methods: a winning solution to the NIJ “Real-time crime forecasting challenge”.

*(English)*Zbl 1435.62431Summary: We propose a generic spatiotemporal event forecasting method which we developed for the National Institute of Justice’s (NIJ) Real-Time Crime Forecasting Challenge (National Institute of Justice (2017)). Our method is a spatiotemporal forecasting model combining scalable randomized Reproducing Kernel Hilbert Space (RKHS) methods for approximating Gaussian processes with autoregressive smoothing kernels in a regularized supervised learning framework. While the smoothing kernels capture the two main approaches in current use in the field of crime forecasting, kernel density estimation (KDE) and self-exciting point process (SEPP) models, the RKHS component of the model can be understood as an approximation to the popular log-Gaussian Cox Process model. For inference, we discretize the spatiotemporal point pattern and learn a log-intensity function using the Poisson likelihood and highly efficient gradient-based optimization methods. Model hyperparameters including quality of RKHS approximation, spatial and temporal kernel lengthscales, number of autoregressive lags and bandwidths for smoothing kernels as well as cell shape, size and rotation, were learned using cross validation. Resulting predictions significantly exceeded baseline KDE estimates and SEPP models for sparse events.

##### MSC:

62P25 | Applications of statistics to social sciences |

62M10 | Time series, auto-correlation, regression, etc. in statistics (GARCH) |

62M20 | Inference from stochastic processes and prediction |

62H11 | Directional data; spatial statistics |

##### Keywords:

spatial statistics; time series; supervised learning; spatiotemporal forecasting; Cox process; RKHS
PDF
BibTeX
XML
Cite

\textit{S. Flaxman} et al., Ann. Appl. Stat. 13, No. 4, 2564--2585 (2019; Zbl 1435.62431)

**OpenURL**

##### References:

[1] | Adams, R. P., Murray, I. and MacKay, D. J. (2009). Tractable nonparametric Bayesian inference in Poisson processes with Gaussian process intensities. In Proceedings of the 26th Annual International Conference on Machine Learning 9-16. ACM, New York. |

[2] | Adepeju, M., Rosser, G. and Cheng, T. (2016). Novel evaluation metrics for sparse spatio-temporal point process hotspot predictions—A crime case study. Int. J. Geogr. Inf. Sci. 30 2133-2154. |

[3] | Berk, R., Heidari, H., Jabbari, S., Kearns, M. and Roth, A. (2018). Fairness in criminal justice risk assessments: The state of the art. Sociol. Methods Res. 0049124118782533. |

[4] | Bhatt, S., Cameron, E., Flaxman, S. R., Weiss, D. J., Smith, D. L. and Gething, P. W. (2017). Improved prediction accuracy for disease risk mapping using Gaussian process stacked generalization. J. R. Soc. Interface 14 20170520. |

[5] | Brix, A. and Diggle, P. J. (2001). Spatiotemporal prediction for log-Gaussian Cox processes. J. R. Stat. Soc. Ser. B. Stat. Methodol. 63 823-841. · Zbl 0996.62076 |

[6] | Caplan, J. M., Kennedy, L. W. and Miller, J. (2011). Risk terrain modeling: Brokering criminological theory and GIS methods for crime forecasting. Justice Q. 28 360-381. |

[7] | Chainey, S. P. (2013). Examining the influence of cell size and bandwidth size on kernel density estimation crime hotspot maps for predicting spatial patterns of crime. Bull. Geogr. Soc. Liege 60 7-19. |

[8] | Chainey, S. and Ratcliffe, J. (2005). GIS and Crime Mapping. Wiley, New York. |

[9] | Chainey, S., Tompson, L. and Uhlig, S. (2008). The utility of hotspot mapping for predicting spatial patterns of crime. Secur. J. 21 4-28. |

[10] | Cohen, J., Gorr, W. L. and Olligschlaeger, A. M. (2007). Leading indicators and spatial interactions: A crime-forecasting model for proactive police deployment. Geogr. Anal. 39 105-127. |

[11] | Corbett-Davies, S., Pierson, E., Feller, A., Goel, S. and Huq, A. (2017). Algorithmic decision making and the cost of fairness. In Proceedings of the 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining 797-806. ACM, New York. |

[12] | Cunningham, J. P., Shenoy, K. V. and Sahani, M. (2008). Fast Gaussian process methods for point process intensity estimation. In Proceedings of the 25th International Conference on Machine Learning 192-199. ACM, New York. |

[13] | Diggle, P. J., Moraga, P., Rowlingson, B. and Taylor, B. M. (2013). Spatial and spatio-temporal log-Gaussian Cox processes: Extending the geostatistical paradigm. Statist. Sci. 28 542-563. · Zbl 1331.86027 |

[14] | Dressel, J. and Farid, H. (2018). The accuracy, fairness, and limits of predicting recidivism. Sci. Adv. 4 eaao5580. |

[15] | Flaxman, S. R. (2014). A general approach to prediction and forecasting crime rates with Gaussian processes. Technical report, Heinz College of Information Systems and Public Policy, Carnegie Mellon Univ., Pittsburgh, PA. |

[16] | Flaxman, S., Wilson, A., Neill, D., Nickisch, H. and Smola, A. (2015). Fast Kronecker inference in Gaussian processes with non-Gaussian likelihoods. In International Conference on Machine Learning 607-616. |

[17] | Flaxman, S., Chirico, M., Pereira, P. and Loeffler, C. (2019a). Supplement to “Scalable high-resolution forecasting of sparse spatiotemporal events with kernel methods: A winning solution to the NIJ ‘Real-Time Crime Forecasting Challenge’.” DOI:10.1214/19-AOAS1284SUPPA. |

[18] | Flaxman, S., Chirico, M., Pereira, P. and Loeffler, C. (2019b). Source code for “Scalable high-resolution forecasting of sparse spatiotemporal events with kernel methods: A winning solution to the NIJ ‘Real-Time Crime Forecasting Challenge’.” DOI:10.1214/19-AOAS1284SUPPB. |

[19] | Friedman, J., Hastie, T. and Tibshirani, R. (2010). Regularization paths for generalized linear models via coordinate descent. J. Stat. Softw. 33 1-22. |

[20] | Gerber, M. S. (2014). Predicting crime using Twitter and kernel density estimation. Decis. Support Syst. 61 115-125. |

[21] | Gorr, W. L. (2009). Forecast accuracy measures for exception reporting using receiver operating characteristic curves. Int. J. Forecast. 25 48-61. |

[22] | Gorr, W. L. and Lee, Y. (2015). Early warning system for temporary crime hot spots. J. Quant. Criminol. 31 25-47. |

[23] | Gorr, W., Olligschlaeger, A. and Thompson, Y. (2003). Short-term forecasting of crime. Int. J. Forecast. 19 579-594. |

[24] | Groff, E. and Taniguchi, T. (2019). Using citizen notification to interrupt near-repeat residential burglary patterns: The micro-level near-repeat experiment. J. Exp. Criminol. 15 115-149. |

[25] | Guttorp, P. and Gneiting, T. (2005). On the Whittle-Matérn correlation family. National Research Center for Statistics and the Environment-Technical Report Series, Seattle, WA. · Zbl 1436.62013 |

[26] | Heaton, M. J., Datta, A., Finley, A. O., Furrer, R., Guinness, J., Guhaniyogi, R., Gerber, F., Gramacy, R. B., Hammerling, D. et al. (2019). A case study competition among methods for analyzing large spatial data. J. Agric. Biol. Environ. Stat. 24 398-425. · Zbl 1426.62345 |

[27] | Hennig, P., Osborne, M. A. and Girolami, M. (2015). Probabilistic numerics and uncertainty in computations. Proc. R. Soc. Lond. Ser. A Math. Phys. Eng. Sci. 471 20150142, 17. · Zbl 1372.65010 |

[28] | Hunt, J. M. (2016). Do crime hot spots move? Exploring the effects of the modifiable areal unit problem and modifiable temporal unit problem on crime hot spot stability. Ph.D. thesis, American Univ., Washington, DC. |

[29] | Johnson, S. D., Bowers, K. J., Birks, D. J. and Pease, K. (2009). Predictive mapping of crime by ProMap: Accuracy, units of analysis, and the environmental backcloth. In Putting Crime in Its Place (D. Weisburd, W. Bernasco and G. Bruinsma, eds.) 165-192. Springer, Dordrecht. |

[30] | Kang, H.-W. and Kang, H.-B. (2017). Prediction of crime occurrence from multi-modal data using deep learning. PLoS ONE 12 e0176244. |

[31] | Levine, N. (2004). CrimeStat: A spatial statistics program for the analysis of crime incident locations, version 3.0. Technical report, Ned Levine and Associates/National Institute of Justice, Washington, DC. |

[32] | Liu, H. and Brown, D. E. (2003). Criminal incident prediction using a point-pattern-based density model. Int. J. Forecast. 19 603-622. |

[33] | Lloyd, C., Gunter, T., Osborne, M. and Roberts, S. (2015). Variational inference for Gaussian process modulated Poisson processes. In International Conference on Machine Learning 1814-1822. |

[34] | Loeffler, C. and Flaxman, S. (2018). Is gun violence contagious? A spatiotemporal test. J. Quant. Criminol. 34 999-1017. |

[35] | Lum, K. and Isaac, W. (2016). To predict and serve? Significance 13 14-19. |

[36] | Makridakis, S., Spiliotis, E. and Assimakopoulos, V. (2018). Statistical and machine learning forecasting methods: Concerns and ways forward. PLoS ONE 13 e0194889. |

[37] | May, A., Bagheri Garakani, A., Lu, Z. et al. (2019). Kernel approximation methods for speech recognition. J. Mach. Learn. Res. 20 Paper No. 59, 36. · Zbl 07064039 |

[38] | Milton, P., Coupland, H. Giorgi, E. and Bhatt, S. (2019). Spatial analysis made easy with linear regression and kernels. Epidemics. DOI:10.1016/j.epidem.2019.100362. |

[39] | Mitchell, S., Potash, E., Barocas, S., D’Amour, A. and Lum, K. (2018). Prediction-based decisions and fairness: A catalogue of choices, assumptions, and definitions. Available at arXiv:1811.07867. |

[40] | Mohler, G. (2013). Modeling and estimation of multi-source clustering in crime and security data. Ann. Appl. Stat. 7 1525-1539. · Zbl 1454.62504 |

[41] | Mohler, G. (2014). Marked point process hotspot maps for homicide and gun crime prediction in Chicago. Int. J. Forecast. 30 491-497. |

[42] | Mohler, G. and Porter, M. D. (2018). Rotational grid, PAI-maximizing crime forecasts. Stat. Anal. Data Min. 11 227-236. |

[43] | Mohler, G. O., Short, M. B., Brantingham, P. J., Schoenberg, F. P. and Tita, G. E. (2011). Self-exciting point process modeling of crime. J. Amer. Statist. Assoc. 106 100-108. · Zbl 1396.62224 |

[44] | Mohler, G. O., Short, M. B., Malinowski, S., Johnson, M., Tita, G. E., Bertozzi, A. L. and Brantingham, P. J. (2015). Randomized controlled field trials of predictive policing. J. Amer. Statist. Assoc. 110 1399-1411. |

[45] | Møller, J. and Rasmussen, J. G. (2005). Perfect simulation of Hawkes processes. Adv. in Appl. Probab. 37 629-646. · Zbl 1074.60057 |

[46] | Møller, J., Syversveen, A. R. and Waagepetersen, R. P. (1998). Log Gaussian Cox processes. Scand. J. Stat. 25 451-482. · Zbl 0931.60038 |

[47] | O’Hagan, A. (1992). Some Bayesian numerical analysis. In Bayesian Statistics, 4 (Peñíscola, 1991) 345-363. Oxford Univ. Press, New York. |

[48] | National Institute of Justice (2017). Real-time crime forecasting challenge. Available at http://www.nij.gov/funding/Pages/fy16-crime-forecasting-challenge.aspx. |

[49] | Ogata, Y. (1988). Statistical models for earthquake occurrences and residual analysis for point processes. J. Amer. Statist. Assoc. 83 9-27. |

[50] | Pease, K. et al. (1998). Repeat Victimisation: Taking Stock 90. Home Office Police Research Group, London. |

[51] | Perry, W. L., McInnis, B., Price, C. C., Smith, S. C. and Hollywood, J. S. (2013). Predictive policing: The role of crime forecasting in law enforcement operations. Technical report, RAND Corporation, Santa Monica, CA. |

[52] | Porter, M. D. and Reich, B. J. (2012). Evaluating temporally weighted kernel density methods for predicting the next event location in a series. Ann. GIS 18 225-240. |

[53] | Rahimi, A. and Recht, B. (2007). Random features for large-scale kernel machines. In Advances in Neural Information Processing Systems 1177-1184. |

[54] | Rasmussen, C. E. and Williams, C. K. I. (2006). Gaussian Processes for Machine Learning. Adaptive Computation and Machine Learning. MIT Press, Cambridge, MA. · Zbl 1177.68165 |

[55] | Rodrigues, A. and Diggle, P. J. (2012). Bayesian estimation and prediction for inhomogeneous spatiotemporal log-Gaussian Cox processes using low-rank models, with application to criminal surveillance. J. Amer. Statist. Assoc. 107 93-101. · Zbl 1261.62086 |

[56] | Rosser, G. and Cheng, T. (2019). Improving the robustness and accuracy of crime prediction with the self-exciting point process through isotropic triggering. Appl. Spatial Anal. Policy 12 5-25. |

[57] | Rosser, G., Davies, T., Bowers, K. J., Johnson, S. D. and Cheng, T. (2016). Predictive crime mapping: Arbitrary grids or street networks? J. Quant. Criminol. 33 569-594. |

[58] | Rudin, C. and Ustun, B. (2018). Optimized scoring systems: Toward trust in machine learning for healthcare and criminal justice. Interfaces 48 449-466. |

[59] | Schölkopf, B. and Smola, A. J. (2002). Learning with Kernels: Support Vector Machines, Regularization, Optimization and Beyond. MIT Press, Cambridge, MA. |

[60] | Schutt, H. G. (1922). Advanced police methods in Berkeley. Natl. Munic. Rev. 11 80-85. |

[61] | Shirota, S. and Gelfand, A. E. (2017). Space and circular time log Gaussian Cox processes with application to crime event data. Ann. Appl. Stat. 11 481-503. · Zbl 1416.62689 |

[62] | Snoek, J., Larochelle, H. and Adams, R. P. (2012). Practical Bayesian optimization of machine learning algorithms. In Advances in Neural Information Processing Systems 2951-2959. |

[63] | Sriperumbudur, B. K., Fukumizu, K. and Lanckriet, G. R. G. (2011). Universality, characteristic kernels and RKHS embedding of measures. J. Mach. Learn. Res. 12 2389-2410. · Zbl 1280.68198 |

[64] | Sun, Y., Li, B. and Genton, M. G. (2012). Geostatistics for large datasets. In Advances and Challenges in Space-Time Modelling of Natural Events 55-77. Springer, New York. |

[65] | Taddy, M. A. (2010). Autoregressive mixture models for dynamic spatial Poisson processes: Application to tracking intensity of violent crime. J. Amer. Statist. Assoc. 105 1403-1417. · Zbl 1388.62379 |

[66] | Teh, Y. W. and Rao, V. (2011). Gaussian process modulated renewal processes. In Advances in Neural Information Processing Systems 2474-2482. |

[67] | Wang, X., Gerber, M. S. and Brown, D. E. (2012). Automatic crime prediction using events extracted from Twitter posts. In Social Computing Behavioral—Cultural Modeling and Prediction 231-238. Springer, Berlin. |

[68] | Weinberger, K., Dasgupta, A., Langford, J., Smola, A. and Attenberg, J. (2009). Feature hashing for large scale multitask learning. In Proceedings of the 26th Annual International Conference on Machine Learning. ICML ’09 1113-1120. ACM, New York. |

[69] | Zou, H. and Hastie, T. (2005). Regularization and variable selection via the elastic net. J. R. Stat. Soc. Ser. B. Stat. Methodol. 67 301-320. · Zbl 1069.62054 |

This reference list is based on information provided by the publisher or from digital mathematics libraries. Its items are heuristically matched to zbMATH identifiers and may contain data conversion errors. It attempts to reflect the references listed in the original paper as accurately as possible without claiming the completeness or perfect precision of the matching.