
Small area estimation of the homeless in Los Angeles: an application of cost-sensitive stochastic gradient boosting. (English) Zbl 1202.62178

Summary: In many metropolitan areas efforts are made to count the homeless to ensure proper provision of social services. Some areas are very large, which makes spatial sampling a viable alternative to an enumeration of the entire terrain. Counts are observed in sampled regions but must be imputed in unvisited areas. In the imputation process, the costs of underestimation and overestimation may differ. For example, if precise estimation in areas with large homeless counts is critical, then underestimation should be penalized more than overestimation in the loss function.
We analyze data from the 2004–2005 Los Angeles County homeless study using an augmentation of \(L_{1}\) stochastic gradient boosting that can weight overestimates and underestimates asymmetrically. We discuss our choice to utilize stochastic gradient boosting over other function estimation procedures. In-sample fitted and out-of-sample imputed values, as well as relationships between the response and predictors, are analyzed for various cost functions. Practical usage and policy implications of these results are discussed briefly.
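The summary does not state the loss function itself. As an illustration only, the sketch below assumes one common form of asymmetric \(L_{1}\) loss, the quantile ("check") loss, in which an underestimate costs `alpha` per unit of error and an overestimate costs `1 - alpha`. The function names, the cost parameter `alpha`, and the toy counts are hypothetical and are not taken from the paper. The sketch also shows the negative gradient of this loss, which is the quantity a gradient boosting step would fit at each iteration.

```python
# Illustrative sketch, not the authors' implementation.
# Assumption: the asymmetric L1 loss is the quantile ("check") loss.

def asymmetric_l1_loss(y, f, alpha):
    """Mean cost-weighted absolute error.

    Underestimates (y > f) cost alpha per unit of error;
    overestimates (y < f) cost 1 - alpha per unit of error.
    """
    total = 0.0
    for yi, fi in zip(y, f):
        r = yi - fi
        total += alpha * r if r > 0 else (alpha - 1.0) * r
    return total / len(y)


def negative_gradient(y, f, alpha):
    """Pseudo-residuals of the loss with respect to the fitted values f.

    Returns alpha where the fit is too low and alpha - 1 otherwise.
    """
    return [alpha if yi > fi else alpha - 1.0 for yi, fi in zip(y, f)]


def best_constant(y, alpha):
    """Observed value that minimizes the loss when used as a constant fit."""
    candidates = sorted(set(y))
    return min(candidates, key=lambda c: asymmetric_l1_loss(y, [c] * len(y), alpha))


# Hypothetical tract counts, skewed like many count data sets.
counts = [0, 0, 1, 2, 3, 5, 8, 40]

# Penalizing underestimates more heavily pulls the fit toward the large count;
# penalizing overestimates more heavily pulls it toward zero.
high_cost_under = best_constant(counts, alpha=0.9)
high_cost_over = best_constant(counts, alpha=0.1)
```

With `alpha = 0.5` the two costs are equal and the loss is proportional to ordinary absolute error, so the choice of `alpha` is what encodes the relative cost of the two kinds of error.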

MSC:

62P25 Applications of statistics to social sciences
65C60 Computational problems in statistics (MSC2010)
