Statistical theory powering data science.

*(English)* Zbl 1440.62399

Summary: Statisticians are finding their place in the emerging field of data science. However, many issues considered “new” in data science have long histories in statistics. Examples of using statistical thinking are illustrated, which range from exploratory data analysis to measuring uncertainty to accommodating nonrandom samples. These examples are then applied to service networks, baseball predictions and official statistics.

##### MSC:

62R07 | Statistical aspects of big data and data science |

62G07 | Density estimation |

62P20 | Applications of statistics to economics |

60K25 | Queueing theory (aspects of probability theory) |

##### Keywords:

service networks; queueing theory; empirical Bayes; nonparametric estimation; sports statistics; decennial census; house price index

##### References:

[1] | Adler, P. S., Mandelbaum, A., Nguyen, V. and Schwerer, E. (1995). From project to process management: An empirically-based framework for analyzing product development time. Manage. Sci. 41 458-484. · Zbl 0833.90079 |

[2] | Aldor-Noiman, S., Feigin, P. D. and Mandelbaum, A. (2009). Workload forecasting for a call center: Methodology and a case study. Ann. Appl. Stat. 3 1403-1447. · Zbl 1185.62204 |

[3] | Anderson, M. (2015). The American Census: A Social History, 2nd ed. Yale University Press, New Haven. |

[4] | Armony, M., Israelit, S., Mandelbaum, A., Marmor, Y. N., Tseytlin, Y. and Yom-Tov, G. B. (2015). On patient flow in hospitals: A data-based queueing-science perspective. Stoch. Syst. 5 146-194. · Zbl 1359.60116 |

[5] | Azriel, D., Feigin, P. and Mandelbaum, A. (2014). Erlang-S: A data-based model of servers in queueing networks. Manage. Sci. 65 4607-4635. |

[6] | Baccelli, F., Kauffmann, B. and Veitch, D. (2009). Inverse problems in queueing theory and Internet probing. Queueing Syst. 63 59-107. · Zbl 1209.90104 |

[7] | Bailey, M. J., Muth, R. F. and Nourse, H. O. (1963). A regression method for real estate price index construction. J. Amer. Statist. Assoc. 58 933-942. |

[8] | Bender-deMoll, S. and McFarland, D. A. (2006). The art and science of dynamic network visualization. J. Soc. Struct. 7 1-38. |

[9] | Berk, R., Brown, L. D., Buja, A., Zhang, K. and Zhao, L. (2013). Valid post-selection inference. Ann. Statist. 41 802-837. · Zbl 1267.62080 |

[10] | Borst, S., Mandelbaum, A. and Reiman, M. I. (2004). Dimensioning large call centers. Oper. Res. 52 17-34. · Zbl 1165.90388 |

[11] | Bramson, M. (1998). State space collapse with application to heavy traffic limits for multiclass queueing networks. Queueing Syst. 30 89-148. · Zbl 0911.90162 |

[12] | Brown, L. D. (1971). Admissible estimators, recurrent diffusions, and insoluble boundary value problems. Ann. Math. Stat. 42 855-903. · Zbl 0246.62016 |

[13] | Brown, L. D. (2008). In-season prediction of batting averages: A field test of empirical Bayes and Bayes methodologies. Ann. Appl. Stat. 2 113-152. · Zbl 1137.62419 |

[14] | Brown, L. D. (2015). Comments on “Methodological issues and challenges in the production of official statistics.” J. Surv. Statist. Methodol. 3 478-481. |

[15] | Brown, L. D. and Greenshtein, E. (2009). Nonparametric empirical Bayes and compound decision approaches to estimation of a high-dimensional vector of normal means. Ann. Statist. 37 1685-1704. · Zbl 1166.62005 |

[16] | Brown, L., Gans, N., Mandelbaum, A., Sakov, A., Shen, H., Zeltyn, S. and Zhao, L. (2005). Statistical analysis of a telephone call center: A queueing-science perspective. J. Amer. Statist. Assoc. 100 36-50. · Zbl 1117.62303 |

[17] | Cai, J. and Zhao, L. (2019). Nonparametric empirical Bayes method for sparse noisy signals. Preprint. |

[18] | Calhoun, C. (1996). OFHEO House Price Indices: HPI Technical Description. Available at https://www.fhfa.gov/PolicyProgramsResearch/Research/Pages/HPI-Technical-Description.aspx. |

[19] | Case, K. E. and Shiller, R. J. (1987). Prices of single-family homes since 1970: New indexes for four cities. N. Engl. Econ. Rev. Sept/Oct 45-56. |

[20] | Case, K. E. and Shiller, R. J. (1989). The efficiency of the market for single family homes. Am. Econ. Rev. 79 125-137. |

[21] | Chan, W. and L’Ecuyer, P. CCOptim: Call Center Optimization Java Library. Available at http://simul.iro.umontreal.ca/contactcenters/index.html. |

[22] | Chen, N., Lee, D. and Shen, H. (2018). Can Customer Arrival Rates Be Modelled by Sine Waves? Submitted. https://papers.ssrn.com/sol3/papers.cfm?abstract_id=3125120. |

[23] | Chen, H. and Yao, D. D. (2001). Fundamentals of Queueing Networks: Performance, Asymptotics, and Optimization, Stochastic Modelling and Applied Probability. Applications of Mathematics (New York) 46. Springer, New York. |

[24] | Chen, H., Harrison, J. M., Mandelbaum, A., Van Ackere, A. and Wein, L. (1988). Empirical evaluation of a queueing network model for semiconductor wafer fabrication. Oper. Res. 36 202-215. |

[25] | Citro, C. F. (2016). The US federal statistical system’s past, present, and future. Annu. Rev. Stat. Appl. 3 347-373. |

[26] | Cowling, A., Hall, P. and Phillips, M. J. (1996). Bootstrap confidence regions for the intensity of a Poisson point process. J. Amer. Statist. Assoc. 91 1516-1524. · Zbl 0882.62078 |

[27] | Dai, J. G. and He, S. (2010). Customer abandonment in many-server queues. Math. Oper. Res. 35 347-362. · Zbl 1222.60071 |

[28] | Dai, J. G., Yeh, D. H. and Zhou, C. (1997). The QNet method for re-entrant queueing networks with priority disciplines. Oper. Res. 45 610-623. · Zbl 0887.90073 |

[29] | Davenport, T. H. and Patil, D. J. (2012). Data Scientist: The Sexiest Job of the 21st Century. Harvard Business Review. |

[30] | Deo, S. and Lin, W. (2013). The impact of size and occupancy of hospital on the extent of ambulance diversion: Theory and evidence. Oper. Res. 61 544-562. · Zbl 1273.90050 |

[31] | Dicker, L. H. and Zhao, S. D. (2016). High-dimensional classification via nonparametric empirical Bayes and maximum likelihood inference. Biometrika 103 21-34. · Zbl 1452.62440 |

[32] | Dong, J., Yom-Tov, E. and Yom-Tov, G. B. (2018). The impact of delay announcements on hospital network coordination and waiting times. Manage. Sci. 65 1969-1994. |

[33] | Eberstadt, N., Nunn, R., Schanzenbach, D. W. and Strain, M. R. (2017). “In order that they might rest their arguments on facts”: The vital role of government-collected data. The Hamilton Project at Brookings and the American Enterprise Institute. |

[34] | Efron, B. and Morris, C. (1975). Data analysis using Stein’s estimator and its generalizations. J. Amer. Statist. Assoc. 70 311-319. · Zbl 0319.62018 |

[35] | Efron, B. and Morris, C. (1973). Stein’s estimation rule and its competitors—an empirical Bayes approach. J. Amer. Statist. Assoc. 68 117-130. · Zbl 0275.62005 |

[36] | Efron, B. and Morris, C. (1977). Stein’s paradox in statistics. Sci. Am. 236 119-127. |

[37] | Erlang, A. K. (1948). On the rational determination of the number of circuits. In The Life and Works of A. K. Erlang (E. Brockmeyer, H. L. Halstrom and A. Jensen, eds.) 216-221. The Copenhagen Telephone Company, Copenhagen. |

[38] | Federal Housing Finance Agency. House Price Index, Quarterly Purchase-Only Indexes (Estimated Using Sales Price Data), 100 Largest Metropolitan Statistical Areas (Seasonally Adjusted and Unadjusted). Available at https://www.fhfa.gov/DataTools/Downloads/pages/house-price-index.aspx; accessed 30 January 2019. |

[39] | Feldman, Z. and Mandelbaum, A. (2010). Using simulation-based stochastic approximation to optimize staffing of systems with skills-based-routing. In Proceedings—Winter Simulation Conference. 3307-3317. |

[40] | Feldman, Z., Mandelbaum, A., Massey, W. A. and Whitt, W. (2008). Staffing of time-varying queues to achieve time-stable performance. Manage. Sci. 54 324-338. · Zbl 1232.90275 |

[41] | Gans, N., Koole, G. and Mandelbaum, A. (2003). Telephone call centers: Tutorial, review, and research prospects. Manuf. Serv. Oper. Manag. 5 79-141. |

[42] | Gans, N., Liu, N., Mandelbaum, A., Shen, H. and Ye, H. (2010). Service times in call centers: Agent heterogeneity and learning with some operational consequences. In Borrowing Strength: Theory Powering Applications—a Festschrift for Lawrence D. Brown. Inst. Math. Stat. (IMS) Collect. 6 99-123. IMS, Beachwood, OH. |

[43] | Gans, N., Shen, H., Zhou, Y. P., Korolev, N., McCord, A. and Ristock, H. (2015). Parametric forecasting and stochastic programming models for call-center workforce scheduling. Manuf. Serv. Oper. Manag. 17 571-588. |

[44] | Garnett, O., Mandelbaum, A. and Reiman, M. (2002). Designing a call center with impatient customers. Manuf. Serv. Oper. Manag. 4 208-227. |

[45] | Gershwin, G. and Gershwin, I. (1937). Let’s Call the Whole Thing Off. Shall We Dance? |

[46] | Glasserman, P. (2004). Monte Carlo Methods in Financial Engineering: Stochastic Modelling and Applied Probability. Applications of Mathematics (New York) 53. Springer, New York. · Zbl 1038.91045 |

[47] | Glynn, P. W. and Iglehart, D. L. (1989). Importance sampling for stochastic simulations. Manage. Sci. 35 1367-1392. · Zbl 0691.65107 |

[48] | Groves, R. M. (2011). Three eras of survey research. Public Opin. Q. 75 861-871. |

[49] | Gu, J. and Koenker, R. (2017). Empirical Bayesball remixed: Empirical Bayes methods for longitudinal data. J. Appl. Econometrics 32 575-599. |

[50] | Gurvich, I. and Whitt, W. (2009). Queue-and-idleness-ratio controls in many-server service systems. Math. Oper. Res. 34 363-396. · Zbl 1213.60149 |

[51] | Ibrahim, R. (2018). Sharing delay information in service systems: A literature survey. Queueing Syst. 89 49-79. · Zbl 1405.90005 |

[52] | Ibrahim, R. and L’Ecuyer, P. (2013). Forecasting call center arrivals: Fixed-effects, mixed-effects, and bivariate models. Manuf. Serv. Oper. Manag. 15 72-85. |

[53] | Ibrahim, R. and Whitt, W. (2011). Wait-time predictors for customer service systems with time-varying demand and capacity. Oper. Res. 59 1106-1118. · Zbl 1233.90117 |

[54] | Ibrahim, R., Ye, H., L’Ecuyer, P. and Shen, H. (2016a). Modeling and forecasting call center arrivals: A literature survey and a case study. Int. J. Forecast. 32 865-874. |

[55] | Ibrahim, R., L’Ecuyer, P., Shen, H. and Thiongane, M. (2016b). Inter-dependent, heterogeneous, and time-varying service-time distributions in call centers. European J. Oper. Res. 250 480-492. · Zbl 1346.90253 |

[56] | James, W. and Stein, C. (1961). Estimation with quadratic loss. In Proc. 4th Berkeley Sympos. Math. Statist. and Prob., Vol. I 361-379. Univ. California Press, Berkeley, CA. · Zbl 1281.62026 |

[57] | Jiang, W. and Zhang, C.-H. (2009). General maximum likelihood empirical Bayes estimation of normal means. Ann. Statist. 37 1647-1684. · Zbl 1168.62005 |

[58] | Jiang, W. and Zhang, C.-H. (2010). Empirical Bayes in-season prediction of baseball batting averages. In Borrowing Strength: Theory Powering Applications—a Festschrift for Lawrence D. Brown. Inst. Math. Stat. (IMS) Collect. 6 263-273. IMS, Beachwood, OH. |

[59] | Kang, W. and Pang, G. (2013). Fluid limit of a many-server queueing network with abandonment. Preprint. http://scripts.cac.psu.edu/users/g/u/gup3/fluidnetwork2013r.pdf. |

[60] | Kaspi, H. and Ramanan, K. (2011). Law of large numbers limits for many-server queues. Ann. Appl. Probab. 21 33-114. · Zbl 1208.60095 |

[61] | Kim, S. H. and Whitt, W. (2014). Are call center and hospital arrivals well modeled by nonhomogeneous Poisson processes? Manuf. Serv. Oper. Manag. 16 464-480. |

[62] | Koenker, R. and Mizera, I. (2014). Convex optimization, shape constraints, compound decisions, and empirical Bayes rules. J. Amer. Statist. Assoc. 109 674-685. · Zbl 1367.62020 |

[63] | Kolaczyk, E. D. (2009). Statistical Analysis of Network Data: Methods and Models. Springer Series in Statistics. Springer, New York. · Zbl 1277.62021 |

[64] | Lauger, A., Wisniewski, B. and McKenna, L. (2014). Disclosure Avoidance Techniques at the U.S. Census Bureau: Current Practices and Research. Research Report Series (Disclosure Avoidance #2014-02). |

[65] | Li, G., Huang, J. Z. and Shen, H. (2018). To wait or not to wait: Two-way functional hazards model for understanding waiting in call centers. J. Amer. Statist. Assoc. 113 1503-1514. · Zbl 1409.62193 |

[66] | Lindley, D. V. (1962). Discussion on Professor Stein’s paper. J. R. Stat. Soc. 24 285-287. |

[67] | Madison, J. (1790). Census of the Union. In Annals of Congress, House of Representatives, 1st Congress, 2nd Session. |

[68] | Maman, S. (2009). Uncertainty in the demand for service: The case of call centers and emergency departments. Ph.D. thesis, Technion-Israel Institute of Technology, Faculty of Industrial. http://ie.technion.ac.il/serveng/References/Thesis_Shimrit.pdf. |

[69] | Mandelbaum, A. and Momčilović, P. (2012). Queues with many servers and impatient customers. Math. Oper. Res. 37 41-65. · Zbl 1239.60086 |

[70] | Mandelbaum, A. and Zeltyn, S. (2009). Staffing many-server queues with impatient customers: Constraint satisfaction in call centers. Oper. Res. 57 1189-1205. · Zbl 1233.90121 |

[71] | Mandelbaum, A. and Zeltyn, S. (2013). Data-stories about (im)patient customers in tele-queues. Queueing Syst. 75 115-146. · Zbl 1277.90033 |

[72] | Mandelbaum, A., Momčilović, P., Trichakis, N., Kadish, S., Leib, R. and Bunnell, C. (2017). Data-driven appointment-scheduling under uncertainty: The case of an infusion unit in a cancer center. Under Revision to Management Science. |

[73] | Matteson, D. S., McLean, M. W., Woodard, D. B. and Henderson, S. G. (2011). Forecasting emergency medical service call arrival rates. Ann. Appl. Stat. 5 1379-1406. · Zbl 1223.62161 |

[74] | Meng, X.-L. (2018). Statistical paradises and paradoxes in big data (I): Law of large populations, big data paradox, and the 2016 US presidential election. Ann. Appl. Stat. 12 685-726. · Zbl 1405.62241 |

[75] | Muralidharan, O. (2010). An empirical Bayes mixture method for effect size and false discovery rate estimation. Ann. Appl. Stat. 4 422-438. · Zbl 1189.62004 |

[76] | Muthuraman, K. and Zha, H. (2008). Simulation-based portfolio optimization for large portfolios with transaction costs. Math. Finance 18 115-134. · Zbl 1138.91560 |

[77] | Nagaraja, C. H. (2019). Measuring Society. CRC Press, Boca Raton, FL. |

[78] | Nagaraja, C. H. and Brown, L. D. (2013). Constructing and evaluating an autoregressive house price index. In Topics in Applied Statistics (M. Hu, Y. Liu and J. Lin, eds.). Springer Proceedings in Mathematics & Statistics 55 3-12. Springer, Berlin. |

[79] | Nagaraja, C. H., Brown, L. D. and Wachter, S. (2014). Repeat sales house price index methodology. J. Real Estate Lit. 22 23-46. |

[80] | Nagaraja, C. H., Brown, L. D. and Zhao, L. H. (2011). An autoregressive approach to house price modeling. Ann. Appl. Stat. 5 124-149. · Zbl 1220.62109 |

[81] | Newman, M. E. J. (2008). The Mathematics of Networks. The New Palgrave Encyclopedia of Economics. Palgrave Macmillan, Basingstoke, UK. |

[82] | Newman, M. (2018). Networks. Oxford Univ. Press, Oxford. |

[83] | Pfeffermann, D. (2015). Methodological issues and challenges in the production of official statistics: 24th Annual Morris Hansen Lecture. J. Surv. Statist. Methodol. 3 425-477. |

[84] | Pidgin, C. F. (1888). Practical Statistics: A Handbook for the Use of the Statistician at Work, Students in Colleges and Academies, Agents, Census Enumerators, Etc. The W.E. Smythe Company. |

[85] | Puhalskii, A. A. and Reiman, M. I. (2000). The multiclass \(GI/PH/N\) queue in the Halfin-Whitt regime. Adv. in Appl. Probab. 32 564-595. · Zbl 0962.60089 |

[86] | Raykar, V. and Zhao, L. (2010). Nonparametric prior for adaptive sparsity. In Proceedings of the Thirteenth International Conference on Artificial Intelligence and Statistics 629-636. |

[87] | Reed, J. and Tezcan, T. (2012). Hazard rate scaling of the abandonment distribution for the \(GI/M/n+GI\) queue in heavy traffic. Oper. Res. 60 981-995. · Zbl 1260.90072 |

[88] | Reich, M. (2011). The workload process: Modelling, inference and applications. M.Sc. research proposal. |

[89] | Robert, P. (2003). Stochastic Networks and Queues: Stochastic Modelling and Applied Probability, French ed. Applications of Mathematics (New York) 52. Springer, Berlin. |

[90] | Sangalli, L. M. (2018). The role of statistics in the era of big data. Statist. Probab. Lett. 136 1-3. · Zbl 1395.00047 |

[91] | SEELab (Service Enterprise Engineering Laboratory). Available at https://web.iem.technion.ac.il/en/service-enterprise-engineering-see-lab/general-information.html. |

[92] | Senderovich, A. (2016). Queue Mining: Service Perspectives in Process Mining. Ph.D. thesis, Technion-Israel Institute of Technology, Faculty of Industrial. http://ie.technion.ac.il/serveng/References/Thesis_Submission_Arik_Senderovich.pdf. |

[93] | Senderovich, A., Weidlich, M., Gal, A. and Mandelbaum, A. (2015). Queue mining for delay prediction in multi-class service processes. Inf. Syst. 53 278-295. |

[94] | Shen, H. and Brown, L. D. (2006). Non-parametric modelling for time-varying customer service time at a bank call centre. Appl. Stoch. Models Bus. Ind. 22 297-311. · Zbl 1114.62055 |

[95] | Shen, H. and Huang, J. Z. (2008a). Interday forecasting and intraday updating of call center arrivals. Manuf. Serv. Oper. Manag. 10 391-410. |

[96] | Shen, H. and Huang, J. Z. (2008b). Forecasting time series of inhomogeneous Poisson processes with application to call center workforce management. Ann. Appl. Stat. 2 601-623. · Zbl 1400.62350 |

[97] | Stein, C. (1956). Inadmissibility of the usual estimator for the mean of a multivariate normal distribution. In Proceedings of the Third Berkeley Symposium on Mathematical Statistics and Probability, 1954-1955, Vol. I 197-206. Univ. California Press, Berkeley and Los Angeles. |

[98] | Stein, C. M. (1962). Confidence sets for the mean of a multivariate normal distribution. J. Roy. Statist. Soc. Ser. B 24 265-296. · Zbl 0126.34602 |

[99] | Strawderman, W. E. (1971). Proper Bayes minimax estimators of the multivariate normal mean. Ann. Math. Stat. 42 385-388. · Zbl 0222.62006 |

[100] | Strawderman, W. E. (1973). Proper Bayes minimax estimators of the multivariate normal mean vector for the case of common unknown variances. Ann. Statist. 1 1189-1194. · Zbl 0286.62007 |

[101] | Taylor, J. W. (2012). Density forecasting of intraday call center arrivals using models based on exponential smoothing. Manage. Sci. 58 534-549. |

[102] | Torrieri, N., ACSO, DSSD and SEHSD Program Staff (2014). American Community Survey Design and Methodology. U.S. Census Bureau. |

[103] | Tukey, J. W. (1977). Exploratory Data Analysis. Addison-Wesley Series in Behavioral Science. Addison-Wesley Pub. Co., Reading, MA. |

[104] | van Dyk, D., Fuentes, M., Jordan, M. I., Newton, M., Ray, B. R., Temple Lang, D. and Wickham, H. (2015). ASA Statement on the Role of Statistics in Data Science. Amstat News. |

[105] | Vardi, Y. (1996). Network tomography: Estimating source-destination traffic intensities from link data. J. Amer. Statist. Assoc. 91 365-377. · Zbl 0871.62103 |

[106] | Varian, H. (2009). Hal Varian on how the Web challenges managers. McKinsey & Company. |

[107] | Weinberg, J., Brown, L. D. and Stroud, J. R. (2007). Bayesian forecasting of an inhomogeneous Poisson process with applications to call center data. J. Amer. Statist. Assoc. 102 1185-1198. · Zbl 1333.62297 |

[108] | Weinstein, A., Ma, Z., Brown, L. D. and Zhang, C.-H. (2018). Group-linear empirical Bayes estimates for a heteroscedastic normal mean. J. Amer. Statist. Assoc. 113 698-710. · Zbl 1398.62067 |

[109] | Whitt, W. (1983). The queueing network analyzer. Bell Syst. Tech. J. 62 2779-2815. |

[110] | Whitt, W. (1992). Understanding the efficiency of multi-server service systems. Manage. Sci. 38 708-723. · Zbl 0825.90409 |

[111] | Whitt, W. (2002a). Stochastic-Process Limits: An Introduction to Stochastic-Process Limits and Their Application to Queues. Springer Series in Operations Research. Springer, New York. · Zbl 0993.60001 |

[112] | Whitt, W. (2002b). Stochastic models for the design and management of customer contact centers: Some research directions. Department of Industrial Engineering and Operations Research, Columbia Univ., New York. |

[113] | Whitt, W. (2012). Fitting birth-and-death queueing models to data. Statist. Probab. Lett. 82 998-1004. · Zbl 1241.62134 |

[114] | Wright, C. D. and Hunt, W. O. (1900). The history and growth of the United States census: Prepared for the Senate Committee on the Census. In 56th Congress, 1st Session; Document No. 194. |

[115] | Xie, X., Kou, S. C. and Brown, L. D. (2012). SURE estimates for a heteroscedastic hierarchical model. J. Amer. Statist. Assoc. 107 1465-1479. · Zbl 1284.62450 |

[116] | Ye, H., Luedtke, J. and Shen, H. (2019). Call center arrivals: When to jointly forecast multiple streams? Prod. Oper. Manag. 28 27-42. |

[117] | Yom-Tov, G. and Mandelbaum, A. (2014). Erlang-R: A time-varying queue with reentrant customers, in support of healthcare staffing. Manuf. Serv. Oper. Manag. 16 283-299. |

[118] | Zeltyn, S. and Mandelbaum, A. (2005). Call centers with impatient customers: Many-server asymptotics of the \(M/M/n+G\) queue. Queueing Syst. 51 361-402. · Zbl 1085.60072 |

[119] | Zeltyn, S., Marmor, Y. N., Mandelbaum, A., Carmeli, B., Greenshpan, O., Mesika, Y., Wasserkrug, S., Vortman, P., Schwartz, D. et al. (2011). Simulation-based models of emergency departments: Real-time control, operations planning and scenario analysis. ACM Trans. Model. Comput. Simul. 21 3. |

[120] | Zhang, P. and Serban, N. (2007). Discovery, visualization and performance analysis of enterprise workflow. Comput. Statist. Data Anal. 51 2670-2687. · Zbl 1161.90429 |

[121] | U.S. Census Bureau (1907). Heads of Families at the First Census of the United States Taken in the Year 1790. Government Printing Office, Washington, DC. |

[122] | U.S. Census Bureau (2009). TIGER/Line Shapefiles. Available at https://www.census.gov/geo. |

[123] | U.S. Census Bureau. Decennial census of population and housing. Available at https://www.census.gov/programs-surveys/decennial-census.html. |

[124] | U.S. Census Bureau (2010). Census Bureau Launches 2010 Census Advertising Campaign: Communication Effort Seeks to Boost Nation’s Mail-Back Participation Rates. Available at https://www.census.gov/newsroom/releases/archives/2010_census/cb10-cn08.html, January 2010. |

[125] | U.S. Census Bureau (2017a). “Annual Estimates of the Resident Population: April 1, 2010 to July 1, 2017—Table PEPANNRES.” Population Estimates Program. |

[126] | U.S. Census Bureau (2017b). “Geographic Mobility by Selected Characteristics in the United States—Table S0701.” American Community Survey 1-Year Estimates. Available at https://factfinder.census.gov. |

[127] | U.S. Census Bureau (2017c). “Citizen, Voting-age Population by Age—Table B29001.” American Community Survey 1-Year Estimates. Available at https://factfinder.census.gov. |

[128] | U.S. Census Bureau (2017d). “Field of Bachelor’s Degree for First Major—Table S1502.” American Community Survey 1-Year Estimates. Available at https://factfinder.census.gov. |

[129] | U.S. Census Bureau (2017e). “Commuting Characteristics by Sex—Table S0801.” American Community Survey 1-Year Estimates. Available at https://factfinder.census.gov. |

[130] | U.S. Census Bureau (2017f). “Veteran Status—Table S2101.” American Community Survey 1-Year Estimates. Available at https://factfinder.census.gov. |

[131] | U.S. Census Bureau—Census History Staff (2017g). Title 13, U.S. Code. Available at https://www.census.gov/history/www/reference/privacy_confidentiality/title_13_us_code.html. Last revised: July 18, 2017. |

[132] | U.S. Census Bureau—Census History Staff (2017h). Title 26, U.S. Code. Available at https://www.census.gov/history/www/reference/privacy_confidentiality/title_26_us_code_1.html. Last revised July 18, 2017. |

[133] | U. |

This reference list is based on information provided by the publisher or from digital mathematics libraries. Its items are heuristically matched to zbMATH identifiers and may contain data conversion errors. It attempts to reflect the references listed in the original paper as accurately as possible without claiming the completeness or perfect precision of the matching.