Latent theme dictionary model for finding co-occurrent patterns in process data. (English) Zbl 1458.62265

Summary: Process data, which are temporally ordered sequences of categorical observations, are of recent interest due to their increasing abundance and the desire to extract useful information from them. A process is a collection of time-stamped events of different types, recording how an individual behaves in a given time period. Process data are too complex in size and irregularity for classical psychometric models to be directly applicable and, consequently, new approaches to modeling and analysis are desired. We introduce herein a latent theme dictionary model for processes that identifies co-occurrent event patterns and individuals with similar behavioral patterns. Theoretical properties are established under certain regularity conditions for the likelihood-based estimation and inference. A nonparametric Bayes algorithm using the Markov chain Monte Carlo method is proposed for computation. Simulation studies show that the proposed approach performs well in a range of situations. The proposed method is applied to an item in the 2012 Programme for International Student Assessment, with interpretable findings.

MSC:

62P15 Applications of statistics to psychology
62D20 Causal inference from observational studies
62G07 Density estimation
62-08 Computational methods for problems pertaining to statistics
65C05 Monte Carlo methods
