Nonparametric Bayes modeling of multivariate categorical data. (English) Zbl 1388.62151

Summary: Modeling of multivariate unordered categorical (nominal) data is a challenging problem, particularly in high dimensions and in cases where one wishes to avoid strong assumptions about the dependence structure. Commonly used approaches rely on the incorporation of latent Gaussian random variables or parametric latent class models. The goal of this article is to develop a nonparametric Bayes approach, which defines a prior with full support on the space of distributions for multiple unordered categorical variables. This support condition ensures that we are not restricting the dependence structure a priori. We show that this can be accomplished through a Dirichlet process mixture of product multinomial distributions, which is also a convenient form for posterior computation. Methods for nonparametric testing of violations of independence are proposed, and the methods are applied to model positional dependence within transcription factor binding motifs.
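To make the model in the summary concrete: under a Dirichlet process mixture of product multinomials, the joint probability of a cell (c_1, ..., c_p) is sum_h nu_h * prod_j psi_hj(c_j). Here nu_h are stick-breaking weights and psi_hj is the category probability vector of variable j in component h. Each component treats the p variables as independent, yet the mixture over components induces dependence between them. The following is a minimal sketch that draws from this prior. It makes several assumptions that do not come from the article: a truncation level H for the stick-breaking construction, a uniform Dirichlet(1, ..., 1) base measure, and the hypothetical helper names used throughout. It is not the authors' implementation or posterior sampler. The `pairwise_gap` function is only an illustrative summary of induced dependence and is not the independence test proposed in the article.

```python
import itertools
import random
from typing import List, Sequence


def stick_breaking(alpha: float, H: int, rng: random.Random) -> List[float]:
    """Truncated stick-breaking weights nu_1..nu_H (last weight absorbs leftover mass)."""
    weights, remaining = [], 1.0
    for _ in range(H - 1):
        v = rng.betavariate(1.0, alpha)  # V_h ~ Beta(1, alpha)
        weights.append(remaining * v)
        remaining *= 1.0 - v
    weights.append(remaining)
    return weights


def dirichlet(conc: Sequence[float], rng: random.Random) -> List[float]:
    """Dirichlet draw via normalized Gamma variates."""
    g = [rng.gammavariate(a, 1.0) for a in conc]
    s = sum(g)
    return [x / s for x in g]


def draw_prior(levels: Sequence[int], alpha: float = 1.0, H: int = 20, seed: int = 0):
    """Draw (nu, psi); psi[h][j] is the probability vector of variable j in component h.

    Assumption: uniform Dirichlet(1, ..., 1) base measure for every variable.
    """
    rng = random.Random(seed)
    nu = stick_breaking(alpha, H, rng)
    psi = [[dirichlet([1.0] * d, rng) for d in levels] for _ in range(H)]
    return nu, psi


def joint_prob(cell: Sequence[int], nu, psi) -> float:
    """Pr(x = cell) = sum_h nu_h * prod_j psi[h][j][cell[j]]."""
    total = 0.0
    for w, comp in zip(nu, psi):
        p = w
        for j, c in enumerate(cell):
            p *= comp[j][c]
        total += p
    return total


def sample(n: int, nu, psi, seed: int = 1) -> List[List[int]]:
    """Draw n observations: pick a component, then each variable independently within it."""
    rng = random.Random(seed)
    out = []
    for _ in range(n):
        h = rng.choices(range(len(nu)), weights=nu)[0]
        out.append([rng.choices(range(len(pv)), weights=pv)[0] for pv in psi[h]])
    return out


def pairwise_gap(j: int, k: int, nu, psi) -> float:
    """Illustrative only: max |Pr(x_j=a, x_k=b) - Pr(x_j=a) Pr(x_k=b)| under the mixture."""
    dj, dk = len(psi[0][j]), len(psi[0][k])
    mj = [sum(w * comp[j][a] for w, comp in zip(nu, psi)) for a in range(dj)]
    mk = [sum(w * comp[k][b] for w, comp in zip(nu, psi)) for b in range(dk)]
    gap = 0.0
    for a in range(dj):
        for b in range(dk):
            pab = sum(w * comp[j][a] * comp[k][b] for w, comp in zip(nu, psi))
            gap = max(gap, abs(pab - mj[a] * mk[b]))
    return gap


if __name__ == "__main__":
    levels = [4, 4, 4]  # e.g. three motif positions, each over {A, C, G, T}
    nu, psi = draw_prior(levels, alpha=1.0, H=20, seed=0)
    # The joint pmf sums to 1 over all cells (up to floating-point rounding).
    total = sum(joint_prob(c, nu, psi) for c in itertools.product(*map(range, levels)))
    print(round(total, 6))
    print(sample(3, nu, psi))
    print(pairwise_gap(0, 1, nu, psi))
```

With a single component (H = 1) the model reduces to independent multinomials, so `pairwise_gap` is zero. With several components the gap is generally positive, which reflects how the mixture leaves the dependence structure unrestricted.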


62H05 Characterization and structure theory for multivariate probability distributions; copulas
62G05 Nonparametric estimation
62F15 Bayesian inference
62G10 Nonparametric hypothesis testing

