
CluStrat

swMATH ID: 43149
Software Authors: Bose, A., Burch, M.C., Chowdhury, A., Paschou, P., Drineas, P.
Description: CluStrat: a structure-informed clustering strategy for population stratification. Genome-wide association studies (GWAS) have been used extensively to estimate the signed effects of trait-associated alleles. Recent independent studies failed to replicate the strong evidence of selection for height across Europe, exposing shortcomings of standard population stratification correction approaches. Here, we present CluStrat, a stratification correction algorithm for complex population structure that leverages the linkage disequilibrium (LD)-induced distances between individuals. CluStrat performs agglomerative hierarchical clustering using the Mahalanobis distance and then applies sketching-based randomized ridge regression to the genotype data to obtain the association statistics. As data sets grow, computing and storing the genome-wide covariance matrix becomes a non-trivial task. We avoid this overhead by computing the genetic relationship matrix (GRM) directly, using a connection between statistical leverage scores and the Mahalanobis distance. We test CluStrat in a large simulation study of discrete and admixed, arbitrarily structured sub-populations; it identifies two- to three-fold more true causal variants than Principal Component (PC)-based stratification correction methods, at the cost of slightly more spurious associations. Applying CluStrat to the WTCCC2 Parkinson’s disease (PD) data, we identified loci mapped to a number of genes associated with PD, including BACH2, MAP2, NR4A2, SLC11A1, and UNC5C. Availability and implementation: the CluStrat source code and user manual are available at https://github.com/aritra90/CluStrat
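The leverage-score shortcut mentioned in the description rests on a standard linear-algebra identity. If the column-centered genotype matrix X (n individuals × p SNPs) has thin SVD X = UΣVᵀ and S = XᵀX/(n−1) is the sample covariance, then the squared Mahalanobis distance between individuals i and j under the pseudo-inverse S⁺ equals (n−1)·‖U_i − U_j‖², and the squared distance of individual i from the mean equals (n−1) times its leverage score ‖U_i‖². The NumPy sketch below checks this numerically. The helper name and the rank parameter k are hypothetical, not part of CluStrat's code, and truncating to the top k singular vectors is an assumption: when SNPs outnumber individuals and the full rank is kept, every pairwise distance collapses to the same constant 2(n−1), so some truncation is needed in that regime.

```python
import numpy as np


def mahalanobis_via_leverage(X, k=None):
    """Pairwise squared Mahalanobis distances between rows (individuals) of X,
    computed from left singular vectors instead of the p x p covariance matrix.

    Hypothetical helper: k truncates to the top-k singular vectors
    (k=None keeps the full numerical rank, i.e. uses the pseudo-inverse of S).
    """
    Xc = X - X.mean(axis=0)                       # center each SNP column
    n = Xc.shape[0]
    U, s, _ = np.linalg.svd(Xc, full_matrices=False)
    tol = s.max() * max(Xc.shape) * np.finfo(float).eps
    rank = int((s > tol).sum())                   # numerical rank of Xc
    k = rank if k is None else min(k, rank)
    Uk = U[:, :k]
    lev = (Uk ** 2).sum(axis=1)                   # (rank-k) leverage scores h_ii
    gram = Uk @ Uk.T                              # inner products U_i . U_j
    d2 = (n - 1) * (lev[:, None] + lev[None, :] - 2.0 * gram)
    return np.maximum(d2, 0.0), lev               # clip tiny negative round-off


# Usage: compare against the textbook formula on a small n > p example.
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 5))
d2, lev = mahalanobis_via_leverage(X)

Xc = X - X.mean(axis=0)
S_inv = np.linalg.inv(np.cov(Xc, rowvar=False))  # np.cov divides by n - 1
diff = Xc[:, None, :] - Xc[None, :, :]
d2_direct = np.einsum("ijp,pq,ijq->ij", diff, S_inv, diff)
print(np.allclose(d2, d2_direct))                 # True
```

Only an n × n Gram matrix of singular-vector rows is formed, so the p × p covariance never has to be built or stored.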
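The agglomerative clustering step can be illustrated with SciPy. Because the rank-k Mahalanobis distance between two individuals equals the Euclidean distance between the corresponding rows of √(n−1)·U_k (U_k being the top-k left singular vectors of the centered genotype matrix), those scaled rows can be passed straight to `scipy.cluster.hierarchy.linkage`. The helper name, the choice of Ward linkage, k, and the number of clusters are illustrative assumptions, not settings taken from CluStrat, and the genotypes are a small simulated toy, not real data.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage


def cluster_individuals(G, k=2, n_clusters=3, method="ward"):
    """Hypothetical helper: agglomerative clustering of individuals (rows of G)
    under the rank-k Mahalanobis distance."""
    Gc = G - G.mean(axis=0)                        # center each SNP column
    n = Gc.shape[0]
    U, s, _ = np.linalg.svd(Gc, full_matrices=False)
    k = min(k, int((s > s.max() * 1e-10).sum()))   # never exceed numerical rank
    # Euclidean distances between rows of Z equal rank-k Mahalanobis distances,
    # so Euclidean-based linkages such as Ward are valid on Z.
    Z = np.sqrt(n - 1) * U[:, :k]
    tree = linkage(Z, method=method)
    return fcluster(tree, t=n_clusters, criterion="maxclust")


# Usage: three simulated sub-populations with diverged allele frequencies.
rng = np.random.default_rng(42)
n_per_pop, n_snps = 40, 500
base = rng.uniform(0.1, 0.9, size=n_snps)
pops, genos = [], []
for p in range(3):
    freq = np.clip(base + rng.normal(0.0, 0.15, size=n_snps), 0.01, 0.99)
    genos.append(rng.binomial(2, freq, size=(n_per_pop, n_snps)))  # 0/1/2 allele counts
    pops.append(np.full(n_per_pop, p))
G = np.vstack(genos).astype(float)
pop = np.concatenate(pops)

labels = cluster_individuals(G, k=2, n_clusters=3)
```

With well-separated simulated populations like these, each population should end up in its own cluster; admixed or weakly structured samples would not split this cleanly.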
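For the regression step, one common form of sketching-based ridge regression when SNPs far outnumber individuals (p ≫ n) starts from the dual form of the ridge solution, w = Aᵀ(AAᵀ + λI)⁻¹y, and replaces the n × n Gram matrix AAᵀ with (AS)(AS)ᵀ for a random p × s sketching matrix S. The sketch below uses a Gaussian S for readability. The helper names, the sketch size s, and λ are illustrative assumptions; this is not necessarily the exact sketching scheme or statistic used in CluStrat, and it only estimates ridge coefficients without turning them into association p-values.

```python
import numpy as np


def exact_ridge(A, y, lam):
    """Ridge coefficients via the dual form: w = A^T (A A^T + lam I)^{-1} y."""
    n = A.shape[0]
    return A.T @ np.linalg.solve(A @ A.T + lam * np.eye(n), y)


def sketched_ridge(A, y, lam, s, seed=0):
    """Hypothetical helper: approximate ridge coefficients for p >> n by sketching
    the SNP dimension with a Gaussian matrix S (p x s), E[S S^T] = I_p."""
    n, p = A.shape
    rng = np.random.default_rng(seed)
    S = rng.normal(size=(p, s)) / np.sqrt(s)
    AS = A @ S                                   # n x s sketch of A
    K = AS @ AS.T + lam * np.eye(n)              # approximates A A^T + lam I
    return A.T @ np.linalg.solve(K, y)


# Usage: 30 individuals, 3000 standardized "SNPs", sketch size 1500.
rng = np.random.default_rng(7)
A = rng.normal(size=(30, 3000))
y = rng.normal(size=30)
lam = 3000.0
w_exact = exact_ridge(A, y, lam)
w_sketch = sketched_ridge(A, y, lam, s=1500)
rel_err = np.linalg.norm(w_sketch - w_exact) / np.linalg.norm(w_exact)
```

A dense Gaussian S is actually slower to apply than forming AAᵀ directly; practical sketching schemes use structured sketches such as CountSketch or subsampled randomized Hadamard transforms, whose product with A is much cheaper. The approximation improves as s grows relative to n.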
Homepage: https://www.biorxiv.org/content/10.1101/2020.01.15.908228v1.full.pdf
Source Code: https://github.com/aritra90/CluStrat
Dependencies: Python
Related Software: Eigenstrat; GWAS Catalog; STRUCTURE; ThreSPCA; TeraPCA; clusterProfiler
Cited in: 1 Document

Cited in: 0 Serials
