PLASQ: A generalized linear model-based procedure to determine allelic dosage in cancer cells from SNP array data.

*(English)* Zbl 1144.62098

Summary: Human cancer is largely driven by the acquisition of mutations. One class of such mutations is copy number polymorphisms, which are deviations from the normal diploid state of two copies of each autosomal chromosome per cell. We describe a probe-level allele-specific quantitation (PLASQ) procedure to determine copy number contributions from each of the parental chromosomes in cancer cells from single-nucleotide polymorphism (SNP) microarray data. Our approach is based upon a generalized linear model that takes advantage of a novel classification of probes on the array. As a result of this classification, we are able to fit the model to the data using an expectation-maximization algorithm designed for the purpose. We demonstrate a strong model fit to data from a variety of cell types. In normal diploid samples, PLASQ is able to genotype with very high accuracy. Moreover, we are able to provide a generalized genotype in cancer samples (e.g., CCCCT at an amplified SNP). Our approach is illustrated on a variety of lung cancer cell lines and tumors, and a number of events are validated by independent computational and experimental means. An R software package containing the methods is freely available.
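The notion of a generalized genotype mentioned in the summary can be illustrated with a minimal sketch. It assumes only that a generalized genotype is written by repeating each allele letter according to its allele-specific copy number, as in the summary's CCCCT example. The function name `generalized_genotype` and its parameters are hypothetical, chosen for illustration, and are not part of the PLASQ R package; the sketch does not perform any model fitting.

```python
def generalized_genotype(copies_a: int, copies_b: int,
                         allele_a: str = "C", allele_b: str = "T") -> str:
    """Write a generalized genotype from allele-specific copy numbers.

    Hypothetical helper for illustration only: each allele letter is
    repeated as many times as that allele is present at the SNP.
    """
    if copies_a < 0 or copies_b < 0:
        raise ValueError("copy numbers must be non-negative")
    return allele_a * copies_a + allele_b * copies_b


# Normal diploid heterozygote: one copy of each allele.
print(generalized_genotype(1, 1))  # CT

# Amplified SNP with four copies of C and one copy of T.
print(generalized_genotype(4, 1))  # CCCCT

# Two copies of C and none of T.
print(generalized_genotype(2, 0))  # CC
```

In a diploid sample the result reduces to an ordinary genotype call, while in a cancer sample the string length reflects the total copy number at the SNP.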

##### MSC:

| Code | Description |
|---|---|
| 62P10 | Applications of statistics to biology and medical sciences; meta analysis |
| 92C50 | Medical applications (general) |
| 62J12 | Generalized linear models (logistic models) |
| 62H30 | Classification and discrimination; cluster analysis (statistical aspects) |
| 65C60 | Computational problems in statistics (MSC2010) |