Analysis of pooled DNA samples on high density arrays without prior knowledge of differential hybridization rates
Genome-wide association studies are now possible but expensive. Genome-wide scans of suitable size (hundreds of cases and controls, hundreds of thousands of markers) currently cost well over US$1 million. DNA pooling is one approach with the potential to reduce this financial burden. For genome-wide analysis, DNA pooling can be conducted using microarrays. However, naïve analysis of such data leads to grossly inflated type I error rates. We describe a structured analysis method for pooled data that uses internal replication information in large-scale genotyping sets. The method takes advantage of information from SNPs typed in parallel on a high-density array to construct a test statistic with desirable statistical properties. We use a general linear mixed model to account appropriately for the structured multiple measurements available with array data. The method does not require additional arrays for the estimation of unequal hybridization rates. Consequently, tests for differences between cases and controls can be conducted with very few arrays, and the method scales readily to arrays with several hundred thousand SNPs. We demonstrate the method on 384 endometriosis cases and 384 controls, typed using Affymetrix GeneChip HindIII 50K arrays. For a subset of these data, accurate measures of hybridization rates were available; assuming equal hybridization rates is shown to have a negligible effect on the results. With a total of only 6 arrays, the method extracted one third of the information (in terms of equivalent sample size) available from individual genotyping (which requires 768 arrays). With 20 arrays (10 for cases, 10 for controls), over half of the information could be extracted from this sample; this represents an approximately 20-fold reduction in cost compared with individual genotyping.
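
The inflation of type I error under naïve analysis, and the way information shared across SNPs can correct it, may be illustrated with a small simulation. The sketch below is not the general linear mixed model described above; it is a deliberately simplified stand-in that assumes equal hybridization rates, approximates binomial sampling of alleles by a normal distribution, and treats array measurement error as independent with a common variance. The function names, the measurement-error standard deviation (`array_sd=0.03`), the number of simulated SNPs, and the allele-frequency range are all hypothetical choices made for illustration; only the 384 individuals per group and the 3 arrays per group (6 in total) echo figures from the text.

```python
import math
import random
import statistics

CHI2_1DF_5PCT = 3.841  # 5% critical value of a chi-square with 1 df


def simulate_null_snp(rng, n_people, n_arrays, array_sd):
    """Per-array allele-frequency estimates (cases, controls) for one SNP
    with no true case/control difference. All settings are illustrative."""
    freq = rng.uniform(0.1, 0.9)  # assumed range of population frequencies
    sampling_sd = math.sqrt(freq * (1 - freq) / (2 * n_people))

    def one_group():
        # Normal approximation to sampling 2N alleles into the pool.
        pool_freq = rng.gauss(freq, sampling_sd)
        # Each array measures the same pool with independent error.
        return [pool_freq + rng.gauss(0, array_sd) for _ in range(n_arrays)]

    return one_group(), one_group()


def type1_error_rates(n_snps=2000, n_people=384, n_arrays=3,
                      array_sd=0.03, seed=1):
    """Return (naive, corrected) false-positive rates at the nominal 5% level."""
    rng = random.Random(seed)
    snps = [simulate_null_snp(rng, n_people, n_arrays, array_sd)
            for _ in range(n_snps)]

    # Estimate the array error variance once, from replicate arrays of every
    # SNP in both groups (the "typed in parallel" information).
    array_var = statistics.fmean(
        [statistics.variance(group) for snp in snps for group in snp]
    )
    measurement_var = 2 * array_var / n_arrays  # variance added to the difference

    naive_hits = 0
    corrected_hits = 0
    for cases, controls in snps:
        p_case = statistics.fmean(cases)
        p_ctrl = statistics.fmean(controls)
        p_bar = (p_case + p_ctrl) / 2
        diff_sq = (p_case - p_ctrl) ** 2
        # Binomial sampling variance of the difference between two groups.
        sampling_var = 2 * p_bar * (1 - p_bar) / (2 * n_people)
        # Naive test: pretends the pooled estimates carry no measurement error.
        naive_hits += diff_sq / sampling_var > CHI2_1DF_5PCT
        # Corrected test: adds the array error variance to the denominator.
        corrected_hits += (
            diff_sq / (sampling_var + measurement_var) > CHI2_1DF_5PCT
        )
    return naive_hits / n_snps, corrected_hits / n_snps


if __name__ == "__main__":
    naive, corrected = type1_error_rates()
    print(f"naive: {naive:.3f}  corrected: {corrected:.3f}")
```

Under these assumptions the naïve rate should come out well above the nominal 5%, while the corrected rate should sit close to it. Estimating the error variance from all SNPs at once, rather than per SNP, is what allows a stable correction with only three replicate arrays per group.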