The discovery of copy number variation in healthy individuals is far from complete, and due to the resolution of detection systems used, the majority of loci reported so far are relatively large (~65% > 10 kb). Applying a two-stage high-resolution array CGH approach to analyse 50 healthy Caucasian males from northern France, we discovered 2208 copy number variants (CNVs) detected by more than one consecutive probe. These clustered into 1469 copy number variant regions (CNVRs), of which 721 are thought to be novel. The majority of these are small (median size 4.4 kb) and most have common boundaries, with a coefficient of variation less than 0.1 for 83% of end-points in those observed in multiple samples. Only 6% of the CNVRs analysed showed evidence of both copy number losses and gains at the same site. A further 6089 variants were detected by single probes: 48% of these were observed in more than one individual. In total, 2570 genes were seen to intersect variants: 1284 in novel loci. Genes involved in differentiation and development were significantly overrepresented, and approximately half the genes identified feature in the OMIM database. The biological importance of many of the genes affected, along with the well-conserved nature of the majority of the copy number variants, suggests they could have important implications for phenotype and, thus, be useful for association studies of complex diseases.
Keywords: comparative genomic hybridization
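The grouping of per-sample CNV calls into CNVRs can be illustrated with a minimal sketch. The record does not state the clustering rule that was used, so the rule below (merge any calls on the same chromosome whose coordinates overlap) is an assumption, and the function name, tuple layout and coordinates are hypothetical.

```python
from collections import defaultdict


def merge_cnvs_into_regions(cnvs):
    """Group CNV calls into regions by merging overlapping intervals.

    cnvs: iterable of (chrom, start, end, sample) tuples.
    Overlap-based merging is an assumed rule, not the study's stated method.
    """
    by_chrom = defaultdict(list)
    for chrom, start, end, sample in cnvs:
        by_chrom[chrom].append((start, end, sample))

    regions = []
    for chrom in sorted(by_chrom):
        current = None
        # Sorting by start lets each call be compared with the open region only.
        for start, end, sample in sorted(by_chrom[chrom]):
            if current is not None and start <= current["end"]:
                # Call overlaps the open region: extend it and record the sample.
                current["end"] = max(current["end"], end)
                current["samples"].add(sample)
            else:
                # No overlap: start a new region.
                current = {"chrom": chrom, "start": start,
                           "end": end, "samples": {sample}}
                regions.append(current)
    return regions


# Hypothetical calls, for illustration only.
example_calls = [
    ("chr1", 100, 200, "s1"),
    ("chr1", 150, 260, "s2"),
    ("chr1", 500, 600, "s1"),
    ("chr2", 100, 200, "s3"),
]
example_regions = merge_cnvs_into_regions(example_calls)
```

Under this rule, the number of distinct samples recorded for a region indicates whether the variant was observed in more than one individual.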
Overall design
DNA samples were isolated from the peripheral blood of 50 unrelated, apparently healthy white males of northern French origin using Puregene kits (Gentra, USA) and resuspended in Tris-EDTA buffer. The reference sample was a North American female of unknown ethnic origin (NA15510), obtained from the Coriell Cell Repository; it has been extensively characterised and has been recommended for use in CNV detection programmes to allow meaningful comparison of data between studies (discussed in Scherer et al. 2007, Nature Genetics Supplement 39: S7-S15). Each of the 50 samples was hybridized once, except for sample 49, for which 4 additional replicates were done. These replicates were done in the reverse polarity, i.e. sample 49 was labeled with Cy3 and the NA15510 reference was labeled with Cy5. The data set also includes 3 control self-self experiments for the NA15510 sample, which were used to estimate the false positive rate.
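Two consequences of this design can be sketched in code. First, a reverse-polarity replicate reports log2(Cy5/Cy3) as reference over sample, so its log2 ratios need a sign flip before they are compared with the standard hybridizations. Second, every call in a self-self hybridization is a false positive by construction. The record does not give the formula used for the false positive rate, so the ratio below is an assumption, and all function names and numbers are hypothetical.

```python
def orient_log2_ratios(log2_ratios, dye_swapped):
    """Return log2 ratios expressed as sample over reference.

    For a dye-swapped array, log2(Cy5/Cy3) is reference over sample,
    so negating each value restores the sample-over-reference orientation.
    """
    if dye_swapped:
        return [-r for r in log2_ratios]
    return list(log2_ratios)


def estimate_false_positive_rate(self_self_call_counts, test_call_counts):
    """Assumed estimate: mean calls per self-self array divided by
    mean calls per test array. Not the study's documented formula."""
    mean_self = sum(self_self_call_counts) / len(self_self_call_counts)
    mean_test = sum(test_call_counts) / len(test_call_counts)
    return mean_self / mean_test


# Hypothetical values, for illustration only.
swapped = orient_log2_ratios([0.5, -1.0], dye_swapped=True)
unchanged = orient_log2_ratios([0.5, -1.0], dye_swapped=False)
fpr = estimate_false_positive_rate([2, 1, 3], [40, 50, 30])
```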