|
|
| Deep coverage exome capture sequencing supports reliable discovery |
| of rare variants, including those which appear only once or twice across |
| all individuals sequenced. |
| Exome capture sequencing was performed for 1128 individuals in |
| HapMap and 1000 Genomes population samples by four sequencing |
| centers: Beijing Genomics Institute (BGI), Baylor College of Medicine |
| Human Genome Sequencing Center (BCM), Broad Institute (BI) and |
| Washington University Genome Sequencing Center (WUGSC), using either |
| the NimbleGen SeqCap_EZ_Exome_v2 (BGI, BCM) or Agilent SureSelect_All_Exon_V2 |
| (BI, WUGSC) exome capture reagents. All sequencing was done from |
| lymphoblastoid cell line DNA. Sequencing was considered complete |
| when at least 70% of the target region showed 20x or greater depth |
| of coverage in mapped reads. Target regions for analysis are the |
| intersection of the two capture reagent target regions with CCDS |
| coding exons, plus 50 bp flanking regions. This totals approximately |
| 47 Mb, of which 62% is coding exon sequence. Exact boundaries in GRCh37 |
| sequence coordinates are shown in file: |
| /ftp-trace.ncbi.nlm.nih.gov/1000genomes/ftp/technical/reference/ |
| exome_pull_down_targets/20110426_exome_add50bp.consensus.bed |
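| As an illustration only, the completion criterion and target region |
| definition above can be sketched in Python. Intervals are assumed to be |
| half-open, BED-style (start, end) pairs on a single chromosome. The |
| function names are hypothetical, and adding the 50 bp flanks after the |
| intersection is an assumption about the order of operations, not a |
| description of the pipeline actually used. |

```python
from typing import List, Sequence, Tuple

Interval = Tuple[int, int]  # half-open [start, end) on one chromosome


def merge(intervals: Sequence[Interval]) -> List[Interval]:
    """Sort intervals and merge any that overlap or touch."""
    merged: List[Interval] = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def intersect(a: Sequence[Interval], b: Sequence[Interval]) -> List[Interval]:
    """Return the regions present in both interval sets."""
    a, b = merge(a), merge(b)
    out: List[Interval] = []
    i = j = 0
    while i < len(a) and j < len(b):
        start = max(a[i][0], b[j][0])
        end = min(a[i][1], b[j][1])
        if start < end:
            out.append((start, end))
        # Advance whichever interval finishes first.
        if a[i][1] < b[j][1]:
            i += 1
        else:
            j += 1
    return out


def build_targets(nimblegen, agilent, ccds_exons, flank: int = 50) -> List[Interval]:
    """Intersect both capture designs with CCDS exons, then pad by `flank` bp.

    Assumption: flanks are added after the intersection.
    """
    core = intersect(intersect(nimblegen, agilent), ccds_exons)
    return merge([(max(0, s - flank), e + flank) for s, e in core])


def is_complete(depths: Sequence[int], min_depth: int = 20,
                min_fraction: float = 0.70) -> bool:
    """True if at least `min_fraction` of target bases reach `min_depth`."""
    if not depths:
        return False
    covered = sum(1 for d in depths if d >= min_depth)
    return covered / len(depths) >= min_fraction
```

| Here `depths` is assumed to hold one mapped-read depth value per target |
| base for a single sample. |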
| Illumina sequence reads were mapped at Broad Institute using BWA |
| and at Boston College using Mosaik. AB SOLiD sequence reads were |
| mapped at Baylor using Bfast and at Boston College using Mosaik. |
| All read mapping used the GRCh37 human genome reference |
| sequence, without additional decoy sequences. The final site list is the |
| union of the calls from the two sequencing technologies, each filtered |
| separately using a support vector machine (SVM). |
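| A minimal sketch of the filter-then-union step, for illustration only. |
| The SVM filters themselves are not reproduced here: `keep_illumina` and |
| `keep_solid` are hypothetical stand-ins for them, and keying a site by |
| (chromosome, position, REF, ALT) is an assumption. |

```python
from typing import Callable, Iterable, List, Tuple

Site = Tuple[str, int, str, str]  # (chromosome, 1-based position, REF, ALT)


def filtered_union(
    illumina_calls: Iterable[Site],
    solid_calls: Iterable[Site],
    keep_illumina: Callable[[Site], bool],
    keep_solid: Callable[[Site], bool],
) -> List[Site]:
    """Filter each technology's calls on its own, then take the union."""
    illumina_pass = {site for site in illumina_calls if keep_illumina(site)}
    solid_pass = {site for site in solid_calls if keep_solid(site)}
    # Set union removes sites called by both technologies; the sort is
    # lexicographic on chromosome name, then numeric on position.
    return sorted(illumina_pass | solid_pass)
```

| A site passing either technology's filter appears once in the result, |
| even when both technologies called it. |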
| While formatting the data for dbSNP submission, I observed that the |
| single ALT allele shown in the 1000 Genomes Phase 1 integrated |
| genotypes sometimes differs from that found in the contributing call |
| sets. For consistency, the allele from the integrated genotypes is |
| shown here. |
| Data availability: |
| The mapped sequence reads are indexed in file: |
| /ftp-trace.ncbi.nlm.nih.gov/1000genomes/ftp/alignment_indices/ |
| 20110521.exome.alignment.index |
| The original SNP calls from individual centers, the filtered union |
| site lists and integrated genotypes are currently found in directories: |
| /ftp-trace.ncbi.nlm.nih.gov/1000genomes/ftp/technical/working/ |
| 20110721_exome_call_sets |
| 20110810_exome_consensus_snps |
| 20120117_new_phase1_intgrated_genotypes (sic) |