Status |
Public on Oct 26, 2021 |
Title |
PRDM9 Amplicon Sequencing Nanopore |
Sample type |
SRA |
|
|
Source name |
individuals from the 1000 genomes project
|
Organism |
Homo sapiens |
Characteristics |
individuals: mixed individuals (see File_S1_PRDM9_genotypes.txt)
sequencing technique: Oxford Nanopore
sequencing instrument: Oxford Nanopore FLO-MIN107
|
Extracted molecule |
genomic DNA |
Extraction protocol |
LongAmp Taq 2X Master Mix (M0287) from New England Biolabs Inc. was used for PCR amplification. After amplification, each sample was run on an agarose gel to confirm successful amplification and a low level of polymerase slippage (visible as DNA laddering/smearing). Samples showing extensive laddering/smearing were re-amplified.

Based on the Genome Reference Consortium Human Build 38 (PRDM9 allele with 13 ZFs), the final amplified product was 1,899 bp, comprising the 1,092 bp C2H2 ZF array, 670 bp of upstream flanking sequence, and 137 bp of downstream flanking sequence. Of note, the total length of the amplified product varied with the number of ZFs present in the PRDM9 allele.

Samples were then pooled and prepared for multiplexing. We performed dual barcoding in order to sequence all 758 multiplexed individuals. The first round of barcoding was done by adding unique DNA barcode sequences to the 5’ end of the PCR primers, totaling eight primer pairs. After amplification and addition of the first barcode, samples were pooled in groups of eight (each sample tagged with a separate barcode sequence) and subjected to a second round of barcoding. The second round was performed using the PCR Barcoding Expansion 1-96 kit (EXP-PBC096) from Oxford Nanopore Technologies (ONT), Inc., following the protocol detailed on their website (https://nanoporetech.com - PCR barcoding (96) amplicons).

Sequencing libraries for Nanopore were prepared using either the 1D (SQK-LSK308) or the 1D2 sequencing kit (SQK-LSK309) from Oxford Nanopore Technologies and were run on a MinION sequencer with R9.5.1 flow cells (FLO-MIN107).
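The amplicon layout and barcoding scheme described above can be checked with a little arithmetic. The sketch below is illustrative only: it assumes every ZF repeat has the same length as the reference average (1,092 bp / 13 ZFs = 84 bp), and the function and constant names are hypothetical rather than taken from the authors' scripts.

```python
# Illustrative sketch; names are hypothetical, values come from the protocol text.
UPSTREAM_FLANK_BP = 670      # upstream flanking sequence
DOWNSTREAM_FLANK_BP = 137    # downstream flanking sequence
REFERENCE_ZF_COUNT = 13      # ZFs in the GRCh38 PRDM9 allele
REFERENCE_ARRAY_BP = 1092    # length of the GRCh38 C2H2 ZF array

# Assumption: all ZF repeats are the same length (84 bp in the reference).
ZF_REPEAT_BP = REFERENCE_ARRAY_BP // REFERENCE_ZF_COUNT

INNER_BARCODES = 8    # barcoded primer pairs used in the first round
OUTER_BARCODES = 96   # barcodes in the EXP-PBC096 kit used in the second round
INDIVIDUALS = 758     # individuals multiplexed in this experiment


def expected_amplicon_length(zf_count: int) -> int:
    """Return the expected amplicon length (bp) for an allele with zf_count ZFs."""
    if zf_count < 1:
        raise ValueError("zf_count must be a positive integer")
    return UPSTREAM_FLANK_BP + zf_count * ZF_REPEAT_BP + DOWNSTREAM_FLANK_BP


def barcode_capacity() -> int:
    """Return the number of distinct inner/outer barcode combinations."""
    return INNER_BARCODES * OUTER_BARCODES


if __name__ == "__main__":
    for zf_count in (12, 13, 14):
        print(zf_count, expected_amplicon_length(zf_count))
    print("combinations:", barcode_capacity(), "individuals:", INDIVIDUALS)
```

Under these assumptions, the 13-ZF reference allele gives the stated 1,899 bp product, and eight inner barcodes combined with 96 outer barcodes give enough combinations to cover all 758 individuals.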
|
|
|
Library strategy |
OTHER |
Library source |
genomic |
Library selection |
other |
Instrument model |
MinION |
|
|
Description |
PRDM9 amplicon sequencing using Oxford Nanopore sequencing
File_S1_PRDM9_genotypes.txt
|
Data processing |
To identify the sequencing reads derived from each individual, we performed read demultiplexing using Guppy v3.1.5. This first involved base calling (with standard parameters), followed by two rounds of demultiplexing to identify the outer and inner barcodes.

The first round of demultiplexing identified the outer barcode as follows:
guppy_barcoder --compress_fastq -i {guppy output} -s demux --arrangements_files barcode_arrs_pcr96.cfg --min_score 50 --front_window_size 300 --rear_window_size 300 --trim_barcodes

The second round of demultiplexing was then performed on each of the files generated from the first round:
guppy_barcoder --compress_fastq -i {round 1 barcoding FAST5} --arrangements_files custom_12bp.cfg --min_score 70 --front_window_size 100 --rear_window_size 100 --trim_barcodes

We used the Oxford Nanopore development basecaller Bonito (v.0.2.3) for base calling as it is more accurate than Guppy, the production basecaller (Silvestre-Ryan and Holmes 2020). Specifically, we found that the Guppy base calling accuracy for CpG dinucleotides in particular contexts was insufficient to confidently infer PRDM9 genotypes using our methods (not shown). Reads from each individual were grouped and base called separately using Bonito (v.0.2.3) with default parameters.

** NOTE: FASTQ files have been artificially created from the FASTA output of the Bonito basecaller. The FASTQ quality scores are meaningless placeholder text and should not be used. **

PRDM9 alleles were genotyped from basecalled FASTA files using custom scripts (see Alleva et al. 2021; see also https://github.com/kevbrick/genotype_prdm9_LR).

To generate per-individual FASTA files from the non-demultiplexed pooled FASTQ, run the following at the terminal:
zcat bonito1d.fastq.gz |perl -lane 'if ($_ =~ /^@((\S+?)\|(\S+?)\|(\S+?)\|.+)/){$id = $1; $name="$2.$3_$4.fasta"; open OUT, ">>", $name ; print OUT ">$id"; next}; if ($name){print OUT $_; $name=""; close OUT}'

Genome_build: hg38
Supplementary_files_format_and_content: File_S1_PRDM9_genotypes.txt contains PRDM9 genotyping information for all individuals
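For readers who prefer not to use Perl, the FASTQ splitting step can be approximated in Python. This is a minimal sketch, not the authors' code: it reuses the header pattern from the one-liner, names each output file from the first three pipe-delimited header fields, and treats the line after a matching header as the sequence. The function names are hypothetical, and, like the one-liner, it appends to existing files, so re-running it duplicates records.

```python
import gzip
import re
from pathlib import Path
from typing import Iterable, Iterator, Tuple

# Same pattern as the Perl one-liner: "@", three pipe-delimited fields, then the rest.
HEADER_RE = re.compile(r"^@((\S+?)\|(\S+?)\|(\S+?)\|.+)")


def iter_fasta_records(lines: Iterable[str]) -> Iterator[Tuple[str, str, str]]:
    """Yield (fasta_file_name, read_id, sequence) for each matching FASTQ header."""
    pending = None  # (file_name, read_id) while waiting for the sequence line
    for raw in lines:
        line = raw.rstrip("\r\n")
        if pending is not None:
            # The line directly after a matching header is the read sequence.
            yield pending[0], pending[1], line
            pending = None
            continue
        match = HEADER_RE.match(line)
        if match:
            read_id, field1, field2, field3 = match.groups()
            pending = (f"{field1}.{field2}_{field3}.fasta", read_id)
        # "+" separators and placeholder quality lines fall through and are skipped.


def split_fastq(fastq_gz: str, out_dir: str = ".") -> None:
    """Append each read of a gzipped FASTQ to its per-individual FASTA file."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    with gzip.open(fastq_gz, "rt") as handle:
        for file_name, read_id, sequence in iter_fasta_records(handle):
            with open(out / file_name, "a") as fasta:
                fasta.write(f">{read_id}\n{sequence}\n")


if __name__ == "__main__":
    split_fastq("bonito1d.fastq.gz", "per_individual_fasta")
```

Keeping the parsing in `iter_fasta_records` separate from the file writing makes it easy to try the header pattern on a few lines before processing the full pool. The output directory name in the usage line is an arbitrary choice.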
|
|
|
Submission date |
Jul 20, 2021 |
Last update date |
Oct 26, 2021 |
Contact name |
Kevin Brick |
E-mail(s) |
[email protected], [email protected], [email protected]
|
Organization name |
NIDDK
|
Department |
GBB
|
Street address |
5/205 Memorial Drive
|
City |
Bethesda |
State/province |
MD |
ZIP/Postal code |
20892 |
Country |
USA |
|
|
Platform ID |
GPL24106 |
Series (1) |
GSE166483 |
Cataloging human PRDM9 variability utilizing long-read sequencing technologies reveals PRDM9 population-specificity and two distinct groupings of related alleles |
|
Relations |
BioSample |
SAMN20332787 |
SRA |
SRX11510430 |
Supplementary data files not provided |
SRA Run Selector |
Raw data are available in SRA |
Processed data are available on Series record |