Sample GSM5464167
Status Public on Oct 26, 2021
Title PRDM9 Amplicon Sequencing Nanopore
Sample type SRA
 
Source name individuals from the 1000 genomes project
Organism Homo sapiens
Characteristics individuals: mixed individuals (see File_S1_PRDM9_genotypes.txt)
sequencing technique: Oxford Nanopore
sequencing instrument: Oxford Nanopore FLO-MIN107
Extracted molecule genomic DNA
Extraction protocol LongAmp Taq 2X Master Mix (M0287) from New England Biolabs Inc. was used for PCR amplification. After amplification, each sample was run on an agarose gel to confirm successful amplification and to check for polymerase slippage (visible as DNA laddering/smearing). Samples showing extensive laddering/smearing were re-amplified. Based on the Genome Reference Consortium Human Build 38 (PRDM9 allele with 13 ZFs), the final amplified product was 1,899 bp, comprising the 1,092 bp C2H2 ZF array, 670 bp of flanking sequence upstream of the PRDM9 ZF array, and 137 bp of flanking sequence downstream of it. Of note, the total length of the final amplified product varied with the number of ZFs present in the PRDM9 allele. Samples were then pooled and prepared for multiplexing. We performed dual barcoding in order to sequence all 758 multiplexed individuals. The first round of barcoding was done by adding unique DNA barcode sequences to the 5’ end of the PCR primers, totaling eight primer pairs. After amplification and the addition of the first barcode, samples were pooled in groups of eight (each sample tagged with a separate barcode sequence) and subjected to a second round of barcoding. The second round of multiplexing was performed using the PCR Barcoding Expansion 1-96 kit (EXP-PBC096) from Oxford Nanopore Technologies (ONT), Inc. following the protocol detailed on their website (https://nanoporetech.com - PCR barcoding (96) amplicons).
Sequencing libraries for Nanopore were prepared using either the 1D (SQK-LSK308) or the 1D² sequencing kit (SQK-LSK309) from Oxford Nanopore Technologies and were run on a MinION sequencer with R9.5.1 flow cells (FLO-MIN107).
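The amplicon-length and barcoding figures in the extraction protocol can be checked with a little arithmetic. The sketch below is illustrative only: it assumes that every zinc finger repeat has the same length (1,092 bp / 13 ZFs = 84 bp) and that all 96 outer barcodes of the EXP-PBC096 kit were available, neither of which the record states explicitly. The function names are hypothetical.

```python
# Values stated in the record for the GRCh38 reference allele (13 zinc fingers).
UPSTREAM_BP = 670
DOWNSTREAM_BP = 137
REFERENCE_ARRAY_BP = 1092
REFERENCE_ZF_COUNT = 13

# Assumption: all zinc finger repeats are the same length (1092 / 13 = 84 bp).
ZF_REPEAT_BP = REFERENCE_ARRAY_BP // REFERENCE_ZF_COUNT


def expected_amplicon_length(zf_count: int) -> int:
    """Expected PCR product length (bp) for an allele with zf_count zinc fingers."""
    if zf_count < 1:
        raise ValueError("zf_count must be a positive integer")
    return UPSTREAM_BP + zf_count * ZF_REPEAT_BP + DOWNSTREAM_BP


def barcode_capacity(inner_barcodes: int = 8, outer_barcodes: int = 96) -> int:
    """Number of distinct (outer, inner) barcode pairs available."""
    return inner_barcodes * outer_barcodes


if __name__ == "__main__":
    print(expected_amplicon_length(13))  # reference allele: 670 + 1092 + 137
    print(barcode_capacity())            # eight inner x 96 outer barcodes
```

Under these assumptions the reference allele gives the 1,899 bp product quoted above, and the two barcoding rounds provide enough combinations for the 758 individuals.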
 
Library strategy OTHER
Library source genomic
Library selection other
Instrument model MinION
 
Description PRDM9 amplicon sequencing using Oxford Nanopore sequencing
File_S1_PRDM9_genotypes.txt
Data processing To identify sequencing reads derived from each individual, we performed read demultiplexing using Guppy v3.1.5. This first involved base calling (with standard parameters), followed by two rounds of demultiplexing to identify the outer and inner barcodes. The first round of demultiplexing identified the outer barcode as follows:
guppy_barcoder --compress_fastq -i {guppy output} -s demux --arrangements_files barcode_arrs_pcr96.cfg --min_score 50 --front_window_size 300 --rear_window_size 300 --trim_barcodes
The second round of demultiplexing was then performed on each of the files generated from the first round:
guppy_barcoder --compress_fastq -i {round 1 barcoding FAST5} --arrangements_files custom_12bp.cfg --min_score 70 --front_window_size 100 --rear_window_size 100 --trim_barcodes
For the final base calls we used the Oxford Nanopore development basecaller Bonito (v.0.2.3), as it is more accurate than Guppy, the production basecaller (Silvestre-Ryan and Holmes 2020). Specifically, we found that the Guppy base calling accuracy for CpG dinucleotides in particular contexts was insufficient to confidently infer PRDM9 genotypes using our methods (not shown). Reads from each individual were grouped and base called separately using Bonito (v.0.2.3) with default parameters.
** NOTE: FASTQ files have been artificially created from the FASTA output of the Bonito basecaller. The FASTQ quality scores are meaningless placeholder text and should not be used. **
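The two demultiplexing rounds described above could be scripted roughly as follows. This is a sketch, not the submitters' pipeline: it only assembles the guppy_barcoder argument lists using the flags quoted in this record, and it assumes that the first round leaves one sub-directory per outer barcode under the save path. The function names, the example directory names and the optional save path for the second round are hypothetical.

```python
import shlex
from pathlib import Path


def outer_demux_command(basecalled_dir, save_dir="demux"):
    """Round 1: outer (PCR96) barcodes, using the flags quoted in the record."""
    return [
        "guppy_barcoder", "--compress_fastq",
        "-i", str(basecalled_dir), "-s", str(save_dir),
        "--arrangements_files", "barcode_arrs_pcr96.cfg",
        "--min_score", "50",
        "--front_window_size", "300", "--rear_window_size", "300",
        "--trim_barcodes",
    ]


def inner_demux_command(round1_dir, save_dir=None):
    """Round 2: inner 12 bp barcodes, using the flags quoted in the record."""
    cmd = ["guppy_barcoder", "--compress_fastq", "-i", str(round1_dir)]
    if save_dir is not None:
        # Assumption: the record's round 2 command has no -s option; a separate
        # save path is added here only when the caller asks for one.
        cmd += ["-s", str(save_dir)]
    cmd += [
        "--arrangements_files", "custom_12bp.cfg",
        "--min_score", "70",
        "--front_window_size", "100", "--rear_window_size", "100",
        "--trim_barcodes",
    ]
    return cmd


def inner_demux_commands(demux_dir):
    """One round 2 command per sub-directory found under the round 1 save path."""
    subdirs = sorted(p for p in Path(demux_dir).iterdir() if p.is_dir())
    return [inner_demux_command(p) for p in subdirs]


if __name__ == "__main__":
    # Print the commands instead of running them; directory names are examples.
    print(shlex.join(outer_demux_command("guppy_output")))
    print(shlex.join(inner_demux_command("demux/barcode01")))
```

Each list can be passed to `subprocess.run(cmd, check=True)` once Guppy is installed and the custom arrangement file is in place.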
PRDM9 alleles were genotyped from basecalled FASTA files using custom scripts (see Alleva et al. 2021; see also https://github.com/kevbrick/genotype_prdm9_LR)
To generate per-individual FASTA files from the non-demultiplexed pool FASTQ, use the following code at the terminal:
zcat bonito1d.fastq.gz |perl -lane 'if ($_ =~ /^@((\S+?)\|(\S+?)\|(\S+?)\|.+)/){$id = $1; $name="$2.$3_$4.fasta"; open OUT, ">>", $name ; print OUT ">$id"; next}; if ($name){print OUT $_; $name=""; close OUT}'
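For readers who prefer not to use the Perl one-liner, the same split can be approximated in Python. This is a sketch that follows the one-liner's logic: the header pattern and output naming (first field, then second and third fields joined as `field1.field2_field3.fasta`) are copied from the command above, while the function name and the returned read counts are additions for illustration. The record does not say what the three pipe-separated header fields mean, so the code treats them as opaque strings.

```python
import gzip
import re
from pathlib import Path

# Same pattern as the Perl one-liner: "@f1|f2|f3|rest"; group 1 is the full read ID.
HEADER = re.compile(r"^@((\S+?)\|(\S+?)\|(\S+?)\|.+)")


def split_fastq_to_fasta(fastq_gz, out_dir="."):
    """Append each read to <f1>.<f2>_<f3>.fasta; return reads written per file."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    counts = {}
    pending = None  # (output file name, read ID) waiting for its sequence line
    with gzip.open(fastq_gz, "rt") as handle:
        for raw in handle:
            line = raw.rstrip("\n")
            match = HEADER.match(line)
            if match:
                read_id, field1, field2, field3 = match.groups()
                pending = (f"{field1}.{field2}_{field3}.fasta", read_id)
                continue
            if pending:
                # The line right after a header is the sequence. The "+" and
                # placeholder quality lines that follow are skipped.
                name, read_id = pending
                with open(out_dir / name, "a") as out:
                    out.write(f">{read_id}\n{line}\n")
                counts[name] = counts.get(name, 0) + 1
                pending = None
    return counts


if __name__ == "__main__":
    import sys

    if len(sys.argv) > 1:
        for name, n_reads in sorted(split_fastq_to_fasta(sys.argv[1]).items()):
            print(name, n_reads)
```

Like the original command, this opens output files in append mode, so rerunning it in the same directory will duplicate reads unless the earlier FASTA files are removed first.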
Genome_build: hg38
Supplementary_files_format_and_content: File_S1_PRDM9_genotypes.txt contains PRDM9 genotyping information for all individuals
 
Submission date Jul 20, 2021
Last update date Oct 26, 2021
Contact name Kevin Brick
E-mail(s) [email protected], [email protected], [email protected]
Organization name NIDDK
Department GBB
Street address 5/205 Memorial Drive
City Bethesda
State/province MD
ZIP/Postal code 20892
Country USA
 
Platform ID GPL24106
Series (1)
GSE166483 Cataloging human PRDM9 variability utilizing long-read sequencing technologies reveals PRDM9 population-specificity and two distinct groupings of related alleles
Relations
BioSample SAMN20332787
SRA SRX11510430

Supplementary data files not provided
Raw data are available in SRA
Processed data are available on Series record
