GEO Accession viewer

Sample GSM5464167

Status

Public on Oct 26, 2021

Title

PRDM9 Amplicon Sequencing Nanopore

Sample type

SRA

Source name

individuals from the 1000 genomes project

Organism

Homo sapiens

Characteristics

individuals: mixed individuals (see File_S1_PRDM9_genotypes.txt)
sequencing technique: Oxford Nanopore
sequencing instrument: Oxford Nanopore FLO-MIN107

Extracted molecule

genomic DNA

Extraction protocol

LongAmp Taq 2X Master Mix (M0287) from New England Biolabs Inc. was used for PCR amplification. After amplification, each sample was checked by agarose gel electrophoresis for successful amplification and for minimal polymerase slippage (seen as DNA laddering/smearing). Samples showing extensive laddering/smearing on the gel were re-amplified.
Based on the Genome Reference Consortium Human Build 38 (PRDM9 allele with 13 ZFs), the final amplified product was 1,899 bp, comprising the 1,092 bp C2H2 ZF array, 670 bp of upstream flanking sequence, and 137 bp of downstream flanking sequence. Of note, the total length of the final amplified product varied with the number of ZFs present in the PRDM9 allele.
Samples were then pooled and prepared for multiplexing. We performed dual-barcoding in order to sequence all 758 multiplexed individuals. The first round of barcoding was done by adding unique DNA barcode sequences to the 5’ end of the PCR primers, totaling eight primer pairs. After amplification and the addition of the first barcode, samples were pooled in groups of eight (each sample tagged with a separate barcode sequence) and subjected to a second round of barcoding. The second round of multiplexing was performed using the PCR Barcoding Expansion 1-96 kit (EXP-PBC096) from Oxford Nanopore Technologies (ONT), Inc., following the protocol detailed on their website (https://nanoporetech.com - PCR barcoding (96) amplicons).
Sequencing libraries for Nanopore were prepared using either the 1D (SQK-LSK308) or the 1D2 sequencing kit (SQK-LSK309) from Oxford Nanopore Technologies and were run on a MinION sequencer with R9.5.1 flow cells (FLO-MIN107).
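The amplicon-length and barcoding figures above can be checked with a little arithmetic. The following Python sketch is illustrative only and is not part of the submitters' pipeline. It assumes that every zinc finger contributes the same length (1,092 bp / 13 ZFs = 84 bp), that the flanking sequence is constant across alleles, and that all 96 outer barcodes are available; the function names are hypothetical.

```python
import math

# Values stated in the extraction protocol (GRCh38 reference allele, 13 ZFs).
ZF_ARRAY_BP_REF = 1092  # length of the C2H2 ZF array
N_ZF_REF = 13           # number of ZFs in the reference allele
UPSTREAM_BP = 670       # upstream flanking sequence
DOWNSTREAM_BP = 137     # downstream flanking sequence


def expected_amplicon_length(n_zf: int) -> int:
    """Expected amplicon length for an allele with n_zf zinc fingers.

    Assumption (not stated in the record): each ZF has the same length
    and the flanks do not vary between alleles.
    """
    if n_zf < 1:
        raise ValueError("n_zf must be at least 1")
    zf_bp, remainder = divmod(ZF_ARRAY_BP_REF, N_ZF_REF)
    if remainder:
        raise ValueError("reference array length is not a whole number of ZFs")
    return UPSTREAM_BP + n_zf * zf_bp + DOWNSTREAM_BP


def barcode_capacity(n_inner: int = 8, n_outer: int = 96) -> int:
    """Number of samples a dual-barcoding scheme can distinguish."""
    return n_inner * n_outer


def pools_needed(n_samples: int, pool_size: int) -> int:
    """Number of inner-barcode pools needed to hold n_samples."""
    return math.ceil(n_samples / pool_size)


print(expected_amplicon_length(13))  # 1899, matching the stated product size
print(barcode_capacity())            # 768 combinations for 758 individuals
print(pools_needed(758, 8))          # 95 pools, within the 96 outer barcodes
```

Under these assumptions, an allele with one more or one fewer ZF than the reference would shift the product length by 84 bp.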

Library strategy

OTHER

Library source

genomic

Library selection

other

Instrument model

MinION

Description

PRDM9 amplicon sequencing using Oxford Nanopore sequencing
File_S1_PRDM9_genotypes.txt

Data processing

To identify sequencing reads derived from each individual, we performed read demultiplexing using Guppy v3.1.5. This first involved base calling (with standard parameters), followed by two rounds of demultiplexing to identify the outer and inner barcodes.
The first round of demultiplexing identified the outer barcode as follows:
guppy_barcoder --compress_fastq -i {guppy output} -s demux --arrangements_files barcode_arrs_pcr96.cfg --min_score 50 --front_window_size 300 --rear_window_size 300 --trim_barcodes
The second round of demultiplexing was then performed on each of the files generated from the first round:
guppy_barcoder --compress_fastq -i {round 1 barcoding FAST5} --arrangements_files custom_12bp.cfg --min_score 70 --front_window_size 100 --rear_window_size 100 --trim_barcodes
We used the Oxford Nanopore development basecaller Bonito (v.0.2.3) for base calling because it is more accurate than Guppy, the production basecaller (Silvestre-Ryan and Holmes 2020). Specifically, we found that the Guppy base calling accuracy for CpG dinucleotides in particular contexts was insufficient to confidently infer PRDM9 genotypes using our methods (not shown). Reads from each individual were grouped and base called separately using Bonito (v.0.2.3) with default parameters.
** NOTE: FASTQ files have been artificially created from the FASTA output of the Bonito basecaller. The FASTQ quality scores are meaningless placeholder text and should not be used. **
PRDM9 alleles were genotyped from basecalled FASTA files using custom scripts (see Alleva et al. 2021; see also https://github.com/kevbrick/genotype_prdm9_LR).
To generate per-individual FASTA files from the non-demultiplexed pool FASTQ, use the following code at the terminal:
zcat bonito1d.fastq.gz |perl -lane 'if ($_ =~ /^@((\S+?)\|(\S+?)\|(\S+?)\|.+)/){$id = $1; $name="$2.$3_$4.fasta"; open OUT, ">>", $name ; print OUT ">$id"; next}; if ($name){print OUT $_; $name=""; close OUT}'
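For readers who prefer not to use the Perl one-liner, the same splitting logic can be sketched in Python. This is an illustrative re-implementation, not the submitters' code: the header pattern is copied from the one-liner above, while the function names `iter_fasta_records` and `split_fastq` are hypothetical. As in the original, a header is expected to start with `@` and contain at least three `|`-separated fields followed by further text; the first three fields form the output file name, and only the line immediately after each header is kept as the sequence.

```python
import gzip
import re
from typing import Iterable, Iterator, Tuple

# Same pattern as the Perl one-liner: group 1 is the full read ID,
# groups 2-4 are the first three '|'-separated fields.
HEADER_RE = re.compile(r"^@((\S+?)\|(\S+?)\|(\S+?)\|.+)")


def iter_fasta_records(lines: Iterable[str]) -> Iterator[Tuple[str, str, str]]:
    """Yield (fasta_name, read_id, sequence) for each FASTQ record."""
    pending = None  # (fasta_name, read_id) waiting for its sequence line
    for raw in lines:
        line = raw.rstrip("\n")
        match = HEADER_RE.match(line)
        if match:
            read_id = match.group(1)
            fasta_name = f"{match.group(2)}.{match.group(3)}_{match.group(4)}.fasta"
            pending = (fasta_name, read_id)
            continue
        if pending:
            # The first line after a header is the sequence; the '+' and
            # placeholder quality lines that follow are skipped.
            yield pending[0], pending[1], line
            pending = None


def split_fastq(path: str) -> None:
    """Append each read to its per-individual FASTA file."""
    with gzip.open(path, "rt") as handle:
        for fasta_name, read_id, seq in iter_fasta_records(handle):
            with open(fasta_name, "a") as out:
                out.write(f">{read_id}\n{seq}\n")
```

Calling `split_fastq("bonito1d.fastq.gz")` in the directory containing the pooled file should produce the same per-individual FASTA files as the one-liner. Because files are opened in append mode, remove any output from an earlier run before re-running.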
Genome_build: hg38
Supplementary_files_format_and_content: File_S1_PRDM9_genotypes.txt contains PRDM9 genotyping information for all individuals

Submission date

Jul 20, 2021

Last update date

Oct 26, 2021

Contact name

Kevin Brick

E-mail(s)

[email protected], [email protected], [email protected]

Organization name

NIDDK

Department

GBB

Street address

5/205 Memorial Drive

City

Bethesda

State/province

ZIP/Postal code

20892

Country

USA

Platform ID

GPL24106

Series (1)

GSE166483

Cataloging human PRDM9 variability utilizing long-read sequencing technologies reveals PRDM9 population-specificity and two distinct groupings of related alleles

Relations

BioSample

SAMN20332787

SRA

SRX11510430

Supplementary data files not provided

Raw data are available in SRA

Processed data are available on Series record