GEO Accession viewer

Sample GSM3273742

Status

Public on Jul 18, 2018

Title

DRX027361

Sample type

SRA

Source name

Oregon-R_LD_2

Organism

Drosophila melanogaster

Characteristics

Sex: female
developmental stage: adult
tissue: whole body
attributes: m2|oregon

Extracted molecule

total RNA

Extraction protocol

see original sample

Library strategy

RNA-Seq

Library source

transcriptomic

Library selection

cDNA

Instrument model

Illumina HiSeq 2500

Data processing

We created a pre-alignment pipeline to identify technical metadata and generate sample quality metrics. We downloaded FASTQs from SRA using fastq-dump (sra-tools v2.8.2) with --split-files -M 0, counted the number of reads, and estimated average read lengths. A sample was considered paired end if fastq-dump generated two files and each file had an equal number of reads, ≥ 10,000 reads, and an average read length ≥ 10 bp. We removed individual reads shorter than 25 bp using atropos (v1.1.18) with --minimum-length 25. We simultaneously verified that samples were indeed Drosophila and estimated contamination with FastQ Screen (v0.11.3) and Bowtie 2 (v2.3.3.1) by mapping 100,000 reads to 8 references (dm6, rRNA, Wolbachia, human, yeast, E. coli, PhiX, ERCC-SRM2374). Next we aligned all reads with Hisat2 (v2.1.0) using --max-intronlen 300000 and --known-splicesite-infile to the Drosophila melanogaster Release 6 plus ISO1 MT assembly (GCA_000001215.4). We then ran samtools (v1.7) and bamtools (v2.4.1) with default settings to generate summary statistics. We estimated various metrics with Picard CollectRnaSeqMetrics (v2.15.0) using three separate settings: STRAND=NONE, STRAND=FIRST_READ_TRANSCRIPTION_STRAND, and STRAND=SECOND_READ_TRANSCRIPTION_STRAND. These metrics allowed us to estimate library strandedness. Finally, we identified duplicates using Picard MarkDuplicates (v2.15.0).
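The paired-end rule described above can be sketched as a small check. This is only an illustration of the stated criteria, not the pipeline's actual code: the FastqStats type and the is_paired_end function are hypothetical names, and the per-file read counts and average lengths are assumed to have been computed already.

```python
from dataclasses import dataclass
from typing import Sequence

# Thresholds taken from the description above.
MIN_READS = 10_000
MIN_AVG_READ_LENGTH = 10  # bp


@dataclass(frozen=True)
class FastqStats:
    """Summary of one FASTQ file written by fastq-dump (hypothetical type)."""
    n_reads: int
    avg_read_length: float


def is_paired_end(files: Sequence[FastqStats]) -> bool:
    """Return True only if all stated paired-end criteria hold."""
    # Two files must have been generated.
    if len(files) != 2:
        return False
    first, second = files
    # Both files must contain the same number of reads.
    if first.n_reads != second.n_reads:
        return False
    # Each file needs enough reads and a sufficient average read length.
    return all(
        f.n_reads >= MIN_READS and f.avg_read_length >= MIN_AVG_READ_LENGTH
        for f in files
    )
```

For example, two files with 20,000 reads each and an average length of 100 bp pass the check, while a pair with unequal read counts does not. How the pipeline labels samples that fail the check is not stated in this record.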
To generate counts tables and coverage tracks, our alignment pipeline used parameters discovered in the pre-alignment pipeline. The alignment pipeline used the FASTQ file(s) downloaded by the pre-alignment pipeline, but trimmed adapter sequence and low quality bases using atropos (v1.1.18) with -q 20 --minimum-length 25. The remaining reads were mapped using Hisat2 (v2.1.0) with --dta --max-intronlen 300000 --known-splicesite-infile and with --rna-strandness set to ‘F’, ‘R’, ‘FR’, or ‘RF’ depending on the strandedness. We merged alignments from individual SRA runs (SRRs) to the library level (SRX) and generated gene level, junction level, and intergenic coverage counts using featureCounts from the Subread package (v1.5.3). Finally, we created browser tracks using bamCoverage from the deepTools package (v2.5.4) with --binSize 1 --normalizeTo1x 129000000 --ignoreForNormalization chrX.
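As a rough sketch of how the strandedness call could feed into the Hisat2 command line, the snippet below assembles an argument list from the options named above. The function names, the "first"/"second"/"unstranded" labels, and the index and file paths are hypothetical. The mapping from the Picard-style strand label to F/R/FR/RF is an assumption based on the Hisat2 manual, not something stated in this record.

```python
from typing import List, Optional, Sequence

# Assumed mapping: (layout, transcription-strand read) -> --rna-strandness value.
_STRANDNESS = {
    ("single", "first"): "F",
    ("single", "second"): "R",
    ("paired", "first"): "FR",
    ("paired", "second"): "RF",
}


def rna_strandness(layout: str, strand: str) -> Optional[str]:
    """Return the --rna-strandness value, or None for unstranded libraries."""
    return _STRANDNESS.get((layout, strand))


def hisat2_args(index: str, splice_sites: str, fastqs: Sequence[str],
                strand: str = "unstranded") -> List[str]:
    """Build a Hisat2 argument list using the options listed in the record."""
    if len(fastqs) not in (1, 2):
        raise ValueError("expected one (single end) or two (paired end) FASTQ files")
    layout = "paired" if len(fastqs) == 2 else "single"
    args = [
        "hisat2", "--dta",
        "--max-intronlen", "300000",
        "--known-splicesite-infile", splice_sites,
        "-x", index,
    ]
    value = rna_strandness(layout, strand)
    if value is not None:
        # Unstranded libraries omit the option entirely.
        args += ["--rna-strandness", value]
    if layout == "paired":
        args += ["-1", fastqs[0], "-2", fastqs[1]]
    else:
        args += ["-U", fastqs[0]]
    return args
```

Returning a list rather than a shell string keeps the result safe to pass to subprocess.run without quoting concerns.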
Genome_build: Drosophila melanogaster Release 6 plus ISO1 MT (GenBank assembly accession: GCA_000001215.4)
Supplementary_files_format_and_content: Processed data files include:
*.bw are BigWig files generated using deeptools bamCoverage
*.counts are gene level coverage counts
*.jcounts are gene level junction counts
*.intergenic.counts are intergenic coverage counts
*.intergenic.jcounts are intergenic junction counts
Series level supplementary files:
dmel_r6-11.intergenic.gtf intergenic GTF generated by the pipeline for estimating intergenic coverage counts.
supplemental_metadata.tsv supplemental metadata file containing additional metadata for each sample including QC values and various flags generated by each pipeline
gene_counts.tsv supplemental file containing all gene counts as a single matrix
intergenic_counts.tsv supplemental file containing all intergenic counts as a single matrix

Submission date

Jul 17, 2018

Last update date

Sep 04, 2018

Contact name

Brian Oliver

E-mail(s)

[email protected]

Phone

301-204-9463

Organization name

NIDDK, NIH

Department

LBG

Lab

Developmental Genomics

Street address

50 South Drive

City

Bethesda

State/province

ZIP/Postal code

20892

Country

USA

Platform ID

GPL17275

Series (1)

GSE117217

Remapping the SRA: Drosophila melanogaster RNA-Seq data from the Sequence Read Archive

Relations

BioSample

SAMD00025844

SRA

DRX027361

Named Annotation

GSM3273742_DRX027361.flybase.plus.bw

Named Annotation

GSM3273742_DRX027361.flybase.minus.bw

Supplementary file	Size	Download	File type/resource
GSM3273742_DRX027361.bam.counts.jcounts.txt.gz	335.8 Kb	(ftp)(http)	TXT
GSM3273742_DRX027361.bam.counts.txt.gz	840.6 Kb	(ftp)(http)	TXT
GSM3273742_DRX027361.bam.intergenic.counts.jcounts.txt.gz	302.7 Kb	(ftp)(http)	TXT
GSM3273742_DRX027361.bam.intergenic.counts.txt.gz	146.8 Kb	(ftp)(http)	TXT
GSM3273742_DRX027361.flybase.minus.bw	15.5 Mb	(ftp)(http)	BW
GSM3273742_DRX027361.flybase.plus.bw	14.8 Mb	(ftp)(http)	BW
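The supplementary file names in the table follow a visible pattern of sample accession, library accession, and a suffix that identifies the content. A minimal sketch for splitting a name into those parts is shown below. The SuppFile type, the parse_supp_name function, and the suffix table are hypothetical helpers written for illustration. The descriptions are taken from the file format list in this record, and reading "plus" and "minus" as strand labels is an assumption.

```python
import re
from typing import NamedTuple, Optional


class SuppFile(NamedTuple):
    sample: str   # GEO sample accession, e.g. GSM3273742
    library: str  # SRA experiment accession, e.g. DRX027361
    content: str  # human-readable description of the file content


# Suffixes as they appear in the table, mapped to the record's descriptions.
_SUFFIXES = {
    ".bam.counts.txt.gz": "gene level coverage counts",
    ".bam.counts.jcounts.txt.gz": "gene level junction counts",
    ".bam.intergenic.counts.txt.gz": "intergenic coverage counts",
    ".bam.intergenic.counts.jcounts.txt.gz": "intergenic junction counts",
    ".flybase.plus.bw": "BigWig track (plus)",
    ".flybase.minus.bw": "BigWig track (minus)",
}

# Accession prefix, underscore, library accession, then the remaining suffix.
_NAME = re.compile(r"^(GSM\d+)_([A-Z]+\d+)(\..+)$")


def parse_supp_name(name: str) -> Optional[SuppFile]:
    """Split a supplementary file name; return None if it is not recognized."""
    match = _NAME.match(name)
    if match is None:
        return None
    sample, library, suffix = match.groups()
    content = _SUFFIXES.get(suffix)
    if content is None:
        return None
    return SuppFile(sample, library, content)
```

Calling parse_supp_name on each row of the table gives a quick way to route count files and BigWig tracks to different downstream steps.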
Raw data are available in SRA
Processed data provided as supplementary file
Processed data are available on Series record