Sample GSM4046326

Status

Public on Jan 13, 2021

Title

rnaseq_SALL4KD_repl1

Sample type

SRA

Source name

SNU398_rnaseq_SALL4KD

Organism

Homo sapiens

Characteristics

cell line: SNU398
cell type: Hepatocellular Carcinoma cell line
genotype/variation: SALL4KD

Treatment protocol

SNU398 cells were transduced with shRNA-2 (Supplementary Table S3), and RNA was extracted with Trizol 72 hours after transduction.

Growth protocol

SNU398 cells were grown in RPMI media with 10% FBS

Extracted molecule

total RNA

Extraction protocol

Paired-end (75 bp) Illumina sequencing was performed on the barcoded and amplified libraries.

Library strategy

RNA-Seq

Library source

transcriptomic

Library selection

cDNA

Instrument model

Illumina NextSeq 500

Description

processed data file: rnaseq_featureCounts_exp_matrix.txt, 3scr_3shSALL4_gene_exp.txt

Data processing

CUT&RUN - The Cut and Run Analysis Pipeline (CnRAP) scripts can be found on GitHub (https://github.com/mbassalbioinformatics/CnRAP). Raw fastq files were trimmed with Trimmomatic (REF, v0.36 tested) in paired-end mode with the flags “ILLUMINACLIP: <adapter_path> Truseq3.PE.fa:2:15:4:4:true LEADING:20 TRAILING:20 SLIDINGWINDOW:4:15 MINLEN:25”. Next, the kseq trimmer developed by the Orkin lab was run on each fastq file; it has no flags to modify. For alignment, BWA (v0.7.17-r1188 tested) was first run in “aln” mode on a masked hg38 genome downloaded from UCSC to create *.sai files. Next, BWA was run in “sampe” mode with the flag “-n 20” on the *.sai files. Afterwards, stampy (v1.0.32 tested) was run in “--sensitive” mode. For mapping statistics, bamtools (v2.5.1 tested) “stat” was used. After alignment, unmapped reads were removed from the bam files using bamtools with the flags “filter -isMapped true”. Next, using samtools (v1.5 tested), the bam files were sorted (“sort -l 9 -O bam”), had read pair mates fixed (“fixmate”), and were indexed (“index”). Bam coverage maps were generated using bamCoverage from the deeptools suite (v2.5.7 tested). The same procedure was run to align the fastq files to a masked Saccharomyces cerevisiae v3 genome (sacCer3), also downloaded from UCSC. In preparation for peak calling, a normalization factor was determined for each hg38-aligned replicate based on the corresponding number of proper pairs aligned to the sacCer3 genome, as recommended in the Henikoff pipeline. This was calculated as follows: normalization_factor = 10,000,000/(#proper_pairs/2). The number of ‘proper-pairs’ was extracted from the bamtools mapping statistics calculated previously.
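The spike-in normalization factor described above can be sketched in a few lines of Python. This is only an illustration, not part of the CnRAP scripts: it reads the formula as 10,000,000 divided by (#proper_pairs/2), and the function name, argument names, and the example count of 400,000 proper pairs are assumptions made for the sketch.

```python
def spike_in_normalization_factor(proper_pairs: int,
                                  constant: float = 10_000_000) -> float:
    """Return constant / (proper_pairs / 2).

    proper_pairs: number of 'proper-pair' reads aligned to sacCer3, as
                  reported by the mapping statistics (hypothetical input).
    constant:     scaling constant; 10,000,000 is the value in the record.
    """
    if proper_pairs <= 0:
        raise ValueError("proper_pairs must be a positive integer")
    # Dividing by 2 converts a count of paired reads into a count of fragments.
    return constant / (proper_pairs / 2)


if __name__ == "__main__":
    # Made-up example: 400,000 proper-pair reads aligned to the yeast genome.
    print(spike_in_normalization_factor(400_000))  # 50.0
```

A replicate with fewer spike-in reads gets a larger factor, so its coverage is scaled up relative to a replicate with more spike-in reads.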
Next, from the hg38-aligned bams, ‘proper-paired’ reads were extracted using samtools with the flags “view -b -f 2 -F 524”, with the output piped into bedtools with the flags “genomecov -bg -scale <normalization_factor> -ibam stdin”. This produced bed files of ‘proper-paired’ reads that have been normalized to the number of reads aligned back to the sacCer3 genome. Bedgraphs of these normalized bed files were generated as intermediary files to facilitate generation of bigwig coverage maps using bedGraphToBigWig from UCSC (v4). For peak calling, the recently developed SEACR (v1.1 tested) was run in both “stringent” and “relaxed” mode to produce peak files with the flag “non”, as the bed files were already normalized to the number of yeast spike-in reads. Subsequently, peak file columns were rearranged to facilitate motif discovery using both HOMER (v4.10 tested, flags “-size given,50,100,200 -mask -p 20 -S 50”) and MEME (v5.0.5 tested, “-dreme-m 50 -meme-nmotifs 50”). Peaks were annotated using the R package ChIPseeker (v1.20.0 tested). Overlapping peak subsets were generated using mergePeaks.py from the HOMER suite with the flag “-d 1000”.
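To make the samtools filter “-f 2 -F 524” easier to read, the following Python sketch decodes the flag values using the standard SAM flag bits. It is an illustration only: the helper names are hypothetical, and it mimics the require/exclude logic on a single integer flag rather than calling samtools.

```python
# Standard SAM flag bits (from the SAM format specification).
SAM_FLAGS = {
    0x1: "read paired",
    0x2: "read mapped in proper pair",
    0x4: "read unmapped",
    0x8: "mate unmapped",
    0x10: "read reverse strand",
    0x20: "mate reverse strand",
    0x40: "first in pair",
    0x80: "second in pair",
    0x100: "not primary alignment",
    0x200: "read fails quality checks",
    0x400: "PCR or optical duplicate",
    0x800: "supplementary alignment",
}


def decode_flag(value: int) -> list[str]:
    """List the names of all SAM flag bits set in value."""
    return [name for bit, name in sorted(SAM_FLAGS.items()) if value & bit]


def passes_filter(flag: int, require: int = 2, exclude: int = 524) -> bool:
    """Mimic 'samtools view -f <require> -F <exclude>' for one read flag."""
    return (flag & require) == require and (flag & exclude) == 0


if __name__ == "__main__":
    print(decode_flag(2))    # bits a read must have
    print(decode_flag(524))  # bits a read must not have (4 + 8 + 512)
    print(passes_filter(99))  # paired, proper pair, mate reverse, first in pair
    print(passes_filter(77))  # paired, read and mate unmapped, first in pair
```

In other words, the filter keeps reads flagged as properly paired and drops any read that is unmapped, has an unmapped mate, or fails quality checks.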
NicE-seq - Raw fastq reads were trimmed and mapped in the same way as described above for CUT&RUN data. Peaks were called with Model-based Analysis of ChIP-seq (MACS) (v2.1.1.20160309) for the scrambled control sample and the SALL4 KD sample separately. Peaks unique to each sample were identified with the HOMER mergePeaks function. bamCoverage (described above for CUT&RUN) was used to generate BigWig files for visualization.
RNA-Seq - The raw fastq reads were trimmed using TrimGalore (v0.4.5) and mapped to the hg38 genome downloaded from UCSC using STAR. The read counts table was generated with featureCounts. Differential gene expression analysis comparing scrambled control samples with SALL4 KD samples was performed using DESeq2, and the fold changes were plotted as a volcano plot using ggplot2.
Genome_build: (human) hg38, (yeast) sacCer3
Supplementary_files_format_and_content: peak files; gene x features expression matrix tab delimited text

Submission date

Aug 26, 2019

Last update date

Jan 13, 2021

Contact name

Mahmoud Adel Bassal

E-mail(s)

[email protected]

Organization name

Beth Israel Deaconess Medical Center

Department

Hematology and Oncology

Lab

Tenen Lab

Street address

3 Blackfan Circle

City

Boston

State/province

Massachusetts

ZIP/Postal code

02131-4834

Country

USA

Platform ID

GPL18573

Series (1)

GSE136332

Zinc finger protein SALL4 functions through an AT-rich motif to regulate gene expression

Relations

BioSample

SAMN12637065

SRA

SRX6759435

Supplementary file	Size	Download	File type/resource
GSM4046326_rnaseq_SALL4KD_repl1_rawCounts.txt.gz	297.4 Kb	(ftp)(http)	TXT
Raw data are available in SRA
Processed data provided as supplementary file
Processed data are available on Series record