Sample GSM4046330
Status Public on Jan 13, 2021
Title rna_seq_scramble_repl2
Sample type SRA
 
Source name SNU398_rna_seq_scramble
Organism Homo sapiens
Characteristics cell line: SNU398
cell type: Hepatocellular Carcinoma cell line
genotype/variation: scrambled shRNA
Treatment protocol SNU398 cells were transduced with scrambled shRNA, and RNA was extracted with TRIzol 72 hours after transduction
Growth protocol SNU398 cells were grown in RPMI media with 10% FBS
Extracted molecule total RNA
Extraction protocol Paired-end (75 bp) Illumina sequencing was performed on the barcoded and amplified libraries
 
Library strategy RNA-Seq
Library source transcriptomic
Library selection cDNA
Instrument model Illumina NextSeq 500
 
Description processed data file: rnaseq_featureCounts_exp_matrix.txt, 3scr_3shSALL4_gene_exp.txt
Data processing Cut&Run - Cut and Run Analysis Pipeline (CnRAP) scripts can be found on GitHub (https://github.com/mbassalbioinformatics/CnRAP). Raw fastq files were trimmed with Trimmomatic (REF, v0.36 tested) in paired-end mode with the flags “ILLUMINACLIP: <adapter_path> Truseq3.PE.fa:2:15:4:4:true LEADING:20 TRAILING:20 SLIDINGWINDOW:4:15 MINLEN:25”. Next, the kseq trimmer developed by the Orkin lab was run on each fastq file; it has no flags to modify. For alignment, BWA (v0.7.17-r1188 tested) was first run in “aln” mode on a masked hg38 genome downloaded from UCSC to create *.sai files. Next, BWA was run in “sampe” mode with the flag “-n 20” on the *.sai files. Afterwards, stampy (v1.0.32 tested) was run in “--sensitive” mode. For mapping statistics, bamtools (v2.5.1 tested) “stat” was used. Post alignment, unmapped reads were removed from bam files using bamtools with the flags “filter -isMapped true”. Next, using samtools (v1.5 tested), bam files were sorted (“sort -l 9 -O bam”), had read pair mates fixed (“fixmate”), and were indexed (“index”). Bam coverage maps were generated using bamCoverage from the deeptools suite (v2.5.7 tested). The same procedure was run to align fastq files to a masked Saccharomyces cerevisiae v3 (sacCer3) genome, also downloaded from UCSC. In preparation for peak calling, a normalization factor was determined for each hg38-aligned replicate based on the corresponding number of proper pairs aligned to the sacCer3 genome, as recommended in the Henikoff pipeline. This was calculated as follows: normalization_factor = 10,000,000 / (#proper_pairs / 2). The number of ‘proper-pairs’ was extracted from the bamtools mapping statistics calculated previously.
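The spike-in normalization factor described above can be sketched in a few lines of Python. This is only an illustration of the stated formula, not code from the CnRAP repository: the function name and the example count are hypothetical, and the halving step is assumed to convert a count of properly paired reads into a count of fragments.

```python
def spike_in_normalization_factor(proper_pairs: int, constant: int = 10_000_000) -> float:
    """Scale factor from the number of proper-pair reads aligned to sacCer3.

    Implements normalization_factor = constant / (proper_pairs / 2).
    Assumption: proper_pairs counts reads, so halving it gives fragments.
    """
    if proper_pairs <= 0:
        raise ValueError("proper_pairs must be a positive count")
    fragments = proper_pairs / 2
    return constant / fragments


# Hypothetical spike-in count, chosen only to make the arithmetic easy to follow.
example_factor = spike_in_normalization_factor(200_000)
print(example_factor)
```

The resulting value is what would be passed as the “-scale” argument of bedtools genomecov for the matching hg38-aligned replicate.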
Next, from the hg38-aligned bams, ‘proper-paired’ reads were extracted using samtools with the flags “view -b -f 2 -F 524”, with the output piped into bedtools with the flags “genomecov -bg -scale <normalization_factor> -ibam stdin”. This produced bed files of ‘proper-paired’ reads that have been normalized to the number of reads aligned back to the sacCer3 genome. Bedgraphs of these normalized bed files were generated as intermediary files to facilitate generation of bigwig coverage maps using bedGraphToBigWig from UCSC (v4). For peak calling, the recently developed SEACR (v1.1 tested) was run in both “stringent” and “relaxed” mode to produce peak files, with the flag “non” because the bed files were already normalized to the number of yeast spike-in reads. Subsequently, peak file columns were re-arranged to facilitate motif discovery using both HOMER (v4.10 tested, flags “-size given,50,100,200 -mask -p 20 -S 50”) and MEME (v5.0.5 tested, “-dreme-m 50 -meme-nmotifs 50”). Peaks were annotated using the R package ChIPseeker (v1.20.0 tested). Overlapping peak subsets were generated using mergePeaks.py from the HOMER suite with the flag “-d 1000”.
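For readers unfamiliar with SAM flag arithmetic, the “-f 2 -F 524” selection used for the proper-pair extraction can be unpacked as below. This is a minimal sketch of the bitwise test only; the helper name is hypothetical and nothing here calls samtools itself. The bit values are those defined in the SAM specification.

```python
# SAM flag bits relevant to the filter
PROPER_PAIR = 0x2      # each segment properly aligned
READ_UNMAPPED = 0x4    # this read is unmapped
MATE_UNMAPPED = 0x8    # the mate is unmapped
QC_FAIL = 0x200        # read fails quality checks

REQUIRED = 2    # value given to -f: all of these bits must be set
EXCLUDED = 524  # value given to -F: none of these bits may be set


def passes_filter(flag: int, required: int = REQUIRED, excluded: int = EXCLUDED) -> bool:
    """Return True if a SAM flag would be kept by '-f required -F excluded'."""
    return (flag & required) == required and (flag & excluded) == 0


# 524 decomposes into unmapped read, unmapped mate, and QC failure.
assert EXCLUDED == READ_UNMAPPED | MATE_UNMAPPED | QC_FAIL

print(passes_filter(99))   # paired, proper pair, mate reverse, first in pair
print(passes_filter(77))   # paired, read and mate unmapped, first in pair
```

In other words, the filter keeps reads flagged as properly paired while dropping any read that is unmapped, has an unmapped mate, or failed quality checks.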
NicE-seq - Raw fastq reads were trimmed and mapped in the same way as described above for CUT&RUN data. Peaks were called with Model-based Analysis of ChIP-seq (MACS) (v2.1.1.20160309) for the scrambled control sample and the SALL4 KD sample separately. Peaks unique to each sample were identified with the HOMER mergePeaks function. bamCoverage (described above for CUT&RUN) was used to generate BigWig files for visualization.
RNA-Seq - The raw fastq reads were trimmed using TrimGalore (v0.4.5) and mapped to the hg38 genome downloaded from UCSC using STAR. The read counts table was generated with featureCounts. Differential gene expression analysis was performed to compare scrambled control samples with SALL4 KD samples using DESeq2, and the fold change was plotted as a volcano plot using ggplot2.
Genome_build: (human) hg38, (yeast) sacCer3
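As a rough illustration of what a volcano plot encodes, the sketch below converts DESeq2-style results into plot coordinates: log2 fold change on the x axis and -log10 of the adjusted p-value on the y axis. The record's analysis was done in R with DESeq2 and ggplot2, so this Python version is only an assumption-laden stand-in; the gene names and values are invented placeholders, not data from this sample, and the choice of adjusted rather than raw p-values is also an assumption.

```python
import math
from typing import Optional, Tuple


def volcano_point(log2_fold_change: float,
                  padj: Optional[float]) -> Optional[Tuple[float, float]]:
    """Map one result row to (x, y) = (log2FC, -log10(padj)).

    Rows with a missing or non-positive adjusted p-value are skipped,
    since -log10 is undefined for them.
    """
    if padj is None or padj <= 0:
        return None
    return (log2_fold_change, -math.log10(padj))


# Placeholder rows: (gene, log2FoldChange, padj). Values are illustrative only.
rows = [
    ("GENE_A", 2.0, 0.01),
    ("GENE_B", -1.5, 0.5),
    ("GENE_C", 0.3, None),  # e.g. filtered out by independent filtering
]

points = {gene: volcano_point(lfc, padj) for gene, lfc, padj in rows}
for gene, point in points.items():
    print(gene, point)
```

Genes far from zero on the x axis and high on the y axis are the ones with large, statistically supported expression changes between scrambled control and SALL4 KD.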
Supplementary_files_format_and_content: peak files; gene x features expression matrix tab delimited text
 
Submission date Aug 26, 2019
Last update date Jan 13, 2021
Contact name Mahmoud Adel Bassal
E-mail(s) [email protected]
Organization name Beth Israel Deaconess Medical Center
Department Hematology and Oncology
Lab Tenen Lab
Street address 3 Blackfan Circle
City Boston
State/province Massachusetts
ZIP/Postal code 02131-4834
Country USA
 
Platform ID GPL18573
Series (1)
GSE136332 Zinc finger protein SALL4 functions through an AT-rich motif to regulate gene expression
Relations
BioSample SAMN12637061
SRA SRX6759439

Supplementary file Size Download File type/resource
GSM4046330_rnaseq_scramble_repl2_rawCounts.txt.gz 294.5 Kb (ftp)(http) TXT
Raw data are available in SRA
Processed data provided as supplementary file
Processed data are available on Series record
