Status |
Public on Jan 13, 2021 |
Title |
rna_seq_scramble_repl2 |
Sample type |
SRA |
|
|
Source name |
SNU398_rna_seq_scramble
|
Organism |
Homo sapiens |
Characteristics |
cell line: SNU398
cell type: Hepatocellular Carcinoma cell line
genotype/variation: scrambled shRNA
|
Treatment protocol |
SNU398 cells were transduced with scrambled shRNA, and RNA was extracted with TRIzol 72 hours after transduction
|
Growth protocol |
SNU398 cells were grown in RPMI media with 10% FBS
|
Extracted molecule |
total RNA |
Extraction protocol |
Paired-end (75 bp) Illumina sequencing was performed on the barcoded and amplified libraries
|
|
|
Library strategy |
RNA-Seq |
Library source |
transcriptomic |
Library selection |
cDNA |
Instrument model |
Illumina NextSeq 500 |
|
|
Description |
processed data files: rnaseq_featureCounts_exp_matrix.txt, 3scr_3shSALL4_gene_exp.txt
|
Data processing |
Cut&Run - The Cut and Run Analysis Pipeline (CnRAP) scripts can be found on GitHub (https://github.com/mbassalbioinformatics/CnRAP). Raw fastq files were trimmed with Trimmomatic (REF, v0.36 tested) in paired-end mode with the flags “ILLUMINACLIP: <adapter_path> Truseq3.PE.fa:2:15:4:4:true LEADING:20 TRAILING:20 SLIDINGWINDOW:4:15 MINLEN:25”. Next, the kseq trimmer developed by the Orkin lab was run on each fastq file; there were no flags to modify. For alignment, BWA (v0.7.17-r1188 tested) was first run in “aln” mode on a masked hg38 genome downloaded from UCSC to create *.sai files. Next, BWA was run in “sampe” mode with the flag “-n 20” on the *.sai files. Afterwards, stampy (v1.0.32 tested) was run in “--sensitive” mode. For mapping statistics, bamtools (v2.5.1 tested) “stat” was used. Post alignment, unmapped reads were removed from bam files using bamtools with the flags “filter -isMapped true”. Next, using samtools (v1.5 tested), bam files were sorted (“sort -l 9 -O bam”), had read pair mates fixed (“fixmate”), and were indexed (“index”). Bam coverage maps were generated using bamCoverage from the deeptools suite (v2.5.7 tested). The same procedure was run to align fastq files to a masked Saccharomyces cerevisiae v3 genome (sacCer3), also downloaded from UCSC. In preparation for peak calling, a normalization factor was determined for each hg38-aligned replicate based on the corresponding number of proper pairs aligned to the sacCer3 genome, as recommended in the Henikoff pipeline. This was calculated as follows: normalization_factor = 10,000,000 / (#proper_pairs / 2). The number of ‘proper-pairs’ was extracted from the bamtools mapping statistics calculated previously.
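The spike-in normalization described above can be sketched in a few lines of Python. This is an illustration only, not code from the CnRAP repository. It reads the formula as 10,000,000 divided by half the sacCer3 proper-pair count, on the assumption that the bamtools figure counts individual reads, so halving it gives the number of fragments. The function name and the example count are hypothetical.

```python
def spike_in_normalization_factor(proper_pairs: int, scale: float = 10_000_000) -> float:
    """Return scale / (proper_pairs / 2).

    proper_pairs: proper-pair count reported for the sacCer3 alignment
                  (assumed here to count reads, so /2 gives fragments).
    scale:        the constant 10,000,000 quoted in the record.
    """
    if proper_pairs <= 0:
        raise ValueError("proper_pairs must be a positive count")
    fragments = proper_pairs / 2
    return scale / fragments


# Hypothetical spike-in count, chosen only to make the arithmetic easy to follow.
example_proper_pairs = 400_000
factor = spike_in_normalization_factor(example_proper_pairs)
print(f"normalization factor: {factor}")
```

The resulting value is what would be substituted for <normalization_factor> in the bedtools “genomecov -scale” step.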
Next, from the hg38-aligned bams, ‘proper-paired’ reads were extracted using samtools with the flags “view -b -f 2 -F 524”, with the output piped into bedtools with the flags “genomecov -bg -scale <normalization_factor> -ibam stdin”. This produced bed files of ‘proper-paired’ reads that have been normalized to the number of reads aligned back to the sacCer3 genome. Bedgraphs of these normalized bed files were generated as intermediary files to facilitate generation of bigwig coverage maps using bedGraphToBigWig from UCSC (v4). For peak calling, the recently developed SEACR (v1.1 tested) was run in both “stringent” and “relaxed” mode to produce peak files with the flag “non”, as the bed files were already normalized to the number of yeast spike-in reads. Subsequently, peak file columns were re-arranged to facilitate motif discovery using both HOMER (v4.10 tested, flags “-size given,50,100,200 -mask -p 20 -S 50”) and MEME (v5.0.5 tested, “-dreme-m 50 -meme-nmotifs 50”). Peaks were annotated using the R package ChIPseeker (v1.20.0 tested). Overlapping peak subsets were generated using mergePeaks.py from the HOMER suite with the flag “-d 1000”.
NicE-seq - Raw fastq reads were trimmed and mapped in the same way as described above for CUT&RUN data. Peaks were called with Model-based Analysis of ChIP-seq (MACS) (v2.1.1.20160309) for the scrambled control sample and the SALL4 KD sample separately. Peaks unique to each sample were identified with the HOMER mergePeaks function. bamCoverage (described above for CUT&RUN) was used to generate BigWig files for visualization.
RNA-Seq - The raw fastq reads were trimmed using TrimGalore (v0.4.5) and mapped to the hg38 genome downloaded from UCSC using STAR. The read counts table was generated with featureCounts. Differential gene expression analysis was performed to compare scrambled control samples with SALL4 KD samples using DESeq2, and the fold change was plotted as a volcano plot using ggplot2.
Genome_build: (human) hg38, (yeast) sacCer3
Supplementary_files_format_and_content: peak files; gene x features expression matrix tab delimited text
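The samtools filter “view -b -f 2 -F 524” used in the Cut&Run processing is easier to follow once the numeric flags are decoded. The sketch below is illustrative and not part of the submitted pipeline. It uses the standard SAM flag bits to show which reads the filter keeps. The helper names are hypothetical.

```python
# Standard SAM flag bits, as defined in the SAM format specification.
SAM_FLAGS = {
    0x1: "read paired",
    0x2: "read mapped in proper pair",
    0x4: "read unmapped",
    0x8: "mate unmapped",
    0x10: "read reverse strand",
    0x20: "mate reverse strand",
    0x40: "first in pair",
    0x80: "second in pair",
    0x100: "not primary alignment",
    0x200: "read fails quality checks",
    0x400: "PCR or optical duplicate",
    0x800: "supplementary alignment",
}


def decode_flags(value: int) -> list[str]:
    """List the names of all SAM flag bits set in value, lowest bit first."""
    return [name for bit, name in sorted(SAM_FLAGS.items()) if value & bit]


def passes_filter(flag: int, require: int = 2, exclude: int = 524) -> bool:
    """Mimic 'samtools view -f require -F exclude' for a single read flag.

    A read is kept when every bit in `require` is set and no bit in
    `exclude` is set.
    """
    return (flag & require) == require and (flag & exclude) == 0


print("-f 2   requires:", decode_flags(2))
print("-F 524 excludes:", decode_flags(524))
print("flag 99 kept:", passes_filter(99))   # paired, proper pair, mate reverse, first in pair
print("flag 73 kept:", passes_filter(73))   # paired, mate unmapped, first in pair
```

In other words, the filter keeps reads flagged as properly paired and drops any read that is unmapped, has an unmapped mate, or fails quality checks.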
|
|
|
Submission date |
Aug 26, 2019 |
Last update date |
Jan 13, 2021 |
Contact name |
Mahmoud Adel Bassal |
E-mail(s) |
[email protected]
|
Organization name |
Beth Israel Deaconess Medical Center
|
Department |
Hematology and Oncology
|
Lab |
Tenen Lab
|
Street address |
3 Blackfan Circle
|
City |
Boston |
State/province |
Massachusetts |
ZIP/Postal code |
02131-4834 |
Country |
USA |
|
|
Platform ID |
GPL18573 |
Series (1) |
GSE136332 |
Zinc finger protein SALL4 functions through an AT-rich motif to regulate gene expression |
|
Relations |
BioSample |
SAMN12637061 |
SRA |
SRX6759439 |
Supplementary file | Size | Download | File type/resource |
GSM4046330_rnaseq_scramble_repl2_rawCounts.txt.gz | 294.5 Kb | (ftp)(http) | TXT |
SRA Run Selector |
Raw data are available in SRA |
Processed data provided as supplementary file |
Processed data are available on Series record |