Sample GSM3123804

Status

Public on Jul 31, 2018

Title

UnStim-siRNAs523629-24hr-B

Sample type

SRA

Source name

U2OS cell

Organism

Homo sapiens

Characteristics

stimulation: UnStim
treatment: siRNAs523629
time: 24hr
cell line: U2OS

Extracted molecule

polyA RNA

Extraction protocol

RNA libraries were prepared for sequencing using standard Illumina protocols

Library strategy

RNA-Seq

Library source

transcriptomic

Library selection

cDNA

Instrument model

Illumina HiSeq 2500

Description

replicate B

Data processing

Reads were aligned to the reference genome (hg19) using the STAR aligner
QC includes sequence quality, GC content, 5’-3’ gene body coverage (Supplementary Table S6)
Outlier detection (absolute Z score > 2) was applied to the overall sequencing quality score, 5 prime coverage, 3 prime coverage, mean_GC content, duplication rate, mean_ and mapped percentage. Samples with an absolute Z score > 2 would be discarded; no sample in this study met that criterion.
Aligned reads were counted against the gene model annotation (Gencode v18) using featureCounts to obtain expression values
DESeq2 was used for gene expression normalization
Genome_build: hg19
Supplementary_files_format_and_content: Tab-delimited text files include RPKM values for each Sample:
DESeq2_normalized_count_matrix.txt: Count matrix for counts per gene, normalized for library size using DESeq2
DESeq2_regularized_log_transformed.txt: Made using the rlogTransformation function in DESeq2. This function transforms the count data to the log2 scale in a way which minimizes differences between samples for rows with small counts, and which normalizes with respect to library size. These are the values used to obtain clustering and PCA results.
QC_statistics_to_deliver.txt:
  Tot_reads(M) - Total number of reads in the sample, in millions
  rRNA(%) - Percent of reads aligned to ribosomal RNA
  Map(%) - Percent of reads which map to genome
  UQ_map(%) - Percent of reads which map to only one locus on the genome
  Gene_assn(%) - Of uniquely mapped reads, percentage which may be assigned to exons of genes by gene counting program featureCounts
  Strand(%) - Percentage of reads which are reversely stranded. If unstranded protocol, this is not counted and listed as 0.
  5prime_cov - Mean normalized coverage from Picard for percentiles 11-30 along gene body
  3prime_cov - Mean normalized coverage from Picard for percentiles 71-90 along gene body
  Mean_GC(%) - Mean GC content averaging GC content for each read across all reads
  Dup(%) - Percentage of reads which were duplicates
  Mean_inner_dist - Inner distance = Read 2 start - Read 1 end. Negative indicates overlap.
  CDS, UTR, Intronic, and Intergenic(%) - Percentage of reads assigned to different regions of the genome. These four add up to 100%.
  chrM(%) - For reads assigned to genes, percentage assigned to mitochondrial genes.
  Top_ten(%) - For reads assigned to genes, percentage assigned to the top 10 most highly expressed genes.
featureCounts_count_matrix.txt: Raw count matrix with counts per gene as obtained by featureCounts
gene_annotation_info.txt: More info on the genes in the count matrices, such as gene coordinates, biotype (coding vs. noncoding), etc.
kallisto_TPM_matrix.txt: Per transcript in the annotation, the TPM (transcripts per million) value given by Kallisto. If this file is not present, then Kallisto was not run for this project.
kallisto_est_count_matrix.txt: Per transcript in the annotation, the estimated counts value given by Kallisto. If this file is not present, then Kallisto was not run for this project.
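
The outlier rule described above (discard a sample when the absolute Z score of a QC metric exceeds 2) can be illustrated with a minimal Python sketch. This is not the submitters' code: the function name flag_outliers, the use of the sample standard deviation, and the example values are all assumptions made for illustration only.

```python
import statistics


def flag_outliers(values, threshold=2.0):
    """Return indices of samples whose absolute Z score exceeds `threshold`.

    Assumption: the Z score is computed per QC metric across samples, using
    the sample standard deviation. The record does not state which
    estimator was actually used.
    """
    mean = statistics.mean(values)
    sd = statistics.stdev(values)
    if sd == 0:
        # All samples identical: no sample can be an outlier.
        return []
    return [i for i, v in enumerate(values) if abs((v - mean) / sd) > threshold]


# Hypothetical mapped-percentage values for eight samples (not from this study).
mapped_pct = [95.1, 94.8, 95.3, 94.9, 95.0, 95.2, 94.7, 80.2]
print(flag_outliers(mapped_pct))  # index 7 has an absolute Z score of about 2.47
```

In practice the same check would be repeated for each QC metric listed (quality score, 5 prime and 3 prime coverage, GC content, duplication rate, mapped percentage), and a sample flagged on any of them would be reviewed.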

Submission date

May 01, 2018

Last update date

Jul 31, 2018

Contact name

Kejie Li

Organization name

Biogen

Street address

225 Binney Street

City

Cambridge