Status |
Public on Dec 12, 2019 |
Title |
K562_4sUchr_ONT_5a |
Sample type |
SRA |
|
|
Source name |
K562 cells
|
Organism |
Homo sapiens |
Characteristics |
cell treatment: 4-thiouridine (4sU), 500 uM, 8 minutes
fractionation: chromatin purification
4su selection: 4sU biotinylation and streptavidin pulldown
library prep: Oxford Nanopore Technologies direct RNA sequencing with SQK-RNA002 kit
|
Treatment protocol |
Cells were labeled in media containing 500 uM 4-thiouridine for 8 minutes.
|
Growth protocol |
K562 cells (ATCC, CCL-243) were maintained at 37°C and 5% CO2 in RPMI 1640 medium containing 10% FBS, 100 U/ml penicillin and 100 ug/ml streptomycin.
|
Extracted molecule |
total RNA |
Extraction protocol |
K562 cells were harvested in suspension at 0.8-1 million cells/mL and immediately subjected to the cellular fractionation protocol. Cellular fractionation was performed as described in (Mayer, Nat. Protoc., 2016) and based on (Wuarin & Schibler, MCB, 1994). 10 million K562 cells were collected by centrifugation at 500 g for 2 minutes and washed with 1X PBS. The cell pellet was lysed for 2 min with 200 μl cytoplasmic lysis buffer (0.15% (vol/vol) NP-40, 10 mM Tris-HCl (pH 7.0), and 150 mM NaCl), layered over 500 μl of a sucrose cushion (10 mM Tris-HCl (pH 7.0), 150 mM NaCl, 25% (wt/vol) sucrose), and nuclei were collected by centrifugation at 16,000 g for 10 minutes. The nuclei pellet was resuspended in 800 μl wash buffer (0.1% (vol/vol) Triton X-100, 1 mM EDTA, in 1X PBS) and collected by centrifugation at 1,150 g for 1 minute. Washed nuclei were resuspended in 200 μl glycerol buffer (20 mM Tris-HCl (pH 8.0), 75 mM NaCl, 0.5 mM EDTA, 50% (vol/vol) glycerol, 0.85 mM DTT), and mixed with 200 μl nuclei lysis buffer (1% (vol/vol) NP-40, 20 mM HEPES (pH 7.5), 300 mM NaCl, 1 M urea, 0.2 mM EDTA, 1 mM DTT) before pulse vortex and incubation on ice for 2 minutes. The chromatin pellet was collected by centrifugation at 18,500 g for 2 minutes and resuspended in 1X PBS. All steps were performed at 4°C and all buffers were prepared with 25 μM α-amanitin, 0.05 U/μl SUPERase.In, and protease inhibitor mix.
Samples were immediately resuspended in Qiazol lysis reagent and RNA was extracted following the manufacturer's instructions.
Labeled RNA (1 μg / 10 μl) was incubated with 10% biotinylation buffer (100 mM Tris pH 7.5, 10 mM EDTA) and 20% EZ-Link Biotin-HPDP (1 mg/mL resuspended in DMF, Thermo Fisher Scientific, 21341) for 1.5 hours at 800 rpm and 24°C in the dark. RNA was purified by mixing with a 1:1 ratio of chloroform/isoamyl alcohol (24:1), separating with a phase-lock tube at 16,000 g for 5 min, and performing isopropanol precipitation.
Biotinylated RNA separation was performed using the μMACS streptavidin kit (Miltenyi Biotec, 130-074-101). RNA was mixed with μMACS streptavidin beads at a 2:1 ratio at 800 rpm and 24°C for 15 min. The RNA-streptavidin bead mix was transferred to the μMACS column and washed 3 times each with wash buffer (100 mM Tris pH 7.5, 10 mM EDTA, 1 M NaCl, 0.1% Tween 20) at 65°C and at RT. Selected RNA was eluted off the magnet with 0.1 M DTT and purified using the miRNeasy micro kit (Qiagen, 217084) with on-column DNase I treatment (Qiagen, 79254).
Ribosomal RNAs were depleted from the 4sU-selected chromatin-associated RNA sample using RiboMinus Eukaryotic Kit v2 (ThermoFisher, A15020). A poly(I) tail was added to the 3’ ends of the RNA sample with yeast poly(A) polymerase (ThermoFisher, 74225Z25KU) and ITP, incubating for 30 min at 37°C.
The direct RNA sequencing protocol for the SQK-RNA002 kit (Oxford Nanopore Technologies Ltd.) was followed as described by the manufacturer, with the minor modifications described below. 500 ng of RNA sample was ligated to the provided splinted adapter with T4 DNA ligase for 15 minutes at room temperature. The ligated RNA sample was reverse transcribed using SuperScript III (Invitrogen, 18080044), as recommended by Oxford Nanopore Technologies to improve the reading of RNA through the nanopore. Samples were purified with Agencourt RNAClean XP beads (Beckman Coulter, A63987) and ligated to the sequencing adapter with preloaded motor protein. After a second purification step with Agencourt RNAClean XP beads, the sample was resuspended in Elution Buffer, mixed with RNA Running Buffer, loaded onto a primed FLO-MIN106 flowcell, and sequenced using MINKNOW software on the PromethION instrument for 48 hours with default settings for direct RNA sequencing.
|
|
|
Library strategy |
RNA-Seq |
Library source |
transcriptomic |
Library selection |
cDNA |
Instrument model |
PromethION |
|
|
Description |
K562 4sU-labeled chromatin-associated RNA with rRNA depletion and poly(I) tail addition, biological rep 5a
|
Data processing |
For Illumina RNA-seq datasets, paired-end reads were aligned to reference genomes using STAR (v2.5.1a) with default parameters (except outFilterMultimapNmax=101, outSJfilterOverhangMin=3 1 1 1, outSJfilterDistToOtherSJmin=0 0 0 0, alignIntronMin=11, alignEndsType=EndToEnd). The splicing index counts file was generated for each gene by counting the number of read pairs that span splice junctions by at least 3 nucleotides. The '5SS_count' column represents read pair counts over an intron's 5'SS; the '3SS_count' column represents read pair counts over an intron's 3'SS; the 'splice_count' column represents read pair counts that span the exon-exon junction of a spliced intron.
For Oxford Nanopore Technologies direct RNA sequencing datasets, raw signal fast5 files were basecalled using Albacore 2.2.7 (Oxford Nanopore Technologies Ltd.) with the following parameters: read_fast5_basecaller.py --flowcell FLO-MIN106 --kit SQK-RNA001 --recursive --output_format fast5,fastq --worker_threads 8 --save_path ${savePath} --input ${inputPath}. RNA sequences that passed basecalling thresholds were converted into DNA sequences by substituting T for U before sequence alignment. Sequences were aligned to the reference genomes using minimap2 (version 2.10-r764-dirty) with recommended parameters for Oxford Nanopore Technologies direct RNA sequencing (-ax splice -uf -k14) and GMAP (version 2018-03-25) with default parameters. All analyses were performed using reads that pass the MINKNOW sequencing threshold (QC>7) and align uniquely to the genome.
For Oxford Nanopore Technologies direct RNA sequencing, the read ends dataset was generated by intersecting the start position of the read (end position of the RNA) with annotated gene features. “Intron” and “Exon” regions refer to their annotated features within protein coding genes. “Poly(A)” sites are defined as regions within 50 nucleotides of the end coordinate of annotated genes.
In K562 datasets, the poly(A) region also includes regions that are within 50 nucleotides of RNA-PET annotations from cytoplasm and chromatin fractions in K562 ENCODE data (40). “Post_poly(A)” sites are defined as the region 50-550 nucleotides after the end of annotated genes. “Splice_sites” are defined as 50 nucleotides upstream and 10 nucleotides downstream of annotated 5’ splice sites. “Undetermined” categorizes reads that do not fit into one category and “other” represents read ends that do not align in the sense direction to annotated gene features (e.g. antisense transcripts, noncoding RNAs, intergenic transcription, etc.).
For Oxford Nanopore Technologies direct RNA sequencing, the splice dataset was generated by first intersecting reads with constitutively spliced introns. Constitutively spliced introns with “medium stringency” included in this dataset have at least 20 reads in the total RNA-seq dataset that span either splice junction with at least 4 nt overlap, and more than 80% of the spanning reads are spliced. For reads mapping to constitutive introns, the features of read CIGAR strings were extracted for the 50 nt around the intron 5’ and 3’ splice sites and for the entirety of the intron. Reads were called as ‘not spliced’ if the alignment file shows no indication of splicing within the 50 nt around each splice site, mapped portions of the read (rather than deletions) represent greater than 50% of the 50 nt around each splice site, and at least 75% of the read within the intron is aligned to the reference. Reads were called as ‘spliced’ if the alignment file displays the start or end of a splicing event within the 50 nt around both splice sites and the size of the aligned splicing event is within 90-110% and 100 nt of the intron size. If aligned reads that map to introns do not meet these qualifications, the splicing event is characterized as ‘undetermined’ and not used in subsequent analyses.
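The splicing-status rules above can be sketched as a small decision function. This is only an illustration, not the submitters' code: the `IntronAlignment` record and all of its field names are hypothetical stand-ins for features that would first have to be extracted from the CIGAR strings, and the phrase "within 90-110% and 100 nt of the intron size" is read here as two conditions that must both hold.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class IntronAlignment:
    """Per-read features for one intron (all field names are hypothetical)."""
    intron_len: int             # annotated intron length in nt
    event_near_5ss: bool        # a splicing event starts/ends within the 50 nt around the 5'SS
    event_near_3ss: bool        # a splicing event starts/ends within the 50 nt around the 3'SS
    event_len: Optional[int]    # length of the aligned splicing event, None if there is none
    mapped_frac_5ss: float      # mapped (non-deleted) fraction of the 50 nt around the 5'SS
    mapped_frac_3ss: float      # mapped (non-deleted) fraction of the 50 nt around the 3'SS
    intron_aligned_frac: float  # fraction of the read within the intron that is aligned


def classify_splicing(a: IntronAlignment) -> str:
    """Return 'spliced', 'not spliced', or 'undetermined' for one read/intron pair."""
    # 'spliced': an event touches both splice-site windows and its size matches
    # the intron (assumed: within 90-110% AND within 100 nt of the intron length).
    if a.event_near_5ss and a.event_near_3ss and a.event_len is not None:
        within_pct = 0.9 * a.intron_len <= a.event_len <= 1.1 * a.intron_len
        within_nt = abs(a.event_len - a.intron_len) <= 100
        if within_pct and within_nt:
            return "spliced"

    # 'not spliced': no event near either splice site, both windows more than
    # 50% mapped, and at least 75% of the intronic part of the read aligned.
    no_event = not a.event_near_5ss and not a.event_near_3ss
    if (no_event
            and a.mapped_frac_5ss > 0.5
            and a.mapped_frac_3ss > 0.5
            and a.intron_aligned_frac >= 0.75):
        return "not spliced"

    # Everything else is excluded from downstream analyses.
    return "undetermined"
```

Under this reading, a 2,000 nt intron with a 2,150 nt aligned event satisfies the percentage bound but not the 100 nt bound, so the read would be called ‘undetermined’.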
Other information in this file includes the soft-clipped length at the start of the read alignment, the distance between the read start (or aligned RNA end) and the 3’SS of the intron, and whether or not the read start (or aligned RNA end) is close to the poly(A) site of a gene (within 150 nt upstream or any distance downstream of the annotated poly(A) site of the gene).
For Oxford Nanopore Technologies direct RNA sequencing, the intron pairs dataset was generated by identifying reads that span two or more introns. The coordinates and splicing status (from the splice dataset) of the two introns are recorded in the file.
Genome_build: Reference genomes for human K562 and Drosophila S2 sequence alignments were obtained from Ensembl GRCh38 (release-86) and FlyBase dm6 (r6.19), respectively.
Supplementary_files_format_and_content: Tab delimited ‘*SI_counts.txt’ files contain read pair counts spanning intron junctions. Tab delimited '*end_stats.txt' files contain the read name and the feature that the start of the read (end of the RNA) maps to. Tab delimited ‘*medIntrons_discarded_splice_df.txt’ files contain the read name, coordinates of the intron, read length, alignment error rate, read start clipped length, distance from the read start (or RNA end) to the 3’SS of the intron, the determined splicing status, gene and poly(A) site separated by ‘_’, and a column indicating whether the RNA ends near or after the gene’s poly(A) site. Tab delimited ‘*intron_pairs_df.txt’ files contain the read name, intron coordinates, and splicing status of two introns that one read maps to.
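The distance-based read-end categories in this record (the 50 nt “Poly(A)” window, the 50-550 nt “Post_poly(A)” region, and the 150 nt upstream poly(A) proximity flag) can be illustrated with a minimal strand-aware sketch. The function names are hypothetical, the inclusive boundary handling is an assumption, and other categories (introns, exons, splice sites, RNA-PET annotations) are deliberately left out.

```python
from typing import Optional


def signed_distance_to_gene_end(rna_end: int, gene_end: int, strand: str) -> int:
    """Distance from the annotated gene end to the RNA end, in transcript orientation.

    Positive values lie downstream of the gene end, negative values upstream.
    """
    if strand == "+":
        return rna_end - gene_end
    if strand == "-":
        return gene_end - rna_end
    raise ValueError("strand must be '+' or '-'")


def end_region(distance: int) -> Optional[str]:
    """Assign the distance-based category, or None if neither applies."""
    if abs(distance) <= 50:       # within 50 nt of the annotated gene end
        return "Poly(A)"
    if 50 < distance <= 550:      # 50-550 nt after the annotated gene end
        return "Post_poly(A)"
    return None


def near_or_after_polya(distance: int, upstream_window: int = 150) -> bool:
    """True if the RNA end is within 150 nt upstream of, or anywhere downstream of, the poly(A) site."""
    return distance >= -upstream_window
```

For example, an RNA end 60 nt past the gene end on either strand gives a signed distance of 60, which falls in the “Post_poly(A)” region and also sets the proximity flag.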
|
|
|
Submission date |
Sep 11, 2019 |
Last update date |
Jul 08, 2021 |
Contact name |
Karine Choquet |
E-mail(s) |
[email protected]
|
Organization name |
Université de Sherbrooke
|
Department |
Biochemistry and Functional Genomics
|
Street address |
3201 rue Jean-Mignault
|
City |
Sherbrooke |
State/province |
Québec |
ZIP/Postal code |
J1E 4K8 |
Country |
Canada |
|
|
Platform ID |
GPL26167 |
Series (1) |
GSE123191 |
Human co-transcriptional splicing kinetics and coordination revealed by direct nascent RNA sequencing |
|
Relations |
BioSample |
SAMN12726877 |
SRA |
SRX6829577 |
Supplementary file | Size | Download | File type/resource |
GSM4073917_K562_4sUchr_ONT_5a_end_stats_df.txt.gz | 4.5 Mb | (ftp)(http) | TXT |
GSM4073917_K562_4sUchr_ONT_5a_intron_pairs_df.txt.gz | 152.7 Kb | (ftp)(http) | TXT |
GSM4073917_K562_4sUchr_ONT_5a_medIntrons_discarded_splice_df.txt.gz | 661.5 Kb | (ftp)(http) | TXT |
Raw data are available in SRA |
Processed data provided as supplementary file |