Check whether the current sequencing depth is saturated (that is, whether the RPKM values are stable) for gene expression estimation, by resampling a series of subsets from the total RNA reads and calculating the RPKM value from each subset.
RPKM_saturation.py [options] -r REFGENE_BED -i INPUT_BAM -o OUTPUT_PREFIX
The precision of any sample statistic (here, RPKM) is affected by sample size (sequencing depth). “Resampling” or “jackknifing” is a method to estimate the precision of a sample statistic by using subsets of the available data. This module resamples a series of subsets from the total RNA reads and then calculates the RPKM value using each subset. This makes it possible to check whether the current sequencing depth is saturated (that is, whether the RPKM values are stable) for gene expression estimation. If the sequencing depth is saturated, the estimated RPKM value will be stationary, or reproducible. By default, this module calculates 20 RPKM values (using 5%, 10%, ... , 95%, 100% of total reads) for each transcript.
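The resampling idea above can be illustrated with a small sketch. It is not the code of RPKM_saturation.py: the function names, the toy input (one gene name per mapped read), and the choice to normalize each subset by its own read count are all assumptions made for illustration. It uses the standard definition RPKM = reads × 10^9 / (total mapped reads × transcript length in bp).

```python
import random
from collections import Counter


def rpkm(read_count, length_bp, total_reads):
    """Standard RPKM: reads * 1e9 / (total mapped reads * length in bp)."""
    if total_reads == 0 or length_bp == 0:
        return 0.0
    return read_count * 1e9 / (total_reads * length_bp)


def saturation_curve(read_genes, gene_lengths, percents=range(5, 101, 5), seed=0):
    """Compute per-gene RPKM on growing random subsets of the reads.

    read_genes   -- hypothetical input: one gene name per mapped read
    gene_lengths -- dict of gene name -> transcript length in bp
    percents     -- subset sizes as percentages of all reads (20 by default)
    """
    rng = random.Random(seed)
    shuffled = list(read_genes)
    rng.shuffle(shuffled)  # taking prefixes of one shuffle gives nested random subsets

    curve = {}
    for pct in percents:
        n = len(shuffled) * pct // 100
        counts = Counter(shuffled[:n])
        # Assumption: each subset is normalized by its own size n.
        curve[pct] = {
            gene: rpkm(counts[gene], length, n)
            for gene, length in gene_lengths.items()
        }
    return curve


# Toy data: 1000 reads over two made-up genes.
reads = ["geneA"] * 600 + ["geneB"] * 400
lengths = {"geneA": 2000, "geneB": 1000}
curve = saturation_curve(reads, lengths)
for pct in (5, 50, 100):
    print(pct, curve[pct])
```

If the values for a gene change little between the larger subsets, its RPKM estimate can be considered stable at the current depth.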
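--strand='1++,1--,2+-,2-+'
means that this is a paired-end, strand-specific RNA-seq dataset, and the strand rule is: read1 mapped to the '+' strand indicates the parental gene is on the '+' strand; read1 mapped to '-' indicates the gene is on '-'; read2 mapped to '+' indicates the gene is on '-'; read2 mapped to '-' indicates the gene is on '+'. If you are not sure about the strand rule, run infer_experiment.py.
(default=none, not strand-specific RNA-seq data)
All transcripts are sorted in ascending order by expression level (RPKM). They are then divided into 4 groups: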
Estimate sequencing saturation on a forward-designed RNA-seq library (Pairend_StrandSpecific_51mer_Human_hg19.bam).
RPKM_saturation.py -r hg19.refseq.bed12 -d '1++,1--,2+-,2-+' -i Pairend_StrandSpecific_51mer_Human_hg19.bam -o output
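The strand rule string passed with -d in the command above can be decoded mechanically. The following is a minimal sketch, not taken from the tool itself: parse_strand_rule is a hypothetical helper, and it assumes the three-character paired-end token form shown in this document (read number, mapped strand, gene strand).

```python
def parse_strand_rule(rule):
    """Decode a rule such as '1++,1--,2+-,2-+'.

    Returns a dict mapping (read_number, mapped_strand) -> gene_strand.
    Only the 3-character paired-end token form is handled (an assumption).
    """
    mapping = {}
    for token in rule.split(","):
        token = token.strip()
        valid = (
            len(token) == 3
            and token[0] in "12"
            and token[1] in "+-"
            and token[2] in "+-"
        )
        if not valid:
            raise ValueError(f"unrecognized strand rule token: {token!r}")
        mapping[(token[0], token[1])] = token[2]
    return mapping


rule = parse_strand_rule("1++,1--,2+-,2-+")
# Read 2 mapped to the '+' strand implies a gene on the '-' strand.
print(rule[("2", "+")])
```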