REDItoolDnaRna.py is the main script devoted to the identification of RNA editing events taking into account the combined information from RNA-Seq and DNA-Seq data in BAM format. To look at potential RNA editing candidates, RNA-Seq data alone can be used.
REDItoolDnaRna.py -i rnaseq.bam -j dnaseq.bam -f myreference.fa -o myoutputfolder
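The two invocation modes described above (RNA-Seq together with DNA-Seq, or RNA-Seq alone) can be sketched as a small Python helper that assembles the argument list. The helper name build_command and its parameter names are hypothetical and not part of REDItools; only the -i, -j, -f and -o options shown in the command above are used.

```python
def build_command(rna_bam, reference, out_dir, dna_bam=None):
    """Assemble a REDItoolDnaRna.py argument list (illustrative helper).

    dna_bam is optional: when it is omitted, the command uses RNA-Seq
    data alone and the -j option is left out.
    """
    cmd = ["REDItoolDnaRna.py", "-i", rna_bam, "-f", reference, "-o", out_dir]
    if dna_bam is not None:
        # -j accepts DNA-Seq BAM files separated by comma, or a folder
        cmd += ["-j", dna_bam]
    return cmd


# RNA-Seq plus DNA-Seq, as in the usage line above
with_dna = build_command("rnaseq.bam", "myreference.fa", "myoutputfolder",
                         dna_bam="dnaseq.bam")

# RNA-Seq alone, to look at potential editing candidates
rna_only = build_command("rnaseq.bam", "myreference.fa", "myoutputfolder")

print(" ".join(with_dna))
print(" ".join(rna_only))
```

The returned list could be handed to subprocess.run from the Python standard library, assuming the script is executable and on the PATH.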
Options:
-i RNA-Seq BAM file
-j DNA-Seq BAM files separated by comma or folder containing BAM files. Note that each chromosome/region must be present in a single BAM file only.
-I Sort input RNA-Seq BAM file
-J Sort input DNA-Seq BAM file
-f Reference file in fasta format. Note that chromosome/region names in the reference must match chromosome/region names in the BAM files.
-C Base interval to explore [100000]. It indicates how many bases have to be loaded during the run.
-k List of chromosomes to skip separated by comma or file (each line must contain a chromosome/region name).
-t Number of threads [1]. It indicates how many processes should be launched. Each process will work on an individual chromosome/region.
-o Output folder [rediFolder_XXXX] in which all results will be stored. XXXX is a random number generated at each run.
-F Internal folder name [null]. It is the main folder containing output tables.
-M Save a list of columns with quality scores. It produces at most two files in pileup-like format.
-c Minimum read coverage (dna,rna) [10,10]
-Q Fastq offset value (dna,rna) [33,33]. For Illumina fastq 1.3+, 64 should be used.
-q Minimum quality score (dna,rna) [25,25]
-m Minimum mapping quality score (dna,rna) [25,25]
-O Minimum homopolymeric length (dna,rna) [5,5]
-s Infer strand (for strand oriented reads) [1]. It indicates which read is in line with RNA. Available values are: 1:read1 as RNA,read2 not as RNA; 2:read1 not as RNA,read2 as RNA; 12:read1 as RNA,read2 as RNA; 0:read1 not as RNA,read2 not as RNA.
-g Strand inference type 1:maxValue 2:useConfidence [1]. maxValue: the most prominent strand count will be used; useConfidence: the strand is assigned only if its frequency is over a predefined confidence (-x option).
-x Strand confidence [0.70]
-S Strand correction. Once the strand has been inferred, only bases consistent with this strand will be selected.
-G Infer strand by GFF annotation (must be GFF and sorted, otherwise use -X). Sorting requires grep and sort unix executables.
-K GFF File with positions to exclude (must be GFF and sorted, otherwise use -X). Sorting requires grep and sort unix executables.
-T Work only on given GFF positions (must be GFF and sorted, otherwise use -X). Sorting requires grep and sort unix executables.
-X Sort annotation files. It requires grep and sort unix executables.
-e Exclude multi hits in RNA-Seq
-E Exclude multi hits in DNA-Seq
-d Exclude duplicates in RNA-Seq
-D Exclude duplicates in DNA-Seq
-p Use paired concordant reads only in RNA-Seq
-P Use paired concordant reads only in DNA-Seq
-u Consider mapping quality in RNA-Seq
-U Consider mapping quality in DNA-Seq
-a Trim x bases up and y bases down per read [0-0] in RNA-Seq
-A Trim x bases up and y bases down per read [0-0] in DNA-Seq
-b Blat folder for correction in RNA-Seq
-B Blat folder for correction in DNA-Seq
-l Remove substitutions in homopolymeric regions in RNA-Seq
-L Remove substitutions in homopolymeric regions in DNA-Seq
-v Minimum number of reads supporting the variation [3] for RNA-Seq
-n Minimum editing frequency [0.1] for RNA-Seq
-N Minimum variation frequency [0.1] for DNA-Seq
-z Exclude positions with multiple changes in RNA-Seq
-Z Exclude positions with multiple changes in DNA-Seq
-W Select RNA-Seq positions with defined changes (separated by comma ex: AG,TC) [default all]
-R Exclude invariant RNA-Seq positions
-V Exclude sites not supported by DNA-Seq
-w File containing splice site annotations (SpliceSite file format; see above for details)
-r Num. of bases near splice sites to explore [4]
--gzip Gzip output files
-h, --help Print the help
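
The two strand inference types selected with -g (and the confidence threshold set with -x) can be illustrated with a minimal Python sketch. This is not the code used by REDItoolDnaRna.py: the function infer_strand, its arguments, the handling of ties and the use of ">=" for the confidence comparison are all assumptions made for illustration.

```python
def infer_strand(plus_count, minus_count, mode=1, confidence=0.70):
    """Return '+', '-' or None (undetermined) for a single position.

    mode 1 (maxValue): the strand with the larger read count is used.
    mode 2 (useConfidence): a strand is assigned only if its fraction
    of the total count reaches `confidence` (the -x value).
    """
    total = plus_count + minus_count
    if total == 0:
        return None  # no strand-oriented reads at this position
    if mode == 1:
        if plus_count == minus_count:
            return None  # assumption: ties are left undetermined
        return "+" if plus_count > minus_count else "-"
    if mode == 2:
        # assumption: ">=" is used; the real tool may compare strictly
        if plus_count / total >= confidence:
            return "+"
        if minus_count / total >= confidence:
            return "-"
        return None
    raise ValueError("mode must be 1 (maxValue) or 2 (useConfidence)")


# 6 reads on '+' and 4 on '-': maxValue picks '+', while useConfidence
# leaves the strand undetermined because 0.6 is below 0.70.
print(infer_strand(6, 4, mode=1))
print(infer_strand(6, 4, mode=2))
```

With the default maxValue type a strand is assigned whenever one count prevails, whereas useConfidence trades coverage for reliability by leaving ambiguous positions unassigned.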