Reads Manipulation

CAP3: Function: CAP3 (Contig Assembly Program) is a DNA sequence assembly program for small-scale assembly with or without quality values.

Usage: cap3 input_reads.fasta [options] > output.txt
PRINSEQ: Function: PRINSEQ is a tool that generates summary statistics of sequence and quality data and that is used to filter, reformat and trim next-generation sequence data. It is particularly designed for 454/Roche data, but can also be used for other types of sequence data. PRINSEQ is available through a user-friendly web interface or as a standalone version. The standalone version is primarily designed for data preprocessing and does not generate summary statistics in graphical form.

Usage: prinseq-lite.pl [-fasta|-fastq] input_reads.[fasta|fastq] -out_format [1|2|3|4|5] [options]
correctGCBias: Function: This tool corrects the GC-bias using the method proposed by [Benjamini & Speed (2012). Nucleic Acids Research, 40(10)]. It will remove reads from regions with too high coverage compared to the expected values (typically GC-rich regions) and will add reads to regions where too few reads are seen (typically AT-rich regions). The tool computeGCBias needs to be run first to generate the frequency table needed here.

Usage: correctGCBias -b file.bam --effectiveGenomeSize 2150570000 -g mm9.2bit --GCbiasFrequenciesFile freq.txt -o gc_corrected.bam [options]
insertion_profile.py: Function: Calculate the distributions of inserted nucleotides across reads.

Usage: insertion_profile.py -s "PE" -i test.bam -o out
maq fasta2bfa: Function: Convert sequences in FASTA format to Maq's BFA (binary FASTA) format.

Usage: maq fasta2bfa in.ref.fasta out.ref.bfa
bamPEFragmentSize: Function: This tool samples the given BAM files with paired-end data to estimate the fragment length distribution. Properly paired reads are preferred for computation, i.e., discordant pairs are ignored unless a region contains no concordant pairs.

Usage: bamPEFragmentSize [-h] [--bamfiles bam files [bam files ...]] [--histogram FILE] [--plotFileFormat FILETYPE] [--numberOfProcessors INT] [--samplesLabel SAMPLESLABEL [SAMPLESLABEL ...]] [--plotTitle PLOTTITLE] [--maxFragmentLength MAXFRAGMENTLENGTH] [--logScale] [--binSize INT] [--distanceBetweenBins INT] [--blackListFileName BED file] [--table FILE] [--outRawFragmentLengths FILE] [--verbose] [--version]
cd-hit-dup: Function: cd-hit-dup is a simple tool for removing duplicates from sequencing reads, with an optional step to detect and remove chimeric reads. A number of options are provided to tune how the duplicates are removed.

Usage: cd-hit-dup -i input.fa -o output
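
To illustrate the basic idea of duplicate removal, here is a minimal Python sketch that keeps the first occurrence of each distinct sequence. It is not cd-hit-dup's algorithm, which offers tunable matching and chimera detection; the function name `remove_exact_duplicates` and the sample reads are hypothetical.

```python
def remove_exact_duplicates(records):
    """Keep the first record seen for each distinct sequence.

    `records` is an iterable of (read_id, sequence) tuples.
    Only exact matches are treated as duplicates (assumption for this sketch).
    """
    seen = set()
    unique = []
    for read_id, sequence in records:
        key = sequence.upper()  # treat case differences as the same read
        if key not in seen:
            seen.add(key)
            unique.append((read_id, sequence))
    return unique


# Hypothetical reads: r2 and r4 duplicate r1.
reads = [("r1", "ACGT"), ("r2", "acgt"), ("r3", "TTGA"), ("r4", "ACGT")]
deduplicated = remove_exact_duplicates(reads)
```

Here `deduplicated` retains only `r1` and `r3`, in input order.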
maq sol2sanger: Function: Convert Solexa FASTQ to standard/Sanger FASTQ format.

Usage: maq sol2sanger in.sol.fastq out.sanger.fastq
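
The conversion can be sketched in Python, assuming the commonly used relation Q_phred = 10 * log10(10^(Q_solexa / 10) + 1), an ASCII offset of 64 for Solexa quality strings, and an offset of 33 for Sanger quality strings. The function names are hypothetical, and this is an illustration rather than Maq's own code.

```python
import math


def solexa_to_phred(q_solexa):
    """Convert one Solexa-scaled quality score to the nearest Phred score."""
    return int(round(10 * math.log10(10 ** (q_solexa / 10) + 1)))


def solexa_quality_to_sanger(quality):
    """Re-encode a Solexa quality string (offset 64) as Sanger (offset 33)."""
    return "".join(chr(solexa_to_phred(ord(c) - 64) + 33) for c in quality)


# Solexa scores -5, 0, 10 and 40 encoded with offset 64.
sanger_quality = solexa_quality_to_sanger(";@Jh")
```

Only the quality line of each FASTQ record changes; the header and sequence lines are copied unchanged.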
Kraken-report: Function: This tool is designed to translate results of the Kraken metagenomic classifier (see citations below) to the full representation of NCBI taxonomy.

Usage: kraken-report --db $DBNAME kraken.output
read_GC.py: Function: GC content distribution of reads.

Usage: read_GC.py -i Pairend_nonStrandSpecific_36mer_Human_hg19.bam -o output
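
As a rough illustration of what a GC content distribution is, the Python sketch below computes the GC percentage of each read and counts how many reads share each value. It works on plain sequence strings rather than a BAM file, and the names `gc_percent` and `gc_distribution` are hypothetical, not part of read_GC.py.

```python
from collections import Counter


def gc_percent(sequence):
    """Return the percentage of G and C bases in a sequence."""
    seq = sequence.upper()
    if not seq:
        return 0.0
    return 100.0 * (seq.count("G") + seq.count("C")) / len(seq)


def gc_distribution(sequences, decimals=2):
    """Count reads per GC percentage, rounded to `decimals` places."""
    return Counter(round(gc_percent(s), decimals) for s in sequences)


# Hypothetical reads for demonstration only.
reads = ["ACGT", "GGCC", "ATAT", "GCAT"]
distribution = gc_distribution(reads)
```

The resulting counter maps each GC percentage to the number of reads with that value, which is the kind of data a GC histogram is drawn from.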
PRINSEQ (paired-end input): Function: PRINSEQ is a tool that generates summary statistics of sequence and quality data and that is used to filter, reformat and trim next-generation sequence data. It is particularly designed for 454/Roche data, but can also be used for other types of sequence data. PRINSEQ is available through a user-friendly web interface or as a standalone version. The standalone version is primarily designed for data preprocessing and does not generate summary statistics in graphical form.

Usage: prinseq-lite.pl [-fasta|-fastq] input_reads_pair_1.[fasta|fastq] [-fasta2|-fastq2] input_reads_pair_2.[fasta|fastq] -out_format [1|2|3|4|5] [options]
geneBody_coverage.py: Function: Calculate the RNA-seq reads coverage over gene body.

Usage: geneBody_coverage.py -r hg19.housekeeping.bed -i /data/alignment/ -o output
Kraken-translate: Function: The file sequences.labels generated by the usage example below is a text file with two tab-delimited columns, and one line for each classified sequence in sequences.fa; unclassified sequences are not reported by kraken-translate.

Usage: kraken-translate --db $DBNAME sequences.kraken > sequences.labels
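
A two-column, tab-delimited file like sequences.labels can be read with a few lines of Python. The sketch below assumes the first column is the sequence ID and the second is its taxonomic label; the function name `parse_labels` and the sample lines are hypothetical.

```python
def parse_labels(lines):
    """Map sequence ID to label from two-column, tab-delimited lines."""
    labels = {}
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue  # skip blank lines
        seq_id, label = line.split("\t", 1)
        labels[seq_id] = label
    return labels


# Hypothetical content; real files come from kraken-translate.
sample = [
    "read1\troot;cellular organisms;Bacteria\n",
    "read2\troot;Viruses\n",
]
labels = parse_labels(sample)
```

With a real file, `parse_labels(open("sequences.labels"))` would work the same way, since file objects iterate line by line.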
clipping_profile.py: Function: Calculate the distributions of clipped nucleotides across reads.

Usage: clipping_profile.py -i Pairend_StrandSpecific_51mer_Human_hg19.bam -s "PE" -o out
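
The idea of a per-position clipping profile can be sketched by walking CIGAR strings and recording which read positions fall inside soft-clip (`S`) operations. This is a simplified illustration and not the RSeQC implementation; the names `positions_with_op` and `profile` are hypothetical. Passing `"I"` instead of `"S"` gives the analogous insertion profile.

```python
import re
from collections import Counter

CIGAR_TOKEN = re.compile(r"(\d+)([MIDNSHP=X])")
READ_CONSUMING = set("MIS=X")  # operations that advance along the read


def positions_with_op(cigar, op):
    """Return 0-based read positions covered by `op` ("S" or "I")."""
    positions = []
    read_pos = 0
    for length, code in CIGAR_TOKEN.findall(cigar):
        n = int(length)
        if code == op:
            positions.extend(range(read_pos, read_pos + n))
        if code in READ_CONSUMING:
            read_pos += n
    return positions


def profile(cigars, op="S"):
    """Count, per read position, how many reads carry `op` there."""
    counts = Counter()
    for cigar in cigars:
        counts.update(positions_with_op(cigar, op))
    return counts


# Hypothetical CIGAR strings for 50-base reads.
clip_counts = profile(["3S47M", "2S48M", "45M5S"])
```

Each key in `clip_counts` is a read position and each value is the number of reads clipped at that position.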
Kraken: Function: Kraken is a taxonomic sequence classifier that assigns taxonomic labels to short DNA reads.

Usage: kraken --db $DBNAME seqs.fa