Software	Usage	Function
Kraken-report	kraken-report --db $DBNAME kraken.output	This tool translates the results of the Kraken metagenomic classifier into the full representation of the NCBI taxonomy.
Kraken	kraken --db $DBNAME seqs.fa	Kraken is a taxonomic sequence classifier that assigns taxonomic labels to short DNA reads.
Kraken-filter	kraken-filter --db $DBNAME [--threshold NUM] kraken.output	We have not yet developed a confidence score with a solid probabilistic interpretation for Kraken. However, we have developed a simple scoring scheme that has yielded good results for us, and we have made it available in the kraken-filter script. The user specifies a threshold score in the [0,1] interval; kraken-filter then adjusts labels up the tree until the label's score meets or exceeds that threshold. If a label at the root of the taxonomic tree would not have a score exceeding the threshold, kraken-filter calls the sequence unclassified. A label's score is the fraction C/Q, where C is the number of k-mers mapped to LCA values in the clade rooted at the label and Q is the number of k-mers in the sequence that lack an ambiguous nucleotide (i.e., those queried against the database).
Cutadapt	cutadapt -a AACCGGTT -o output.fastq input.fastq	Trims the 3' adapter AACCGGTT from the reads in input.fastq and writes the trimmed reads to output.fastq.
Scythe	scythe -a adapter_file.fasta -o trimmed_sequences.fasta sequences.fastq	Scythe uses a naive Bayes approach to classify contaminant substrings in sequence reads. Because it considers quality information, it can be robust at picking out 3'-end adapters, which often include poor-quality bases.
clipping_profile.py	clipping_profile.py -i Pairend_StrandSpecific_51mer_Human_hg19.bam -s "PE" -o out	Calculates the distribution of clipped nucleotides across reads.
PRINSEQ	prinseq-lite.pl [-fasta|-fastq] input_reads_pair_1.[fasta|fastq] [-fasta2|-fastq2] input_reads_pair_2.[fasta|fastq] -out_format [1|2|3|4|5] [options]	PRINSEQ generates summary statistics of sequence and quality data and is used to filter, reformat and trim next-generation sequencing data. It is particularly designed for 454/Roche data but can also be used for other types of sequence data. PRINSEQ is available through a user-friendly web interface or as a standalone version. The standalone version is primarily designed for data preprocessing and does not generate summary statistics in graphical form.
cd-hit-dup	cd-hit-dup -i R1.fa -i2 R2.fa -o output-R1.fa -o2 output-R2.fa [other options]	cd-hit-dup is a simple tool for removing duplicates from sequencing reads, with an optional step to detect and remove chimeric reads. This form takes paired-end FASTA input.
cd-hit-dup	cd-hit-dup -i input.fq -o output.fq [other options]	As above, for single-end FASTQ input.
cd-hit-dup	cd-hit-dup -i R1.fq -i2 R2.fq -o output-R1.fq -o2 output-R2.fq [other options]	As above, for paired-end FASTQ input.
CAP3	cap3 input_reads.fasta [options] > output.txt	CAP3 (Contig Assembly Program) is a DNA sequence assembly program for small-scale assembly with or without quality values.
geneBody_coverage.py	geneBody_coverage.py -r hg19.housekeeping.bed -i bam_path.txt -o output	Calculates RNA-seq read coverage over the gene body. Here the input is a plain-text file listing BAM file paths, one per line.
geneBody_coverage.py	geneBody_coverage.py -r hg19.housekeeping.bed -i test1.bam,test2.bam,test3.bam -o output	As above, with a comma-separated list of BAM files as input.
geneBody_coverage.py	geneBody_coverage.py -r hg19.housekeeping.bed -i test.bam -o output	As above, with a single BAM file as input.
mismatch_profile.py	mismatch_profile.py -l 101 -i ../test.bam -o out	Calculates the distribution of mismatches across reads.
Kraken-translate	kraken-translate --db $DBNAME sequences.kraken > sequences.labels	The sequences.labels file produced by this command is a text file with two tab-delimited columns and one line for each classified sequence in sequences.fa; kraken-translate does not report unclassified sequences.
deletion_profile.py	deletion_profile.py -i sample.bam -l 101 -o out	Calculates the distribution of deletions across reads.
fastp	fastp [options] ...	An ultra-fast all-in-one FASTQ preprocessor.
fastqc	fastqc [-o output dir] [--(no)extract] [-f fastq|bam|sam] [-c contaminant file] seqfile1 .. seqfileN	Generates quality-control reports for sequence files (FASTQ, BAM or SAM).
UMI-Tools	umi_tools dedup [OPTIONS] [--stdin=IN_BAM] [--stdout=OUT_BAM] > OUTFILE	Deduplicates reads based on their UMI and mapping coordinates.
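The Kraken-filter entry describes an iterative rule: move a sequence's label toward the root until its C/Q score reaches the threshold. The Python sketch below illustrates that rule on a toy taxonomy. It is not kraken-filter's code: the inputs `parent`, `kmer_counts` and `q` are hypothetical stand-ins for data that would be parsed from the Kraken database and the per-sequence k-mer mapping, and the sketch applies the same "meets or exceeds" test at every level, including the root.

```python
from typing import Dict, Optional

# Hypothetical inputs (not kraken-filter's own data structures):
#   parent:      taxid -> parent taxid; the root is its own parent (as in NCBI nodes.dmp)
#   kmer_counts: taxid -> number of the sequence's k-mers whose LCA is that taxid
#                (unclassified taxid 0 and ambiguous k-mers excluded)
#   q:           number of k-mers in the sequence without an ambiguous nucleotide


def in_clade(taxid: int, clade_root: int, parent: Dict[int, int]) -> bool:
    """Return True if `taxid` equals `clade_root` or descends from it."""
    node = taxid
    while True:
        if node == clade_root:
            return True
        if parent[node] == node:  # reached the root without meeting clade_root
            return False
        node = parent[node]


def label_score(label: int, kmer_counts: Dict[int, int],
                parent: Dict[int, int], q: int) -> float:
    """C/Q: k-mers mapped to LCA values in the clade rooted at `label`, divided by Q."""
    c = sum(n for taxid, n in kmer_counts.items() if in_clade(taxid, label, parent))
    return c / q


def filter_label(label: int, kmer_counts: Dict[int, int], parent: Dict[int, int],
                 q: int, threshold: float) -> Optional[int]:
    """Move `label` up the tree until its score reaches `threshold`.

    Returns the adjusted taxid, or None (unclassified) if even the root falls short.
    """
    if label == 0 or q == 0:
        return None
    node = label
    while True:
        if label_score(node, kmer_counts, parent, q) >= threshold:
            return node
        if parent[node] == node:  # root reached and still below threshold
            return None
        node = parent[node]
```

For example, with `parent = {1: 1, 2: 1, 3: 2, 4: 2}`, `kmer_counts = {3: 4, 4: 3, 2: 1}` and `q = 10`, a sequence labelled 3 keeps its label at threshold 0.3 (score 0.4), moves to taxon 2 at threshold 0.5 (score 0.8), and becomes unclassified at 0.9 because the root also scores 0.8.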
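The Cutadapt entry trims a 3' adapter, meaning the adapter and everything after it are removed from the read, including the case where only the start of the adapter is present at the read's end. The Python sketch below illustrates that idea with exact matching only. Cutadapt itself aligns the adapter while tolerating errors (default maximum error rate 0.1) and requires a minimum overlap of 3 bases by default; this sketch keeps the minimum overlap but is a simplification, not cutadapt's algorithm.

```python
def trim_3prime_adapter(read: str, adapter: str, min_overlap: int = 3) -> str:
    """Remove a 3' adapter and everything after it, using exact matching only.

    Handles two cases:
      1. the full adapter occurs somewhere in the read;
      2. a prefix of the adapter (at least `min_overlap` bases) forms the read's 3' end.
    """
    if not adapter:
        return read
    # Case 1: full adapter occurrence; trim from its leftmost start.
    start = read.find(adapter)
    if start != -1:
        return read[:start]
    # Case 2: partial adapter at the 3' end; try the longest overlap first.
    longest = min(len(adapter) - 1, len(read))
    for k in range(longest, max(min_overlap, 1) - 1, -1):
        if read.endswith(adapter[:k]):
            return read[:-k]
    return read
```

With the adapter from the example command, `trim_3prime_adapter("ACGTACGTAACC", "AACCGGTT")` returns `"ACGTACGT"` because the read ends with the first four adapter bases, while a read ending in only `"AA"` is left unchanged since that overlap is shorter than 3.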
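clipping_profile.py and deletion_profile.py (from the RSeQC package) summarise where along the read clipped bases and deletions occur. The Python sketch below shows one way to compute such per-position counts directly from CIGAR strings, using the operation codes defined in the SAM specification. It is a simplification, not RSeQC's implementation: it takes CIGAR strings rather than a BAM file, ignores strand, counts only soft clips (hard-clipped bases are absent from the read sequence), and records each deletion at the read offset where it starts.

```python
import re
from collections import Counter
from typing import Iterable, List, Tuple

# One CIGAR operation: a length followed by an operation code (SAM specification).
CIGAR_RE = re.compile(r"(\d+)([MIDNSHP=X])")
# Operations that consume bases of the read (query) sequence.
QUERY_CONSUMING = set("MIS=X")


def parse_cigar(cigar: str) -> List[Tuple[int, str]]:
    """Split a CIGAR string such as '3S7M' into [(3, 'S'), (7, 'M')]."""
    return [(int(n), op) for n, op in CIGAR_RE.findall(cigar)]


def clipping_profile(cigars: Iterable[str]) -> Counter:
    """Count soft-clipped (S) bases at each 0-based read position."""
    profile: Counter = Counter()
    for cigar in cigars:
        pos = 0
        for length, op in parse_cigar(cigar):
            if op == "S":
                profile.update(range(pos, pos + length))
            if op in QUERY_CONSUMING:
                pos += length
    return profile


def deletion_profile(cigars: Iterable[str]) -> Counter:
    """Count deletion (D) events by the 0-based read offset where they start."""
    profile: Counter = Counter()
    for cigar in cigars:
        pos = 0
        for length, op in parse_cigar(cigar):
            if op == "D":
                # Deletions consume reference bases only, so `pos` is not advanced.
                profile[pos] += 1
            if op in QUERY_CONSUMING:
                pos += length
    return profile
```

For instance, the alignments `3S7M` and `8M2S` contribute soft-clip counts at read positions 0 to 2 and 8 to 9 respectively, and `5M2D5M` records one deletion at offset 5.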
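geneBody_coverage.py reports RNA-seq read coverage along the gene body. A common way to make genes of different lengths comparable is to sample each gene at the same number of evenly spaced points (for instance 100, one per percentile) ordered 5' to 3', then sum coverage at those points across genes. The Python sketch below illustrates that idea under stated assumptions: the per-base `coverage` dict and the gene coordinate lists are hypothetical inputs for a single chromosome, and this is not RSeQC's code.

```python
from typing import Dict, List, Sequence


def gene_body_points(positions: Sequence[int], strand: str,
                     n_points: int = 100) -> List[int]:
    """Pick n_points evenly spaced positions along a gene body, ordered 5' to 3'.

    `positions` are the gene's (e.g. exonic) genomic coordinates in ascending order.
    """
    if n_points < 2 or not positions:
        raise ValueError("need at least 2 points and a non-empty gene body")
    # On the minus strand the 5' end is the highest coordinate.
    ordered = list(positions) if strand == "+" else list(reversed(positions))
    last = len(ordered) - 1
    return [ordered[round(i * last / (n_points - 1))] for i in range(n_points)]


def gene_body_profile(genes: Sequence[Sequence[int]], strands: Sequence[str],
                      coverage: Dict[int, int], n_points: int = 100) -> List[int]:
    """Sum per-base coverage at each relative gene-body point across all genes."""
    profile = [0] * n_points
    for positions, strand in zip(genes, strands):
        for i, pos in enumerate(gene_body_points(positions, strand, n_points)):
            profile[i] += coverage.get(pos, 0)
    return profile
```

Reversing minus-strand genes keeps the first profile entry at the 5' end for every gene, so a 3' coverage bias shows up on the same side of the profile regardless of strand.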
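The UMI-Tools entry deduplicates reads using both the UMI and the mapping coordinates. umi_tools dedup supports several grouping methods (selected with `--method`); the default, `directional`, also merges reads whose UMIs differ by one base subject to a count-based rule, whereas `unique` treats only identical UMIs as duplicates. The Python sketch below approximates the `unique` idea on a hypothetical `AlignedRead` record; it simply keeps the first read seen in each group, which may differ from how umi_tools picks the representative read.

```python
from typing import Iterable, List, NamedTuple, Set, Tuple


class AlignedRead(NamedTuple):
    """Minimal hypothetical record; real input would come from a BAM file."""
    read_id: str
    chrom: str
    strand: str      # "+" or "-"
    five_prime: int  # 5' mapping coordinate (alignment end for reverse-strand reads)
    umi: str


def dedup_unique(reads: Iterable[AlignedRead]) -> List[AlignedRead]:
    """Keep the first read for each (chrom, strand, 5' position, UMI) combination."""
    seen: Set[Tuple[str, str, int, str]] = set()
    kept: List[AlignedRead] = []
    for read in reads:
        key = (read.chrom, read.strand, read.five_prime, read.umi)
        if key not in seen:
            seen.add(key)
            kept.append(read)
    return kept
```

Two reads at the same position with the same UMI collapse to one, while a read at that position with a different UMI, or on the opposite strand, is kept as a distinct molecule.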