Kraken-report |
kraken-report --db $DBNAME kraken.output |
This tool is designed to translate results of the Kraken metagenomic classifier (see citations below) to the full representation of NCBI taxonomy. |
Kraken |
kraken --db $DBNAME seqs.fa |
Kraken is a taxonomic sequence classifier that assigns taxonomic labels to short DNA reads. |
Kraken-filter |
kraken-filter --db $DBNAME [--threshold NUM] kraken.output |
At present, we have not yet developed a confidence score with a solid probabilistic interpretation for Kraken. However, we have developed a simple scoring scheme that has yielded good results for us, and we've made that available in the kraken-filter script. The approach we use allows a user to specify a threshold score in the [0,1] interval; the kraken-filter script then will adjust labels up the tree until the label's score (described below) meets or exceeds that threshold. If a label at the root of the taxonomic tree would not have a score exceeding the threshold, the sequence is called unclassified by kraken-filter. |
Cutadapt |
cutadapt -a AACCGGTT -o output.fastq input.fastq |
Trim a 3' adapter from reads with cutadapt. |
Scythe |
scythe -a adapter_file.fasta -o trimmed_sequences.fasta sequences.fastq |
Scythe uses a Naive Bayesian approach to classify contaminant substrings in sequence reads. It considers quality information, which can make it robust in picking out 3'-end adapters, which often include poor quality bases. |
clipping_profile.py |
clipping_profile.py -i Pairend_StrandSpecific_51mer_Human_hg19.bam -s "PE" -o out |
Calculate the distribution of clipped nucleotides across reads. |
PRINSEQ |
prinseq-lite.pl [-fasta|-fastq] input_reads_pair_1.[fasta|fastq] [-fasta2|-fastq2] input_reads_pair_2.[fasta|fastq] -out_format [1|2|3|4|5] [options] |
PRINSEQ is a tool that generates summary statistics of sequence and quality data and is used to filter, reformat, and trim next-generation sequence data. It is particularly designed for 454/Roche data, but can also be used for other types of sequence data. PRINSEQ is available through a user-friendly web interface or as a standalone version. The standalone version is primarily designed for data preprocessing and does not generate summary statistics in graphical form. |
cd-hit-dup |
cd-hit-dup -i R1.fa -i2 R2.fa -o output-R1.fa -o2 output-R2.fa [other options] |
cd-hit-dup is a simple tool for removing duplicates from sequencing reads, with an optional step to detect and remove chimeric reads. |
CAP3 |
cap3 input_reads.fasta [options] > output.txt |
CAP3 (Contig Assembly Program) is a DNA sequence assembly program for small-scale assembly with or without quality values. |
geneBody_coverage.py |
geneBody_coverage.py -r hg19.housekeeping.bed -i bam_path.txt -o output |
Calculate RNA-seq read coverage over the gene body. |
geneBody_coverage.py |
geneBody_coverage.py -r hg19.housekeeping.bed -i test1.bam,test2.bam,test3.bam -o output |
Calculate RNA-seq read coverage over the gene body. |
cd-hit-dup |
cd-hit-dup -i input.fq -o output.fq [other options] |
cd-hit-dup is a simple tool for removing duplicates from sequencing reads, with an optional step to detect and remove chimeric reads. |
geneBody_coverage.py |
geneBody_coverage.py -r hg19.housekeeping.bed -i test.bam -o output |
Calculate RNA-seq read coverage over the gene body. |
mismatch_profile.py |
mismatch_profile.py -l 101 -i ../test.bam -o out |
Calculate the distribution of mismatches across reads. |
cd-hit-dup |
cd-hit-dup -i R1.fq -i2 R2.fq -o output-R1.fq -o2 output-R2.fq [other options] |
cd-hit-dup is a simple tool for removing duplicates from sequencing reads, with an optional step to detect and remove chimeric reads. |
Kraken-translate |
kraken-translate --db $DBNAME sequences.kraken > sequences.labels |
The file sequences.labels generated by the above example is a text file with two tab-delimited columns, and one line for each classified sequence in sequences.fa; unclassified sequences are not reported by kraken-translate. |
deletion_profile.py |
deletion_profile.py -i sample.bam -l 101 -o out |
Calculate the distribution of deletions across reads. |
fastp |
fastp [options] ... |
An ultra-fast all-in-one FASTQ preprocessor |
fastqc |
fastqc [-o output dir] [--(no)extract] [-f fastq|bam|sam] [-c contaminant file] seqfile1 .. seqfileN |
Generate QC reports for FASTQ files. |
UMI-Tools |
umi_tools dedup [OPTIONS] [--stdin=IN_BAM] [--stdout=OUT_BAM] > OUTFILE |
Deduplicate reads using UMIs and mapping coordinates. |
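
The label adjustment described in the Kraken-filter entry above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not kraken-filter's actual code: the function name `adjust_label`, the `parent` mapping, the `score` callable, and the toy taxonomy IDs and scores are all hypothetical. The score of a label is not defined in this list, so the sketch takes it as a caller-supplied function.

```python
from typing import Callable, Dict, Optional


def adjust_label(
    taxid: int,
    parent: Dict[int, int],
    score: Callable[[int], float],
    threshold: float,
    root: int = 1,
) -> Optional[int]:
    """Move a label up the taxonomy until its score meets the threshold.

    Returns the first label on the path from `taxid` to `root` whose score
    is >= `threshold`, or None (unclassified) if even the root falls short.
    `parent` maps each non-root taxonomy ID to its parent ID (hypothetical
    input format, assumed for this sketch).
    """
    node = taxid
    while True:
        if score(node) >= threshold:
            return node          # this label is confident enough; keep it
        if node == root:
            return None          # root did not pass: call it unclassified
        node = parent[node]      # otherwise step one level up the tree


# Toy taxonomy (made-up IDs): 4 -> 3 -> 2 -> 1 (root)
toy_parent = {4: 3, 3: 2, 2: 1}
# Toy scores (made-up values): broader labels score higher
toy_scores = {4: 0.2, 3: 0.5, 2: 0.7, 1: 0.9}

print(adjust_label(4, toy_parent, toy_scores.__getitem__, 0.6))   # 2
print(adjust_label(4, toy_parent, toy_scores.__getitem__, 0.95))  # None
```

With a threshold of 0.6 the label moves from node 4 up to node 2, the first ancestor whose toy score reaches the threshold. With 0.95 no label on the path qualifies, so the sequence is treated as unclassified.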