Category

Reads Manipulation


Usage

kraken-translate --db $DBNAME sequences.kraken > sequences.labels


Manual

Multithreading: Use the --threads NUM switch to use multiple threads.

Quick operation: Rather than searching all k-mers in a sequence, Kraken can stop classification after the first database hit; use --quick to enable this mode. In this mode, the --min-hits switch lets you require multiple hits before a sequence is declared classified, which is especially useful with custom databases when testing whether sequences do or do not belong to a particular genome.
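The early-stopping idea behind --quick and --min-hits can be illustrated with a small sketch. This is not Kraken's implementation: the function names, the toy set of genome k-mers, and the tiny k value are all assumptions made for illustration.

```python
def kmers(seq, k):
    """Yield each k-mer of seq that contains only A, C, G, or T."""
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        if set(kmer) <= set("ACGT"):
            yield kmer


def quick_classify(seq, genome_kmers, k, min_hits=1):
    """Toy model of quick operation (hypothetical helper, not Kraken code).

    Scans k-mers left to right and stops as soon as min_hits of them are
    found in genome_kmers. Returns (classified, kmers_examined).
    """
    hits = 0
    examined = 0
    for kmer in kmers(seq, k):
        examined += 1
        if kmer in genome_kmers:
            hits += 1
            if hits >= min_hits:
                # Enough hits: stop without looking at the remaining k-mers.
                return True, examined
    return False, examined


# Toy "database": k-mers assumed to come from one genome of interest.
genome = {"ACGT", "GGGG"}
read = "ACGTGGGG"

first_hit = quick_classify(read, genome, k=4)               # stops at the first k-mer
two_hits = quick_classify(read, genome, k=4, min_hits=2)    # needs a second hit
three_hits = quick_classify(read, genome, k=4, min_hits=3)  # never reached
```

Raising min_hits trades speed for stricter evidence: with one required hit the scan ends at the first k-mer, while requiring three hits leaves this read unclassified.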

Sequence filtering: Classified or unclassified sequences can be sent to a file for later processing, using the --classified-out and --unclassified-out switches, respectively.

Output redirection: Output can be directed using standard shell redirection (| or >), or using the --output switch.

FASTQ input: Input is normally expected to be in FASTA format, but you can classify FASTQ data using the --fastq-input switch.

Compressed input: Kraken can read gzip- and bzip2-compressed input files when you specify the matching switch, --gzip-compressed or --bzip2-compressed.

Input format auto-detection: If regular files are specified on the command line as input, Kraken will attempt to determine the format of your input prior to classification. You can disable this by explicitly specifying --fasta-input, --fastq-input, --gzip-compressed, and/or --bzip2-compressed as appropriate. Note that auto-detection is not available when you read from standard input (stdin) through the character device file /dev/fd/0.
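The switches described above can be combined in a single kraken invocation. The sketch below only assembles the argument list and does not run anything; the helper name build_kraken_command, the database name MY_DB, and the file names are hypothetical placeholders, and only switches named in this manual are used.

```python
import shlex


def build_kraken_command(db, inputs, threads=1, quick=False, min_hits=None,
                         fastq=False, compression=None, output=None,
                         classified_out=None, unclassified_out=None):
    """Assemble a kraken argument list (hypothetical helper, illustration only)."""
    cmd = ["kraken", "--db", db, "--threads", str(threads)]
    if quick:
        cmd.append("--quick")
    if min_hits is not None:
        cmd += ["--min-hits", str(min_hits)]
    if fastq:
        cmd.append("--fastq-input")
    if compression == "gzip":
        cmd.append("--gzip-compressed")
    elif compression == "bzip2":
        cmd.append("--bzip2-compressed")
    elif compression is not None:
        raise ValueError("compression must be 'gzip', 'bzip2', or None")
    if classified_out:
        cmd += ["--classified-out", classified_out]
    if unclassified_out:
        cmd += ["--unclassified-out", unclassified_out]
    if output:
        cmd += ["--output", output]
    cmd += list(inputs)
    return cmd


# Example: gzip-compressed FASTQ input, 4 threads, quick mode.
cmd = build_kraken_command(
    db="MY_DB",                   # placeholder database name
    inputs=["reads.fq.gz"],       # placeholder input file
    threads=4,
    quick=True,
    fastq=True,
    compression="gzip",
    output="sequences.kraken",
)
print(shlex.join(cmd))  # shell-ready string; pass cmd to subprocess.run to execute
```

Stating the format switches explicitly, as here, also disables input format auto-detection.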

Paired reads: Kraken does not query k-mers containing ambiguous nucleotides (non-ACGT). If you have paired reads, you can use this fact to your advantage and increase Kraken's accuracy by concatenating the pairs together with a single N between the sequences. Using the --paired option when running kraken will automatically do this for you; simply specify the two mate pair files on the command line. We have found this to raise sensitivity by about 3 percentage points over classifying the sequences as single-end reads.
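The effect of joining mates with a single N can be shown with a short sketch. It assumes plain string concatenation as described above and a toy k value; the function names are hypothetical, and this is not Kraken's own code.

```python
def join_pair(mate1, mate2):
    """Concatenate two mates with a single N between them."""
    return mate1 + "N" + mate2


def queryable_kmers(seq, k):
    """Return the k-mers of seq that contain only A, C, G, or T."""
    windows = (seq[i:i + k] for i in range(len(seq) - k + 1))
    return [w for w in windows if set(w) <= set("ACGT")]


mate1 = "ACGTA"
mate2 = "TTGCA"
joined = join_pair(mate1, mate2)

# Every window overlapping the N is skipped, so no k-mer spans the junction:
# the joined read contributes exactly the k-mers of mate1 followed by those of mate2.
joined_kmers = queryable_kmers(joined, 3)
separate_kmers = queryable_kmers(mate1, 3) + queryable_kmers(mate2, 3)
```

Because the N blocks every k-mer that would straddle the two mates, the pair is classified as one sequence using the combined evidence from both ends without introducing artificial k-mers.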

