Scythe uses a naive Bayesian approach to classify contaminant substrings in sequence reads. It takes quality information into account, which can make it robust at picking out 3'-end adapters, which often include poor-quality bases.
scythe -a adapter_file.fasta -o trimmed_sequences.fastq sequences.fastq
By default, the prior contamination rate is 0.05. This can be changed (and one is encouraged to do so!) with -p:
scythe -a adapter_file.fasta -p 0.1 -o trimmed_sequences.fastq sequences.fastq
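To see why the prior matters, here is a minimal sketch of a generic two-class Bayes update. It is not Scythe's actual code: the function name posterior and the likelihood values 0.9 and 0.1 are made up for illustration, whereas Scythe's own likelihoods take base qualities into account.

```shell
# Hypothetical helper, not part of Scythe.
# $1 = prior contamination rate
# $2 = likelihood of the observed match if the read is contaminated (assumed value)
# $3 = likelihood of the observed match if it arose by chance (assumed value)
posterior() {
  awk -v p="$1" -v lc="$2" -v lr="$3" \
    'BEGIN { printf "%.4f\n", (p * lc) / (p * lc + (1 - p) * lr) }'
}

posterior 0.05 0.9 0.1   # prints 0.3214
posterior 0.30 0.9 0.1   # prints 0.7941
```

With the same evidence, raising the prior from 0.05 to 0.30 moves the posterior probability of contamination from below one half to well above it, which is why a prior that reflects the actual contamination of the lane is worth setting.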
If you'd like to use standard out, it is recommended you use the --quiet option:
scythe -a adapter_file.fasta --quiet sequences.fastq > trimmed_sequences.fastq
Also, more detailed output about matches can be obtained with:
scythe -a adapter_file.fasta -o trimmed_sequences.fastq -m matches.txt sequences.fastq
By default, Illumina's quality scheme (pipeline > 1.3) is used. Sanger or Solexa (pipeline < 1.3) qualities can be specified with -q:
scythe -a adapter_file.fasta -q solexa -o trimmed_sequences.fastq sequences.fastq
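If you are unsure which quality scheme a file uses, the lowest quality character gives a rough hint. The sketch below is a heuristic, not a Scythe feature: the function name guess_quality is hypothetical, it assumes four-line FASTQ records, and it relies on the conventional offsets (Sanger is Phred+33, Solexa starts at ASCII 59, Illumina 1.3+ starts at ASCII 64). The printed labels are descriptive; check scythe's help output for the exact values -q accepts.

```shell
# Hypothetical helper: guess the quality encoding from the lowest quality character.
# Assumes each FASTQ record spans exactly four lines (quality on every 4th line).
guess_quality() {
  awk 'BEGIN {
         # Build a character-to-ASCII-code lookup table for printable characters.
         for (i = 33; i < 127; i++) ord[sprintf("%c", i)] = i
         min = 127
       }
       NR % 4 == 0 {
         n = length($0)
         for (j = 1; j <= n; j++) {
           ch = substr($0, j, 1)
           if ((ch in ord) && ord[ch] < min) min = ord[ch]
         }
       }
       END {
         if (min < 59)      print "sanger"    # characters below ";" only occur with offset 33
         else if (min < 64) print "solexa"    # ";" to "?" encode negative Solexa scores
         else               print "illumina"  # nothing below "@" was seen
       }' "$1"
}

# Usage: guess_quality sequences.fastq
```

A file containing only high-quality bases can be misclassified, since a Sanger file may never use characters below ASCII 59, so treat the result as a starting point rather than a verdict.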
Lastly, one can specify the minimum match length with -n and, with -M, the minimum sequence length to keep after trimming (sequences whose length is less than or equal to this parameter are discarded):
scythe -a adapter_file.fasta -n 0 -M 10 -o trimmed_sequences.fastq sequences.fastq
The default minimum match length (-n) is 5. If this pre-processing is upstream of assembly on a very contaminated lane, decreasing this parameter could lead to very liberal trimming, i.e. trimming of matches only a few bases long.
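One way to judge how liberal a given setting was is to compare read lengths before and after trimming. The following is a minimal sketch, assuming four-line FASTQ records; the function name read_length_stats is hypothetical and not part of Scythe.

```shell
# Hypothetical helper: summarize read lengths in a four-line-per-record FASTQ file.
read_length_stats() {
  awk 'NR % 4 == 2 {
         len = length($0)
         n++
         sum += len
         if (n == 1 || len < min) min = len
       }
       END {
         if (n) printf "reads=%d mean_len=%.1f min_len=%d\n", n, sum / n, min
       }' "$1"
}

# Usage: run on the input and on the trimmed output, then compare.
# read_length_stats sequences.fastq
# read_length_stats trimmed_sequences.fastq
```

A large drop in mean length after lowering -n suggests that many short, possibly spurious, matches are being trimmed.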