Category

Reads Manipulation


Usage

cutadapt -a AACCGGTT -o output.fastq input.fastq


Manual

The sequence of the adapter is given with the -a option. You need to replace AACCGGTT with your actual adapter sequence. Reads are read from the input file input.fastq and written to the output file output.fastq.

Cutadapt searches for the adapter in all reads and removes it when it finds it. All reads that were present in the input file will also be present in the output file, some of them trimmed, some of them not. Even reads that were trimmed entirely (because the adapter was found in the very beginning) are output. All of this can be changed with command-line options, explained further down.

Input and output file formats

Input files for cutadapt need to be in one of these formats:

  • FASTA (file name extensions: .fasta, .fa, .fna)
  • FASTQ (extensions: .fastq, .fq)
  • Any of the above, but compressed as .gz (.bz2 and .xz are also supported)

Input and output file formats are recognized from the file name extension. You can override the input format with the --format option.

You can even use this – without any adapter trimming – to convert from FASTQ to FASTA:

cutadapt -o output.fasta input.fastq.gz
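
To picture how recognition by file name extension could work, here is a minimal Python sketch. It is not cutadapt's actual code: guess_format, FORMAT_BY_EXTENSION and the override parameter (standing in for --format) are hypothetical names chosen for illustration, and the extensions are only those listed above.

```python
import os

# Hypothetical lookup table built from the extensions listed in this section.
FORMAT_BY_EXTENSION = {
    ".fasta": "fasta", ".fa": "fasta", ".fna": "fasta",
    ".fastq": "fastq", ".fq": "fastq",
}
COMPRESSION_EXTENSIONS = (".gz", ".bz2", ".xz")


def guess_format(path, override=None):
    """Return 'fasta' or 'fastq' for a file name, or None if unknown."""
    if override is not None:
        # An explicit format wins, similar in spirit to --format.
        return override
    name = path.lower()
    # Drop one compression suffix so reads.fastq.gz is treated as .fastq.
    for ext in COMPRESSION_EXTENSIONS:
        if name.endswith(ext):
            name = name[: -len(ext)]
            break
    return FORMAT_BY_EXTENSION.get(os.path.splitext(name)[1])
```

Under these assumptions, guess_format("input.fastq.gz") yields "fastq" and guess_format("output.fasta") yields "fasta", which mirrors the conversion command above: the input is read as FASTQ and the output is written as FASTA.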

Compressed files

Cutadapt supports compressed input and output files. Whether an input file needs to be decompressed or an output file needs to be compressed is detected automatically by inspecting the file name: If it ends in .gz, then gzip compression is assumed. You can therefore run cutadapt like this and it works as expected:

cutadapt -a AACCGGTT -o output.fastq.gz input.fastq.gz

All of cutadapt’s options that expect a file name support this.

Files compressed with bzip2 (.bz2) or xz (.xz) are also supported, but only if the Python installation includes the proper modules. xz files require Python 3.3 or later.

Concatenated bz2 files are not supported on Python versions before 3.3. These files are created by utilities such as pbzip2 (parallel bzip2).

Concatenated gz files are supported on all supported Python versions.
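
The automatic detection described in this section can be approximated with Python's standard gzip, bz2 and lzma modules. The sketch below is an illustration only, not cutadapt's implementation; open_by_extension is a hypothetical helper. The compression modules are imported lazily, so a Python installation that lacks bz2 or lzma fails only when such a file is actually opened.

```python
def open_by_extension(path, mode="rt"):
    """Open a file, choosing the (de)compressor from the file name.

    Hypothetical helper: the suffix alone decides which module is used.
    """
    if path.endswith(".gz"):
        import gzip
        return gzip.open(path, mode)
    if path.endswith(".bz2"):
        import bz2  # may be missing from some Python builds
        return bz2.open(path, mode)
    if path.endswith(".xz"):
        import lzma  # may be missing from some Python builds
        return lzma.open(path, mode)
    # No known compression suffix: treat as a plain file.
    return open(path, mode)
```

Because the same function serves reading ("rt") and writing ("wt"), one rule covers both input and output file names.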

Standard input and output

If no output file is specified via the -o option, then the output is sent to the standard output stream. Instead of the example command line from above, you can therefore also write:

cutadapt -a AACCGGTT input.fastq > output.fastq

There is one difference in behavior if you use cutadapt without -o: The report is sent to the standard error stream instead of standard output. You can redirect it to a file like this:

cutadapt -a AACCGGTT input.fastq > output.fastq 2> report.txt

Wherever cutadapt expects a file name, you can also write a dash (-) in order to specify that standard input or output should be used. For example:

tail -n 4 input.fastq | cutadapt -a AACCGGTT - > output.fastq

tail -n 4 prints only the last four lines of input.fastq, which are then piped into cutadapt. Because a FASTQ record normally occupies four lines, cutadapt processes only the last read in the input file.

In most cases, you should probably use - at most once for an input file and at most once for an output file, in order not to get mixed output.

You cannot combine - and gzip compression since cutadapt needs to know the file name of the output or input file. If you want to have a gzip-compressed output file, use -o with an explicit name.

One last “trick” is to use /dev/null as an output file name. This special file discards everything you send into it. If you only want to see the statistics output, for example, and do not care about the trimmed reads at all, you could use something like this:

cutadapt -a AACCGGTT -o /dev/null input.fastq

Read processing

Cutadapt can do a lot more in addition to removing adapters. There are various command-line options that make it possible to modify and filter reads and to redirect them to various output files. Each read is processed in the following way:

  1. Read modification options are applied. This includes adapter removal, quality trimming, read name modifications etc. The order in which they are applied is the order in which they are listed in the help shown by cutadapt --help under the “Additional read modifications” heading. Adapter trimming itself does not appear in that list and is done after quality trimming and before length trimming (--length/-l).
  2. Filtering options are applied, such as removal of too short or untrimmed reads. Some of the filters also allow you to redirect a read to a separate output file. The filters are applied in the order in which they are listed in the help shown by cutadapt --help under the “Filtering of processed reads” heading.
  3. If the read has passed all the filters, it is written to the output file.
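
The three steps above can be sketched as a small pipeline in Python. This is a simplified illustration, not cutadapt's code: a read is reduced to its sequence string, adapter matching is exact, and process_read, remove_3prime_adapter, shorten_to and too_short are hypothetical stand-ins for the real modification and filtering options.

```python
def remove_3prime_adapter(adapter):
    """Modifier: cut the adapter and everything after it (exact match only)."""
    def modify(seq):
        pos = seq.find(adapter)
        return seq if pos == -1 else seq[:pos]
    return modify


def shorten_to(length):
    """Modifier: stand-in for length trimming."""
    return lambda seq: seq[:length]


def too_short(minimum):
    """Filter: True means the read is discarded."""
    return lambda seq: len(seq) < minimum


def process_read(seq, modifiers, filters):
    """Apply modifiers in order, then filters in order.

    Returns the modified read, or None if any filter rejected it.
    """
    for modify in modifiers:       # step 1: read modifications
        seq = modify(seq)
    for should_discard in filters: # step 2: filtering
        if should_discard(seq):
            return None
    return seq                     # step 3: read goes to the output


# Adapter trimming comes before length trimming, as described above.
modifiers = [remove_3prime_adapter("AACCGGTT"), shorten_to(10)]
filters = [too_short(5)]
```

The point of the sketch is the ordering: every filter sees the read only after all modifications have been applied, so a read can be discarded as too short because of trimming that happened in step 1.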

Removing adapters

Cutadapt supports trimming of multiple types of adapters:

Adapter type               Command-line option
3’ adapter                 -a ADAPTER
5’ adapter                 -g ADAPTER
Anchored 3’ adapter        -a ADAPTER$
Anchored 5’ adapter        -g ^ADAPTER
5’ or 3’ (both possible)   -b ADAPTER
Linked adapter             -a ADAPTER1...ADAPTER2
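
As a rough illustration of how the first four adapter types differ, the following Python sketch trims with exact string matching. It is an assumption-laden simplification: trim_3prime and trim_5prime are hypothetical helpers, real cutadapt matching also tolerates errors and partial adapter occurrences, and the -b and linked adapter types are not covered.

```python
def trim_3prime(seq, adapter, anchored=False):
    """Remove a 3' adapter and everything after it (exact matching only)."""
    if anchored:
        # Anchored (-a ADAPTER$): the adapter must end the read.
        return seq[: -len(adapter)] if seq.endswith(adapter) else seq
    pos = seq.find(adapter)
    return seq if pos == -1 else seq[:pos]


def trim_5prime(seq, adapter, anchored=False):
    """Remove a 5' adapter and everything before it (exact matching only)."""
    if anchored:
        # Anchored (-g ^ADAPTER): the adapter must start the read.
        return seq[len(adapter):] if seq.startswith(adapter) else seq
    pos = seq.find(adapter)
    return seq if pos == -1 else seq[pos + len(adapter):]
```

For example, with the adapter AACCGGTT, the unanchored 3’ variant shortens ACGTAACCGGTTGG to ACGT, while the anchored 3’ variant leaves that read untouched because the adapter is not at its very end.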

