Category
Mapping
Usage
bowtie2 [options]* -x <bt2-idx> {-1 <m1> -2 <m2> | -U <r> | --interleaved <i> | -b <bam>} [-S <sam>]
Manual
Required arguments
- -x <bt2-idx>: The basename of the index for the reference genome. The basename is the name of any of the index files up to but not including the final .1.bt2 / .rev.1.bt2 / etc. bowtie2 looks for the specified index first in the current directory, then in the directory specified in the BOWTIE2_INDEXES environment variable.
- -1 <m1>: Comma-separated list of files containing mate 1s (filename usually includes _1), e.g. -1 flyA_1.fq,flyB_1.fq. Sequences specified with this option must correspond file-for-file and read-for-read with those specified in <m2>. Reads may be a mix of different lengths. If - is specified, bowtie2 will read the mate 1s from the "standard in" or "stdin" filehandle.
- -2 <m2>: Comma-separated list of files containing mate 2s (filename usually includes _2), e.g. -2 flyA_2.fq,flyB_2.fq. Sequences specified with this option must correspond file-for-file and read-for-read with those specified in <m1>. Reads may be a mix of different lengths. If - is specified, bowtie2 will read the mate 2s from the "standard in" or "stdin" filehandle.
- -U <r>: Comma-separated list of files containing unpaired reads to be aligned, e.g. -U lane1.fq,lane2.fq,lane3.fq,lane4.fq. Reads may be a mix of different lengths. If - is specified, bowtie2 gets the reads from the "standard in" or "stdin" filehandle.
- --interleaved: Reads interleaved FASTQ files where the first two records (8 lines) represent a mate pair.
- --sra-acc: Reads are SRA accessions. If the accession provided cannot be found in local storage, it will be fetched from the NCBI database. If SRA alignments run for a long time, rerun the command with the -p/--threads parameter set to the desired number of threads. NB: this option is only available if Bowtie 2 is compiled with the necessary SRA libraries. See Obtaining Bowtie 2 for details.
- -b <bam>: Reads are unaligned BAM records sorted by read name. The --align-paired-reads and --preserve-tags options affect the way Bowtie 2 processes records.
- -S <sam>: File to write SAM alignments to. By default, alignments are written to the "standard out" or "stdout" filehandle (i.e. the console).
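As a rough illustration of how the required arguments combine, the Python sketch below assembles an argument list for paired or unpaired input. The helper name build_bowtie2_command, the index basename and the output file name are hypothetical; only the flags -x, -1, -2, -U and -S come from the usage line above.

```python
from typing import List, Optional, Sequence


def build_bowtie2_command(
    index_basename: str,
    mate1: Sequence[str] = (),
    mate2: Sequence[str] = (),
    unpaired: Sequence[str] = (),
    sam_out: Optional[str] = None,
) -> List[str]:
    """Assemble a bowtie2 argument list (hypothetical helper, not part of Bowtie 2)."""
    if len(mate1) != len(mate2):
        # <m1> and <m2> must correspond file-for-file.
        raise ValueError("-1 and -2 need the same number of files")
    if not mate1 and not unpaired:
        raise ValueError("provide paired files (-1/-2) or unpaired files (-U)")

    cmd = ["bowtie2", "-x", index_basename]
    if mate1:
        cmd += ["-1", ",".join(mate1), "-2", ",".join(mate2)]
    if unpaired:
        cmd += ["-U", ",".join(unpaired)]
    if sam_out is not None:
        # Without -S, alignments are written to stdout.
        cmd += ["-S", sam_out]
    return cmd


cmd = build_bowtie2_command(
    "fly_index",  # assumed index basename
    mate1=["flyA_1.fq", "flyB_1.fq"],
    mate2=["flyA_2.fq", "flyB_2.fq"],
    sam_out="fly.sam",  # assumed output name
)
print(" ".join(cmd))
```

The list form can be handed to subprocess.run without shell quoting; whether the resulting command succeeds still depends on the index and read files actually existing.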
Options
Input options
- Options that can be specified with -1, -2, or -U:
- -q: Reads are FASTQ files. FASTQ files usually have extension .fq or .fastq. FASTQ is the default format. See also: --solexa-quals and --int-quals.
- --qseq: Reads are QSEQ files. QSEQ files usually end in _qseq.txt. See also: --solexa-quals and --int-quals.
- -f: Reads are FASTA files. FASTA files usually have extension .fa, .fasta, .mfa, .fna or similar. FASTA files do not have a way of specifying quality values, so when -f is set, the result is as if --ignore-quals is also set.
- -r: Reads are files with one input sequence per line, without any other information (no read names, no qualities). When -r is set, the result is as if --ignore-quals is also set.
- --tab5: Each read or pair is on a single line. An input file can be a mix of unpaired and paired-end reads and Bowtie 2 recognizes each according to the number of fields, handling each as it should.
- An unpaired read line is [name]\t[seq]\t[qual]\n.
- A paired-end read line is [name]\t[seq1]\t[qual1]\t[seq2]\t[qual2]\n.
- --tab6: Similar to --tab5 except, for paired-end reads, the second end can have a different name from the first: [name1]\t[seq1]\t[qual1]\t[name2]\t[seq2]\t[qual2]\n
- -F k:<int>,i:<int>: Reads are substrings (k-mers) extracted from a FASTA file -U. Specifically, for every reference sequence in FASTA file -U, Bowtie 2 aligns the k-mers at offsets 1, 1+i, 1+2i, ... until reaching the end of the reference. Each k-mer is aligned as a separate read. Quality values are set to all Is (40 on Phred scale). Each k-mer (read) is given a name like <sequence>_<offset>, where <sequence> is the name of the FASTA sequence it was drawn from and <offset> is its 0-based offset of origin with respect to the sequence. Only single k-mers, i.e. unpaired reads, can be aligned in this way.
- -c: The read sequences are given on command line. I.e. -1, -2 and <singles> are comma-separated lists of reads rather than lists of read files. There is no way to specify read names or qualities, so -c also implies --ignore-quals.
- -s/--skip <int>: Skip (i.e. do not align) the first <int> reads or pairs in the input.
- -u/--qupto <int>: Align the first <int> reads or read pairs from the input (after the -s/--skip reads or pairs have been skipped), then stop. Default: no limit.
- -5/--trim5 <int>: Trim <int> bases from 5' (left) end of each read before alignment (default: 0).
- -3/--trim3 <int>: Trim <int> bases from 3' (right) end of each read before alignment (default: 0).
- --trim-to [3:|5:]<int>: Trim reads exceeding <int> bases. Bases will be trimmed from either the 3' (right) or 5' (left) end of the read. If the read end is not specified, Bowtie 2 will default to trimming from the 3' (right) end of the read. --trim-to and -3/-5 are mutually exclusive.
- --phred33: Input qualities are ASCII chars equal to the Phred quality plus 33. This is also called the "Phred+33" encoding, which is used by the very latest Illumina pipelines.
- --phred64: Input qualities are ASCII chars equal to the Phred quality plus 64. This is also called the "Phred+64" encoding.
- --solexa-quals: Convert input qualities from Solexa (which can be negative) to Phred (which can't). This scheme was used in older Illumina GA Pipeline versions (prior to 1.3). Default: off.
- --int-quals: Quality values are represented in the read input file as space-separated ASCII integers, e.g., 40 40 30 40..., rather than ASCII characters, e.g., II?I.... Integers are treated as being on the Phred quality scale unless --solexa-quals is also specified. Default: off.
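The quality encodings described above can be sketched in a few lines of Python. This is an illustration only, not Bowtie 2's own code: the function names are made up for the example, and the Solexa conversion uses the standard published formula without any claim about how Bowtie 2 rounds the result.

```python
import math
from typing import List


def decode_quals(qual: str, offset: int = 33) -> List[int]:
    """Decode an ASCII quality string: Phred+33 (offset=33) or Phred+64 (offset=64)."""
    return [ord(ch) - offset for ch in qual]


def parse_int_quals(qual: str) -> List[int]:
    """Parse space-separated integer qualities, the layout expected with --int-quals."""
    return [int(tok) for tok in qual.split()]


def solexa_to_phred(q_solexa: float) -> float:
    """Standard Solexa-to-Phred conversion; rounding is left to the caller."""
    return 10.0 * math.log10(10.0 ** (q_solexa / 10.0) + 1.0)


print(decode_quals("II?I"))              # Phred+33 characters
print(decode_quals("hh^h", offset=64))   # the same qualities in Phred+64
print(parse_int_quals("40 40 30 40"))    # integer form
```

All three calls decode to the same list of Phred values, which is why the encoding flag has to match the input file: reading Phred+64 data as Phred+33 would shift every quality up by 31.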
Preset options in end-to-end mode
- --very-fast: Same as: -D 5 -R 1 -N 0 -L 22 -i S,0,2.50
- --fast: Same as: -D 10 -R 2 -N 0 -L 22 -i S,0,2.50
- --sensitive: Same as: -D 15 -R 2 -N 0 -L 22 -i S,1,1.15 (default in --end-to-end mode)
- --very-sensitive: Same as: -D 20 -R 3 -N 0 -L 20 -i S,1,0.50
Preset options in local mode
- --very-fast-local: Same as: -D 5 -R 1 -N 0 -L 25 -i S,1,2.00
- --fast-local: Same as: -D 10 -R 2 -N 0 -L 22 -i S,1,1.75
- --sensitive-local: Same as: -D 15 -R 2 -N 0 -L 20 -i S,1,0.75 (default in --local mode)
- --very-sensitive-local: Same as: -D 20 -R 3 -N 0 -L 20 -i S,1,0.50
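For scripting, the two preset lists above can be captured as a lookup table. The sketch below is one possible layout, with values copied from those lists; the names PRESETS and expand_preset are hypothetical and not part of Bowtie 2.

```python
from typing import Dict, List

# Preset name -> equivalent explicit options, copied from the lists above.
PRESETS: Dict[str, List[str]] = {
    "--very-fast": ["-D", "5", "-R", "1", "-N", "0", "-L", "22", "-i", "S,0,2.50"],
    "--fast": ["-D", "10", "-R", "2", "-N", "0", "-L", "22", "-i", "S,0,2.50"],
    "--sensitive": ["-D", "15", "-R", "2", "-N", "0", "-L", "22", "-i", "S,1,1.15"],
    "--very-sensitive": ["-D", "20", "-R", "3", "-N", "0", "-L", "20", "-i", "S,1,0.50"],
    "--very-fast-local": ["-D", "5", "-R", "1", "-N", "0", "-L", "25", "-i", "S,1,2.00"],
    "--fast-local": ["-D", "10", "-R", "2", "-N", "0", "-L", "22", "-i", "S,1,1.75"],
    "--sensitive-local": ["-D", "15", "-R", "2", "-N", "0", "-L", "20", "-i", "S,1,0.75"],
    "--very-sensitive-local": ["-D", "20", "-R", "3", "-N", "0", "-L", "20", "-i", "S,1,0.50"],
}


def expand_preset(preset: str) -> List[str]:
    """Return a copy of the explicit options a preset stands for."""
    if preset not in PRESETS:
        raise KeyError(f"unknown preset: {preset}")
    return list(PRESETS[preset])


print(" ".join(expand_preset("--sensitive")))
```

Expanding a preset this way makes it easy to start from, say, --sensitive and change a single value such as -L before passing the options on.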
Alignment options
Reporting options
- -k <int>: By default, bowtie2 searches for distinct, valid alignments for each read. When it finds a valid alignment, it continues looking for alignments that are nearly as good or better. The best alignment found is reported (randomly selected from among best if tied). Information about the best alignments is used to estimate mapping quality and to set SAM optional fields, such as AS:i and XS:i.
When -k is specified, however, bowtie2 behaves differently. Instead, it searches for at most <int> distinct, valid alignments for each read. The search terminates when it can't find more distinct valid alignments, or when it finds <int>, whichever happens first. All alignments found are reported in descending order by alignment score. The alignment score for a paired-end alignment equals the sum of the alignment scores of the individual mates. Each reported read or pair alignment beyond the first has the SAM 'secondary' bit (which equals 256) set in its FLAGS field. For reads that have more than <int> distinct, valid alignments, bowtie2 does not guarantee that the <int> alignments reported are the best possible in terms of alignment score. -k is mutually exclusive with -a.
- -a: Like -k but with no upper limit on number of alignments to search for. -a is mutually exclusive with -k.
Note: Bowtie 2 is not designed with large values for -k or the -a mode in mind, and when aligning reads to long, repetitive genomes this mode can be very, very slow.
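The reporting rule for -k can be pictured with a toy model. The Python sketch below is only a reading of the description above, not Bowtie 2's search: it assumes alignments arrive in some search order, keeps the first <int> of them, sorts those by descending score, and sets the secondary bit on all but the first. The function name and the example scores are invented, and real FLAGS values carry other bits as well.

```python
from typing import List, Tuple

SAM_SECONDARY = 256  # SAM 'secondary' bit in the FLAGS field


def report_with_k(scores_in_search_order: List[int], k: int) -> List[Tuple[int, int]]:
    """Return (score, flag) pairs for a toy -k search.

    The search stops once k alignments have been found, so alignments
    later in the search order are never seen, even if they score higher.
    """
    kept = scores_in_search_order[:k]
    ordered = sorted(kept, reverse=True)
    return [(score, 0 if rank == 0 else SAM_SECONDARY)
            for rank, score in enumerate(ordered)]


# Four valid alignments exist, but -k 3 stops after the first three found.
print(report_with_k([-12, -3, -20, 0], k=3))
```

In this example the alignment with score 0 is never reported because the search stopped before reaching it, which mirrors the caveat that the <int> alignments reported are not guaranteed to be the best possible.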
Scoring options
- --ma <int>: Sets the match bonus. In --local mode <int> is added to the alignment score for each position where a read character aligns to a reference character and the characters match. Not used in --end-to-end mode. Default: 2.
- --mp MX,MN: Sets the maximum (MX) and minimum (MN) mismatch penalties, both integers. A number less than or equal to MX and greater than or equal to MN is subtracted from the alignment score for each position where a read character aligns to a reference character, the characters do not match, and neither is an N. If --ignore-quals is specified, the number subtracted equals MX. Otherwise, the number subtracted is MN + floor( (MX - MN) * (MIN(Q, 40.0) / 40.0) ), where Q is the Phred quality value. Default: MX = 6, MN = 2.
- --np <int>: Sets penalty for positions where the read, reference, or both, contain an ambiguous character such as N. Default: 1.
- --rdg <int1>,<int2>: Sets the read gap open (<int1>) and extend (<int2>) penalties. A read gap of length N gets a penalty of <int1> + N * <int2>. Default: 5, 3.
- --rfg <int1>,<int2>: Sets the reference gap open (<int1>) and extend (<int2>) penalties. A reference gap of length N gets a penalty of <int1> + N * <int2>. Default: 5, 3.
- --score-min <func>: Sets a function governing the minimum alignment score needed for an alignment to be considered "valid" (i.e. good enough to report). This is a function of read length. For instance, specifying L,0,-0.6 sets the minimum-score function f to f(x) = 0 + -0.6 * x, where x is the read length. See also: setting function options. The default in --end-to-end mode is L,-0.6,-0.6 and the default in --local mode is G,20,8.
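The scoring formulas above are simple enough to check by hand. The Python sketch below works through them with the documented defaults. The function names are illustrative, and the S (square root) and G (natural log) function types are taken from the Bowtie 2 manual's section on setting function options, which is not reproduced here; how Bowtie 2 rounds the minimum score internally is not covered.

```python
import math


def mismatch_penalty(q: float, mx: int = 6, mn: int = 2,
                     ignore_quals: bool = False) -> int:
    """Penalty for one mismatch at Phred quality q, per the --mp formula."""
    if ignore_quals:
        return mx
    return mn + math.floor((mx - mn) * (min(q, 40.0) / 40.0))


def gap_penalty(length: int, gap_open: int = 5, gap_extend: int = 3) -> int:
    """Penalty for a gap of the given length, per --rdg / --rfg."""
    return gap_open + length * gap_extend


def min_score(func: str, read_len: int) -> float:
    """Evaluate a --score-min style function such as 'L,-0.6,-0.6' or 'G,20,8'."""
    kind, a_str, b_str = func.split(",")
    a, b = float(a_str), float(b_str)
    if kind == "L":   # linear: a + b * x
        return a + b * read_len
    if kind == "S":   # square root: a + b * sqrt(x)
        return a + b * math.sqrt(read_len)
    if kind == "G":   # natural log: a + b * ln(x)
        return a + b * math.log(read_len)
    raise ValueError(f"unsupported function type: {kind}")


print(mismatch_penalty(40), mismatch_penalty(20), mismatch_penalty(0))
print(gap_penalty(3))
print(min_score("L,-0.6,-0.6", 100))  # end-to-end default, 100 bp read
print(min_score("G,20,8", 100))       # local default, 100 bp read
```

With the defaults, a 100 bp read in end-to-end mode needs a score of about -60.6 or better, which is roughly ten high-quality mismatches at a penalty of 6 each.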