Category

Mapping


Usage

tophat [options]* <genome_index_base> <reads1_1[,...,readsN_1]> [reads1_2,...readsN_2]


Manual

Arguments:
The basename of the genome index to be searched. The basename is the name of any of the index files up to but not including the first period. Bowtie first looks in the current directory for the index files, then looks in the indexes subdirectory under the directory where the currently-running bowtie executable is located, then looks in the directory specified in the BOWTIE_INDEXES  (or BOWTIE2_INDEXES) environment variable.
Please note that it is highly recommended that a FASTA file with the sequence(s) the genome being indexed be present in the same directory with the Bowtie index files and having the name .fa. If not present, TopHat will automatically rebuild this FASTA file from the Bowtie index files.
A comma-separated list of files containing reads in FASTQ or FASTA format. When running TopHat with paired-end reads, this should be the *_1 ("left") set of files.
<[reads1_2,...readsN_2]> A comma-separated list of files containing reads in FASTQ or FASTA format. Only used when running TopHat with paired end reads, and contains the *_2 ("right") set of files. The *_2 files MUST appear in the same order as the *_1 files.
Options:
-h/--help Prints the help message and exits
-v/--version Prints the TopHat version number and exits
-N/--read-mismatches Final read alignments having more than these many mismatches are discarded. The default is 2.
--read-gap-length Final read alignments having more than these many total length of gaps are discarded. The default is 2.
--read-edit-dist Final read alignments having more than these many edit distance are discarded. The default is 2.
--read-realign-edit-dist Some of the reads spanning multiple exons may be mapped incorrectly as a contiguous alignment to the genome even though the correct alignment should be a spliced one - this can happen in the presence of processed pseudogenes that are rarely (if at all) transcribed or expressed. This option can direct TopHat to re-align reads for which the edit distance of an alignment obtained in a previous mapping step is above or equal to this option value. If you set this option to 0, TopHat will map every read in all the mapping steps (transcriptome if you provided gene annotations, genome, and finally splice variants detected by TopHat), reporting the best possible alignment found in any of these mapping steps. This may greatly increase the mapping accuracy at the expense of an increase in running time. The default value for this option is set such that TopHat will not try to realign reads already mapped in earlier steps.
--bowtie1 Uses Bowtie1 instead of Bowtie2. If you use colorspace reads, you need to use this option as Bowtie2 does not support colorspace reads.
-o/--output-dir Sets the name of the directory in which TopHat will write all of its output. The default is "./tophat_out".
-r/--mate-inner-dist This is the expected (mean) inner distance between mate pairs. For, example, for paired end runs with fragments selected at 300bp, where each end is 50bp, you should set -r to be 200. The default is 50bp.
--mate-std-dev The standard deviation for the distribution on inner distances between mate pairs. The default is 20bp.
-a/--min-anchor-length The "anchor length". TopHat will report junctions spanned by reads with at least this many bases on each side of the junction. Note that individual spliced alignments may span a junction with fewer than this many bases on one side. However, every junction involved in spliced alignments is supported by at least one read with this many bases on each side. This must be at least 3 and the default is 8.
-m/--splice-mismatches The maximum number of mismatches that may appear in the "anchor" region of a spliced alignment. The default is 0.
-i/--min-intron-length The minimum intron length. TopHat will ignore donor/acceptor pairs closer than this many bases apart. The default is 70.
-I/--max-intron-length The maximum intron length. When searching for junctions ab initio, TopHat will ignore donor/acceptor pairs farther than this many bases apart, except when such a pair is supported by a split segment alignment of a long read. The default is 500000.
--max-insertion-length The maximum insertion length. The default is 3.
--max-deletion-length The maximum deletion length. The default is 3.
--solexa-quals Use the Solexa scale for quality values in FASTQ files.
--solexa1.3-quals As of the Illumina GA pipeline version 1.3, quality scores are encoded in Phred-scaled base-64. Use this option for FASTQ files from pipeline 1.3 or later.
-Q/--quals Separate quality value files - colorspace read files (CSFASTA) come with separate qual files.
--integer-quals Quality values are space-delimited integer values, this becomes default when you specify -C/--color.
-C/--color Colorspace reads, note that it uses a colorspace bowtie index and requires Bowtie 0.12.6 or higher.
Common usage: tophat --color --quals [other options]* [reads1_2,...readsN_2] [quals1_2,...qualsN_2]
-p/--num-threads Use this many threads to align reads. The default is 1.
-g/--max-multihits Instructs TopHat to allow up to this many alignments to the reference for a given read, and choose the alignments based on their alignment scores if there are more than this number. The default is 20 for read mapping. Unless you use --report-secondary-alignments, TopHat will report the alignments with the best alignment score. If there are more alignments with the same score than this number, TopHat will randomly report only this many alignments. In case of using --report-secondary-alignments, TopHat will try to report alignments up to this option value, and TopHat may randomly output some of the alignments with the same score to meet this number.
--report-secondary-alignments By default TopHat reports best or primary alignments based on alignment scores (AS). Use this option if you want to output additional or secondary alignments  (up to 20 alignments will be reported this way, this limit can be changed by using the -g/--max-multihits option above).
--no-discordant For paired reads, report only concordant mappings.
--no-mixed For paired reads, only report read alignments if both reads in a pair can be mapped (by default, if TopHat cannot find a concordant or discordant alignment for both reads in a pair, it will find and report alignments for each read separately; this option disables that behavior).
--no-coverage-search Disables the coverage based search for junctions.
--coverage-search Enables the coverage based search for junctions. Use when coverage search is disabled by default (such as for reads 75bp or longer), for maximum sensitivity.
--microexon-search With this option, the pipeline will attempt to find alignments incident to micro-exons. Works only for reads 50bp or longer.
--library-type The default is unstranded (fr-unstranded). If either fr-firststrand or fr-secondstrand is specified, every read alignment will have an XS attribute tag as explained below. Consider supplying library type options below to select the correct RNA-seq protocol.
Library TypeExamplesDescription
fr-unstrandedStandard IlluminaReads from the left-most end of the fragment (in transcript coordinates) map to the transcript strand, and the right-most end maps to the opposite strand.
fr-firststranddUTP, NSR, NNSRSame as above except we enforce the rule that the right-most end of the fragment (in transcript coordinates) is the first sequenced (or only sequenced for single-end reads). Equivalently, it is assumed that only the strand generated during first strand synthesis is sequenced.
fr-secondstrandLigation, Standard SOLiDSame as above except we enforce the rule that the left-most end of the fragment (in transcript coordinates) is the first sequenced (or only sequenced for single-end reads). Equivalently, it is assumed that only the strand generated during second strand synthesis is sequenced.
Advanced Options:
--bowtie-n TopHat uses "-v" in Bowtie for initial read mapping (the default), but with this option, "-n" is used instead. Read segments are always mapped using "-v" option.
--segment-mismatches Read segments are mapped independently, allowing up to this many mismatches in each segment alignment. The default is 2.
--segment-length Each read is cut up into segments, each at least this long. These segments are mapped independently. The default is 25.
--min-segment-intron The minimum intron length that may be found during split-segment search. The default is 50.
--max-segment-intron The maximum intron length that may be found during split-segment search. The default is 500000.
--min-coverage-intron The minimum intron length that may be found during coverage search. The default is 50.
--max-coverage-intron The maximum intron length that may be found during coverage search. The default is 20000.
--keep-tmp Causes TopHat to preserve its intermediate files produced during the run (mostly useful for debugging). The default is to delete these temporary files.
--keep-fasta-order In order to sort alignments in the same order in the genome fasta file, the option can be used. But this option will make the output SAM/BAM file incompatible with those from the previous versions of TopHat (1.4.1 or lower).
--no-sort-bam Output BAM is not coordinate-sorted.
--no-convert-bam Do not convert to bam format. Output is /accepted_hit.sam. Implies --no-sort-bam.
-R/--resume In case a TopHat run was terminated prematurely (process failure due to external factors, e.g. running out of memory because of other processes running on the same machine, or the disk getting full), users can attempt to resume the interrupted TopHat run by just providing this option with the output directory for that run. TopHat sets several checkpoints after every lengthy operations in the pipeline and when this option is provided, it will attempt to resume the pipeline from the last successful checkpoint. This special usage of TopHat only requires this option, e.g. the command line could simply be:
    tophat -R tophat_out (or your TopHat output directory if you used the -o/--output-dir option)
Note that none of the original options used for the original TopHat run should be provided, TopHat will find all the original options (and the checkpoint info) in the logs/run.log file found in the specified directory.
-z/--zpacker Manually specify the program used for compression of temporary files; default is gzip; use -z0 to disable compression altogether. Any program that is option-compatible with gzip can be used (e.g. bzip2, pigz, pbzip2).


Share your experience or ask a question