Category

Sequence Analysis


Usage

tomtom [options] <query file> <target file>+


Manual

Tomtom searches one or more query motifs against one or more databases of target motifs (and their reverse complements when applicable), and reports for each query a list of target motifs, ranked by p-value. The E-value and the q-value of each match are also reported. The q-value is the minimal false discovery rate at which the observed similarity would be deemed significant. The output contains results for each query, in the order that the queries appear in the input file.
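
As an illustration of what "reverse complement" means for a motif, here is a minimal Python sketch. It assumes a DNA motif stored as a list of rows, one per position, with columns in A, C, G, T order as in the MEME format. The function name reverse_complement is hypothetical and this is not Tomtom's own code.

```python
def reverse_complement(matrix):
    """Reverse-complement a DNA motif given as rows of [A, C, G, T] probabilities."""
    # Reversing the row order reverses the motif. Reversing each row swaps
    # A<->T and C<->G, because the columns are ordered A, C, G, T.
    return [list(reversed(row)) for row in reversed(matrix)]


# Hypothetical two-position motif, used only for illustration.
motif = [
    [0.1, 0.2, 0.3, 0.4],
    [0.7, 0.1, 0.1, 0.1],
]
rc = reverse_complement(motif)
print(rc)
```

Applying the function twice returns the original motif, which is a quick sanity check on any implementation.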

For a given pair of motifs, the program considers all offsets between the motifs, while requiring a minimum number of overlapping positions. For a given offset, each overlapping position is scored using one of the column similarity functions listed under the -dist option below. Columns in the query motif that don't overlap the target motif are assigned a score equal to the median score of the set of random matches to that column.
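
The offset enumeration can be sketched in Python as follows. This is a simplified illustration, not Tomtom's implementation: it uses the negative Euclidean distance as the column score, scores only the overlapping columns, and leaves out the median-based score that Tomtom assigns to non-overlapping query columns. The names column_score and offset_scores are hypothetical.

```python
import math


def column_score(a, b):
    """Negative Euclidean distance between two columns (higher is more similar)."""
    return -math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def offset_scores(query, target, min_overlap=1):
    """Yield (offset, overlap, summed score) for every allowed offset.

    offset is the target position that query column 0 aligns with;
    a negative offset means the query overhangs the start of the target.
    """
    wq, wt = len(query), len(target)
    for offset in range(-(wq - 1), wt):
        start = max(0, offset)        # first overlapping target column
        end = min(wt, offset + wq)    # one past the last overlapping target column
        overlap = end - start
        if overlap < min_overlap:
            continue
        score = sum(column_score(query[t - offset], target[t])
                    for t in range(start, end))
        yield offset, overlap, score


# Hypothetical motifs: the query matches target columns 1 and 2 exactly.
query = [[1, 0, 0, 0], [0, 1, 0, 0]]
target = [[0, 0, 0, 1], [1, 0, 0, 0], [0, 1, 0, 0]]
results = list(offset_scores(query, target, min_overlap=1))
for offset, overlap, score in results:
    print(offset, overlap, round(score, 3))
```

Raw summed scores from offsets with different overlaps are not directly comparable, which is one reason the summed score is converted to a p-value before offsets are compared.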

In order to compute the scores, Tomtom needs to know the frequencies of the letters of the sequence alphabet in the database being searched (the 'background' letter frequencies). By default, the background letter frequencies included in the query motif file are used. The scores of columns that overlap for a given offset are summed. This summed score is then converted to a p-value. The reported p-value is the minimal p-value over all possible offsets. To compensate for multiple testing, each reported p-value is converted to an E-value by multiplying it by twice the number of target motifs. As a second type of multiple-testing correction, q-values for each match are computed from the set of p-values and reported.
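
The two steps described above (taking the minimal p-value over offsets, then scaling it to an E-value) can be sketched in Python. The function names and the example p-values are hypothetical, and the q-value computation is not covered here.

```python
def best_offset(p_values_by_offset):
    """Return (offset, p-value) for the offset with the smallest p-value."""
    return min(p_values_by_offset.items(), key=lambda item: item[1])


def e_value(p_value, num_targets):
    """E-value as described above: p-value times twice the number of target motifs."""
    return p_value * (2 * num_targets)


# Hypothetical p-values for three offsets of one query/target pair.
p_by_offset = {-1: 0.5, 0: 0.25, 1: 0.75}
offset, p = best_offset(p_by_offset)
print(offset, p, e_value(p, num_targets=4))
```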

Required arguments

  • query file: Query file in the MEME format
  • target file(s): Target motif databases in MEME format. Precompiled MEME databases are available here.

Options

  • -o <output dir>: Name of directory for output files; will not replace existing directory
  • -oc <output dir>: Name of directory for output files; will replace existing directory
  • -xalph: Convert the alphabet of the target motif databases to the alphabet of the query motif database assuming the core symbols of the target motif alphabet are a subset; default: reject differences
  • -bfile <background file>: Name of background file; default: use the background from the query motif database
  • -motif-pseudo <pseudo count>: Apply the pseudocount to the query and target motifs; default: apply a pseudocount of 0.1
  • -m <id>: Use only query motifs with a specified id; may be repeated
  • -mi <index>: Use only query motifs with a specified index; may be repeated
  • -thresh <float>: Significance threshold; default: 0.5
  • -evalue: Use E-value threshold; default: q-value
  • -dist allr|ed|kullback|pearson|sandelin|blic1|blic5|llr1|llr5: Distance metric for scoring alignments.
    • allr: Average log-likelihood ratio. For non-zero probabilities
    • ed: Euclidean distance (default)
    • kullback: Kullback-Leibler divergence
    • pearson: Pearson's correlation coefficient
    • sandelin: Sandelin-Wasserman function
    • blic1: Bayesian Likelihood 2-Components (1 Dirichlet). DNA only
    • blic5: Bayesian Likelihood 2-Components (5 Dirichlet). DNA only
    • llr1: Log likelihood ratio (1 Dirichlet). DNA only
    • llr5: Log likelihood ratio (5 Dirichlet). DNA only
  • -internal: Only allow internal alignments; default: allow overhangs
  • -min-overlap <int>: Minimum overlap between query and target; default: 1
  • -norc: Do not score the reverse complements of targets
  • -incomplete-scores: Ignore unaligned columns in computing scores; default: use complete set of columns
  • -text: Output in text (TSV) format to stdout; overrides -o and -oc; default: output all formats to files in <output dir>
  • -png: Create PNG logos; default: don't create PNG logos
  • -eps: Create EPS logos; default: don't create EPS logos
  • -no-ssc: Do not apply the small-sample correction to the LOGOs in the LOGO alignments output by Tomtom. By default, the height of the letters in a LOGO is reduced when the number of samples on which a motif is based (nsites in the MEME motif) is small. This can cause motifs based on very few sites to have "empty" LOGOs, so use this switch if your query or target motifs are based on few samples; default: use small-sample correction
  • -time <time>: Quit before <time> seconds have elapsed; <time> must be > 0; default: unlimited elapsed time
  • -verbosity [1|2|3|4]: Set the verbosity of the program; default: 2 (normal)
  • -version: Print the version and exit

Example

Have a set of motifs but don't know what they are

The following command searches the query motifs (query.meme) against the known motifs provided by CIS-BP (CIS-BP_2.00/Homo_sapiens.meme). It uses Pearson correlation (specified by -dist) as the scoring function and reports only hits whose significance value (the q-value, unless -evalue is given) is at most 0.1 (specified by -thresh). The output is written to ./tomtom (specified by -oc); if the folder already exists, its old contents are replaced.

tomtom -dist pearson -thresh 0.1 -oc ./tomtom query.meme CIS-BP_2.00/Homo_sapiens.meme

DNA motif file

Below is an example DNA file.

MEME version 4

ALPHABET= ACGT

strands: + -

Background letter frequencies
A 0.303 C 0.183 G 0.209 T 0.306 

MOTIF crp
letter-probability matrix: alength= 4 w= 19 nsites= 17 E= 4.1e-009 
 0.000000  0.176471  0.000000  0.823529 
 0.000000  0.058824  0.647059  0.294118 
 0.000000  0.058824  0.000000  0.941176 
 0.176471  0.000000  0.764706  0.058824 
 0.823529  0.058824  0.000000  0.117647 
 0.294118  0.176471  0.176471  0.352941 
 0.294118  0.352941  0.235294  0.117647 
 0.117647  0.235294  0.352941  0.294118 
 0.529412  0.000000  0.176471  0.294118 
 0.058824  0.235294  0.588235  0.117647 
 0.176471  0.235294  0.294118  0.294118 
 0.000000  0.058824  0.117647  0.823529 
 0.058824  0.882353  0.000000  0.058824 
 0.764706  0.000000  0.176471  0.058824 
 0.058824  0.882353  0.000000  0.058824 
 0.823529  0.058824  0.058824  0.058824 
 0.176471  0.411765  0.058824  0.352941 
 0.411765  0.000000  0.000000  0.588235 
 0.352941  0.058824  0.000000  0.588235 

MOTIF lexA
letter-probability matrix: alength= 4 w= 18 nsites= 14 E= 3.2e-035 
 0.214286  0.000000  0.000000  0.785714 
 0.857143  0.000000  0.071429  0.071429 
 0.000000  1.000000  0.000000  0.000000 
 0.000000  0.000000  0.000000  1.000000 
 0.000000  0.000000  1.000000  0.000000 
 0.000000  0.000000  0.000000  1.000000 
 0.857143  0.000000  0.071429  0.071429 
 0.000000  0.071429  0.000000  0.928571 
 0.857143  0.000000  0.071429  0.071429 
 0.142857  0.000000  0.000000  0.857143 
 0.571429  0.071429  0.214286  0.142857 
 0.285714  0.285714  0.000000  0.428571 
 1.000000  0.000000  0.000000  0.000000 
 0.285714  0.214286  0.000000  0.500000 
 0.428571  0.500000  0.000000  0.071429 
 0.000000  1.000000  0.000000  0.000000 
 1.000000  0.000000  0.000000  0.000000 
 0.000000  0.000000  0.785714  0.214286 
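
To check a motif file like the one above before passing it to Tomtom, a small reader can help. The following Python sketch handles only the minimal layout shown here (a MOTIF line, a letter-probability matrix header with "w= <width>", then one row per position). It is not the MEME Suite's own parser, and the function name parse_meme and the motif "demo" are hypothetical.

```python
def parse_meme(text):
    """Parse motif names and letter-probability matrices from minimal MEME text."""
    motifs = {}
    name = None
    rows_left = 0
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue
        if fields[0] == "MOTIF":
            name = fields[1]
            motifs[name] = []
        elif fields[0] == "letter-probability" and name is not None:
            # Assumes the header is written as "w= <width>" with a space,
            # as in the example file; <width> is the number of rows that follow.
            rows_left = int(fields[fields.index("w=") + 1])
        elif rows_left > 0:
            motifs[name].append([float(x) for x in fields])
            rows_left -= 1
    return motifs


# Hypothetical two-position motif in the same layout as the example file.
sample = """MEME version 4

ALPHABET= ACGT

MOTIF demo
letter-probability matrix: alength= 4 w= 2 nsites= 10 E= 0
 0.100000  0.200000  0.300000  0.400000
 0.700000  0.100000  0.100000  0.100000
"""
motifs = parse_meme(sample)
for motif_name, rows in motifs.items():
    print(motif_name, len(rows))
```

Run on the example file above, a reader like this should report two motifs, crp with 19 rows and lexA with 18, matching the w= values in their headers.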

