Tomtom compares one or more motifs against a database of known motifs (e.g., JASPAR). Tomtom ranks the motifs in the database and produces an alignment for each significant match (see the sample output for a motif searched against the JASPAR CORE 2014 database).
tomtom [options] <query file> <target file>+
Tomtom searches one or more query motifs against one or more databases of target motifs (and their reverse complements when applicable), and reports for each query a list of target motifs, ranked by p-value. The E-value and the q-value of each match are also reported. The q-value is the minimal false discovery rate at which the observed similarity would be deemed significant. The output contains results for each query, in the order that the queries appear in the input file.
For a given pair of motifs, the program considers all offsets between the motifs, while requiring a minimum number of overlapping positions. For a given offset, each overlapping position is scored using one of seven column similarity functions defined below. Columns in the query motif that don't overlap the target motif are assigned a score equal to the median score of the set of random matches to that column.
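The offset search described above can be sketched in Python. This is a minimal illustration, not Tomtom's implementation: the function names `pearson` and `best_offset` and the parameters `min_overlap` and `unaligned_score` are hypothetical, Pearson correlation stands in for whichever column similarity function is selected, unaligned query columns receive a fixed placeholder score instead of the median of random-match scores, and the best offset is chosen by raw summed score rather than by p-value.

```python
import math


def pearson(col_a, col_b):
    """Pearson correlation between two motif columns (lists of probabilities)."""
    n = len(col_a)
    mean_a = sum(col_a) / n
    mean_b = sum(col_b) / n
    cov = sum((a - mean_a) * (b - mean_b) for a, b in zip(col_a, col_b))
    var_a = sum((a - mean_a) ** 2 for a in col_a)
    var_b = sum((b - mean_b) ** 2 for b in col_b)
    if var_a == 0 or var_b == 0:
        return 0.0  # a uniform column carries no signal to correlate
    return cov / math.sqrt(var_a * var_b)


def best_offset(query, target, min_overlap=1, unaligned_score=0.0):
    """Try every offset and return (offset, summed_score) for the best one.

    An offset k aligns query column i with target column i + k.
    unaligned_score is an assumed placeholder for query columns that
    do not overlap the target.
    """
    best = None
    lowest = -(len(query) - min_overlap)
    highest = len(target) - min_overlap
    for offset in range(lowest, highest + 1):
        score = 0.0
        overlap = 0
        for qi, qcol in enumerate(query):
            ti = qi + offset
            if 0 <= ti < len(target):
                score += pearson(qcol, target[ti])
                overlap += 1
            else:
                score += unaligned_score
        if overlap >= min_overlap and (best is None or score > best[1]):
            best = (offset, score)
    return best


# Toy motifs: the query matches target columns 1 and 2 exactly.
query = [[1, 0, 0, 0], [0, 1, 0, 0]]
target = [[0, 0, 0, 1], [1, 0, 0, 0], [0, 1, 0, 0]]
print(best_offset(query, target))
```

In the toy example the query aligns best when shifted one column to the right along the target, which is the offset where both query columns coincide with identical target columns.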
In order to compute the scores, Tomtom needs to know the frequencies of the letters of the sequence alphabet in the database being searched (the 'background' letter frequencies). By default, the background letter frequencies included in the query motif file are used. The scores of columns that overlap for a given offset are summed. This summed score is then converted to a p-value. The reported p-value is the minimal p-value over all possible offsets. To compensate for multiple testing, each reported p-value is converted to an E-value by multiplying it by twice the number of target motifs. As a second type of multiple-testing correction, q-values for each match are computed from the set of p-values and reported.
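The two multiple-testing corrections can be illustrated with a short Python sketch. The E-value formula follows the description above (p-value times twice the number of target motifs). The q-value step uses the Benjamini-Hochberg procedure as an assumed stand-in; the text above does not say which estimator Tomtom applies, so treat `bh_q_values` as illustrative only. Both function names are hypothetical.

```python
def e_values(p_values, num_targets):
    """E-value = p-value * 2 * number of target motifs, as described above."""
    return [p * 2 * num_targets for p in p_values]


def bh_q_values(p_values):
    """Benjamini-Hochberg q-values (assumed stand-in for Tomtom's estimator)."""
    n = len(p_values)
    order = sorted(range(n), key=lambda i: p_values[i])
    q = [0.0] * n
    running_min = 1.0
    # Walk from the largest p-value down so each q-value is monotone in rank.
    for rank in range(n, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * n / rank)
        q[idx] = running_min
    return q


# Hypothetical p-values for one query against four target motifs.
p = [0.01, 0.04, 0.03, 0.5]
print(e_values(p, num_targets=4))
print(bh_q_values(p))
```

The E-value estimates how many matches this good would be expected by chance across the whole target database, while the q-value bounds the expected fraction of false positives among the matches accepted at that level.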
The following command searches query motifs (query.meme) against known motifs provided by CIS-BP (CIS-BP_2.00/Homo_sapiens.meme). It uses Pearson correlation (specified by -dist) as the scoring function and reports only matches whose significance value (the q-value, unless -evalue is given) is at most 0.1 (specified by -thresh). The output is written to ./tomtom (specified by -oc); if the folder already exists, its old contents are replaced.
tomtom -dist pearson \
-thresh 0.1 \
-oc ./tomtom \
query.meme CIS-BP_2.00/Homo_sapiens.meme
Below is an example DNA motif file in MEME format, containing two motifs (crp and lexA).
MEME version 4

ALPHABET= ACGT

strands: + -

Background letter frequencies
A 0.303 C 0.183 G 0.209 T 0.306

MOTIF crp
letter-probability matrix: alength= 4 w= 19 nsites= 17 E= 4.1e-009
0.000000 0.176471 0.000000 0.823529
0.000000 0.058824 0.647059 0.294118
0.000000 0.058824 0.000000 0.941176
0.176471 0.000000 0.764706 0.058824
0.823529 0.058824 0.000000 0.117647
0.294118 0.176471 0.176471 0.352941
0.294118 0.352941 0.235294 0.117647
0.117647 0.235294 0.352941 0.294118
0.529412 0.000000 0.176471 0.294118
0.058824 0.235294 0.588235 0.117647
0.176471 0.235294 0.294118 0.294118
0.000000 0.058824 0.117647 0.823529
0.058824 0.882353 0.000000 0.058824
0.764706 0.000000 0.176471 0.058824
0.058824 0.882353 0.000000 0.058824
0.823529 0.058824 0.058824 0.058824
0.176471 0.411765 0.058824 0.352941
0.411765 0.000000 0.000000 0.588235
0.352941 0.058824 0.000000 0.588235

MOTIF lexA
letter-probability matrix: alength= 4 w= 18 nsites= 14 E= 3.2e-035
0.214286 0.000000 0.000000 0.785714
0.857143 0.000000 0.071429 0.071429
0.000000 1.000000 0.000000 0.000000
0.000000 0.000000 0.000000 1.000000
0.000000 0.000000 1.000000 0.000000
0.000000 0.000000 0.000000 1.000000
0.857143 0.000000 0.071429 0.071429
0.000000 0.071429 0.000000 0.928571
0.857143 0.000000 0.071429 0.071429
0.142857 0.000000 0.000000 0.857143
0.571429 0.071429 0.214286 0.142857
0.285714 0.285714 0.000000 0.428571
1.000000 0.000000 0.000000 0.000000
0.285714 0.214286 0.000000 0.500000
0.428571 0.500000 0.000000 0.071429
0.000000 1.000000 0.000000 0.000000
1.000000 0.000000 0.000000 0.000000
0.000000 0.000000 0.785714 0.214286
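To make the layout of such a file concrete, here is a minimal Python sketch that reads the motif matrices out of MEME-format text. It is an illustration under stated assumptions, not part of the MEME Suite: the function name `parse_meme_motifs` is hypothetical, the parser works on whitespace-separated tokens (so it does not depend on line breaks), it reads only letter-probability matrices, and it ignores the alphabet, strand, and background sections. The `demo` motif in the usage lines is made up for the example.

```python
def parse_meme_motifs(text):
    """Return {motif_name: rows}, where each row is one motif position.

    Each row holds `alength` probabilities, and there are `w` rows per motif,
    as declared in the "letter-probability matrix:" header.
    """
    tokens = text.split()
    motifs = {}
    i = 0
    while i < len(tokens):
        if tokens[i] != "MOTIF":
            i += 1
            continue
        name = tokens[i + 1]
        # Skip ahead to "matrix:", which precedes the "key= value" pairs.
        i = tokens.index("matrix:", i) + 1
        header = {}
        while i < len(tokens) and tokens[i].endswith("="):
            header[tokens[i][:-1]] = tokens[i + 1]
            i += 2
        width = int(header["w"])
        alength = int(header["alength"])
        values = [float(t) for t in tokens[i:i + width * alength]]
        motifs[name] = [
            values[r * alength:(r + 1) * alength] for r in range(width)
        ]
        i += width * alength
    return motifs


# Hypothetical two-position motif used only to exercise the parser.
sample = """MEME version 4
ALPHABET= ACGT
MOTIF demo
letter-probability matrix: alength= 4 w= 2 nsites= 10 E= 0
0.1 0.2 0.3 0.4
0.25 0.25 0.25 0.25
"""
motifs = parse_meme_motifs(sample)
print(motifs["demo"])
```

Applied to the example file, a parser along these lines would be expected to yield 19 rows for crp and 18 rows for lexA, matching the w= values in their headers.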