Category

Sequence Analysis


Usage

cd-hit-2d -i db1 -i2 db2 -o db2novel -c 0.9 -n 5


Manual

where
db1 and db2 are the input files,
db2novel is the output file,
0.9 is the comparison threshold (90% sequence identity),
5 is the word size.
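
For scripted runs, the usage line above can be assembled from its parts. The following is a minimal sketch in Python, assuming cd-hit-2d is installed and on the PATH. The helper name build_cd_hit_2d_command and its parameter names are hypothetical; only the flags -i, -i2, -o, -c and -n come from the usage line.

```python
import shlex


def build_cd_hit_2d_command(db1, db2, output, identity=0.9, word_size=5):
    """Return the cd-hit-2d argument list matching the usage line.

    Hypothetical helper: only the flags themselves come from the manual.
    """
    if not 0.0 < identity <= 1.0:
        raise ValueError("identity must be a fraction in (0, 1]")
    return [
        "cd-hit-2d",
        "-i", db1,              # first input database
        "-i2", db2,             # second input database
        "-o", output,           # output file
        "-c", str(identity),    # comparison threshold, 0.9 = 90% identity
        "-n", str(word_size),   # word size
    ]


cmd = build_cd_hit_2d_command("db1", "db2", "db2novel")
print(shlex.join(cmd))
```

The list form can be passed directly to subprocess.run(cmd, check=True), which avoids shell quoting problems when file names contain spaces.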

The options -b, -M, -l, -d, -t, -s, -S, -B, -p, -aL, -AL, -aS, -AS, -g, -G and -T are the same as in CD-HIT. The following options are specific to cd-hit-2d:

-i2 input filename for db2 in FASTA format, required
-s2 length difference cutoff for db1, default 1.0
    by default, a seq in db1 must be at least as long as a seq in db2 in the same cluster
    if set to 0.9, a seq in db1 need only be >= 90% of the length of the seq in db2
-S2 length difference cutoff in amino acids, default 0
    by default, a seq in db1 must be at least as long as a seq in db2 in the same cluster
    if set to 60, a seq in db2 may be up to 60 aa longer than the seq in db1
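
The two length cutoffs can be read as simple inequalities on sequence lengths. The sketch below restates the descriptions of -s2 and -S2 in Python. It is an illustration of the rules as worded above, not cd-hit-2d's actual implementation; the function names are hypothetical, and how the program combines the two cutoffs when both are set is not covered here.

```python
def meets_ratio_cutoff(len_db1, len_db2, s2=1.0):
    """-s2 as described: the db1 sequence must be at least s2 times
    as long as the db2 sequence. The default 1.0 means db1 >= db2."""
    return len_db1 >= s2 * len_db2


def meets_absolute_cutoff(len_db1, len_db2, S2=0):
    """-S2 as described: the db2 sequence may be at most S2 residues
    longer than the db1 sequence. The default 0 means db1 >= db2."""
    return len_db2 - len_db1 <= S2


# A 95 aa sequence in db1 against a 100 aa sequence in db2:
print(meets_ratio_cutoff(95, 100))            # default -s2 1.0
print(meets_ratio_cutoff(95, 100, s2=0.9))    # relaxed to 90%

# A 100 aa sequence in db1 against a 150 aa sequence in db2:
print(meets_absolute_cutoff(100, 150))        # default -S2 0
print(meets_absolute_cutoff(100, 150, S2=60)) # db2 may be 60 aa longer
```

With the default values both functions reduce to the same condition, which matches the identical "by default" line given for both options.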

