CD-HIT-EST-2D compares 2 nucleotide datasets (db1, db2). It identifies the sequences in db2 that are similar to db1 at a certain threshold. The input are two DNA/RNA datasets (db1, db2) in fasta format and the output are two files: a fasta file of sequences in db2 that are not similar to db1 and a text file that lists similar sequences between db1 & db2. For same reason as CD-HIT-EST, CD-HIT-EST-2D is good for non-intron containing sequences like EST.
cd-hit-est-2d -i mrna_human -i2 est_human -o est_human_novel -c 0.95 -n 8
Choose of word size (same as CD-HIT-EST):
Options, -b, -M, -l, -d, -t, -s, -S, -s2, -S2, -B, -p, -aL, -AL, -aS, -AS, -g, -G, -T are same to CD-HIT-2d, here are few more cd-hit-est-2d specific options: