PSI-CD-HIT groups proteins into clusters that meet a user-defined similarity threshold, which can be a sequence identity or an expect value (E-value). Each cluster has one representative sequence. The input is a protein dataset in FASTA format, and the output is two files: a FASTA file of representative sequences and a text file listing the clusters.
psi-cd-hit.pl -i nr60 -o nr30 -c 0.3
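The cluster list written by the command above can be post-processed with a short script. The following is a minimal Python sketch, not part of PSI-CD-HIT itself. It assumes the cluster file is named nr30.clstr and follows the usual CD-HIT .clstr layout: a ">Cluster N" header line, then one line per member containing ">name...", with the representative marked by a trailing "*". The function name parse_clstr and the sample records are hypothetical, and the exact text after "at" on member lines may differ in PSI-CD-HIT output.

```python
from typing import Dict, Iterable


def parse_clstr(lines: Iterable[str]) -> Dict[int, dict]:
    """Parse CD-HIT style .clstr lines into
    {cluster_id: {"representative": name or None, "members": [names]}}.

    Assumes the common CD-HIT layout; this is an illustration, not an
    official parser.
    """
    clusters: Dict[int, dict] = {}
    current = None
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">Cluster"):
            # Header line, e.g. ">Cluster 12"
            current = int(line.split()[1])
            clusters[current] = {"representative": None, "members": []}
            continue
        if current is None:
            continue  # member line before any header; ignore it
        # Member line, e.g. "0\t297aa, >seqA... *"
        start = line.find(">")
        end = line.find("...", start)
        if start == -1 or end == -1:
            continue  # not a recognizable member line
        name = line[start + 1:end]
        clusters[current]["members"].append(name)
        if line.endswith("*"):
            # A trailing "*" marks the representative sequence
            clusters[current]["representative"] = name
    return clusters


if __name__ == "__main__":
    # Hypothetical sample records, made up for illustration only.
    sample = [
        ">Cluster 0",
        "0\t297aa, >seqA... *",
        "1\t290aa, >seqB... at 85.00%",
        ">Cluster 1",
        "0\t120aa, >seqC... *",
    ]
    for cid, info in parse_clstr(sample).items():
        print(cid, info["representative"], len(info["members"]))
```

To run it on real output, pass an open file handle instead of the sample list, for example parse_clstr(open("nr30.clstr")), assuming that file name matches what the run actually produced.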
The options -l, -d, -s, and -S are the same as in CD-HIT. Here are a few more psi-cd-hit specific options: