CD-HIT-PARA is a script that runs cd-hit, cd-hit-est in a parallel mode. It splits the input database; runs cd-hit or cd-hit-est in parallel on a computer cluster; and finally merges the outputs into a single file. You can run it as you run cd-hit or cd-hit-est. The input is a protein or DNA/RNA dataset in fasta format and the output are two files: a fasta file of representative sequences and a text file of list of clusters.
cd-hit-para.pl -i nr90 -o nr60 -c 0.6 -n 4 --B hosts --S 64
where
--B
hosts is a file with available hostnames
--S
64 is the number to split input db into, this number should be several times the number of hosts
More options: