Using MPI-HMMER on Big Red at IU
Introduction
HMMER is a suite of programs that you can use to create and query
hidden Markov models that describe molecular sequences. A parallel
port of HMMER known as MPI-HMMER is available on Big Red
at Indiana University. It contains all the programs of HMMER, but only
hmmpfam and hmmsearch have been
parallelized.
MPI-HMMER is installed in the directory
/N/soft/linux-sles9-ppc64/hmmer-2.3.2-MPI-0.9. Documentation for
HMMER programs is available as man pages. You can also
visit the MPI-HMMER page for more
information.
Parallel hmmpfam and hmmsearch
You can run hmmpfam and hmmsearch using the
hmmerjob script. Use the hmmpfam and
hmmsearch options with hmmerjob just as you
would with serial versions of these programs. If you use only the
hmmpfam and hmmsearch options, a job will be
submitted that uses four processes for up to two hours in the medium
(MED) queue on Big Red. You can use other options to change those
settings.
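The form of the hmmerjob command using hmmpfam is:
hmmerjob hmmpfam [hmmpfam options] -CPUS [number of processes] -wallhours [hours] -queue [queue name]
For hmmsearch, it is:
hmmerjob hmmsearch [hmmsearch options] -CPUS [number of processes] -wallhours [hours] -queue [queue name]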
Replace items in brackets with your chosen values. The
-CPUS option specifies the number of processes to start,
-wallhours the length of time that the job may run, and
-queue the name of the queue that is to receive the
job. In the default queue (MED), you can request up to 128 processes
for up to 336 hours (14 days). The BIG queue allows jobs of up to
1,024 processes for up to 120 hours (5 days). The FAST queue is available for
debugging; it allows jobs of up to 16 processes for up to 2 hours.
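The queue limits above can be checked before a job is submitted. The following is a minimal sketch only: the function name check_hmmerjob_request is hypothetical and not part of hmmerjob, and the limits are hard-coded from the figures quoted in this document, so they would need updating if the queues change.

```shell
#!/bin/sh
# Hypothetical helper (not part of hmmerjob): checks a request against the
# queue limits quoted in this document.
# Usage: check_hmmerjob_request QUEUE CPUS HOURS
check_hmmerjob_request() {
    queue=$1; cpus=$2; hours=$3
    case "$queue" in
        MED)  max_cpus=128;  max_hours=336 ;;
        BIG)  max_cpus=1024; max_hours=120 ;;
        FAST) max_cpus=16;   max_hours=2 ;;
        *)    echo "unknown queue: $queue"; return 1 ;;
    esac
    # Reject the request if either limit is exceeded.
    if [ "$cpus" -gt "$max_cpus" ] || [ "$hours" -gt "$max_hours" ]; then
        echo "request exceeds $queue limits ($max_cpus processes, $max_hours hours)"
        return 1
    fi
    echo "ok"
}
```

The function returns a nonzero status for an unknown queue or an oversized request, so it can guard a submission with the shell's && operator.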
For example, suppose you would like to compare all the sequences in a
file named unknowns.fa with all the models in
models.hmm and select matches that have an E-value of 1
or lower, using 4 processes for up to 2 hours. Because these are the default settings, the command would be:
hmmerjob hmmpfam -E 1 models.hmm unknowns.fa
To run the same job using 64 processes for up to 72 hours, you would use:
hmmerjob hmmpfam -E 1 models.hmm unknowns.fa -CPUS 64 -wallhours 72
To run a simple hmmsearch with models in
models.hmm and sequences in experiment56.fa
in the BIG queue using 512 processes for up to 8 hours, the command is:
hmmerjob hmmsearch models.hmm experiment56.fa -CPUS 512 -wallhours 8 -queue BIG
When you run hmmerjob, you'll receive a message that your
job has been submitted to the queue. You will receive mail when the
job finishes. You can check the status of your job by using the
llq command.
Output from the job is stored in a file with a name of the form
hmmerjob.999999.0.out, where the nines are replaced by
some other digits that represent the job ID. Errors and debugging
output are stored in a separate file with a name of the form
hmmerjob.999999.err.
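As an illustration of the naming pattern, the sketch below builds both file names from a job ID. The function name hmmerjob_files is hypothetical, and the job ID used in the comment is made up.

```shell
#!/bin/sh
# Hypothetical helper: prints the output and error file names for a job ID,
# following the naming pattern described above.
hmmerjob_files() {
    jobid=$1
    echo "hmmerjob.${jobid}.0.out"   # job output
    echo "hmmerjob.${jobid}.err"     # errors and debugging output
}

# Example use with a made-up job ID:
#   for f in $(hmmerjob_files 123456); do ls -l "$f"; done
```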
Using non-parallel HMMER programs
The serial (single-process) programs of HMMER are also available on
Big Red. The simplest way to use them is to put them on your path by using the
+mpi-hmmer SoftEnv key. To permanently make
HMMER available at the command prompt, run the commands:
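One way to do this is sketched below, assuming the usual SoftEnv conventions: keys are listed one per line in ~/.soft, and the resoft command rereads that file. The exact commands recommended for Big Red may differ, and the helper name add_soft_key is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: appends a SoftEnv key to a file unless that exact
# line is already present. Assumes keys are stored one per line in ~/.soft.
add_soft_key() {
    key=$1
    file=${2:-$HOME/.soft}
    touch "$file"
    # -x matches whole lines, -F treats the key as a fixed string.
    if ! grep -qxF -e "$key" "$file"; then
        echo "$key" >> "$file"
    fi
}

# Typical use on a SoftEnv system (resoft is assumed to be available):
#   add_soft_key +mpi-hmmer
#   resoft
```

The position of a key relative to other entries in ~/.soft can matter, and this sketch simply appends it to the end of the file.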
You should then be able to run serial HMMER programs, and all HMMER manual pages should be available to you. If you need to run serial HMMER programs in batch jobs, the simplest way to do so is to use the serialjob script. A manual page for it is available on Big Red.
This document was developed with support from the National Science Foundation (NSF) under Grant No. 0503697 to the University of Chicago and subcontracted to Indiana University. Additional support was provided by IU through its participation in the TeraGrid, which is supported by the NSF under Grants No. 0833618, SCI451237, SCI535258, and SCI504075. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the NSF.
Last modified on August 07, 2008.






