Indiana University
University Information Technology Services
  

Using MPI-HMMER on Big Red at IU

Introduction

HMMER is a suite of programs that you can use to create and query hidden Markov models that describe molecular sequences. A parallel port of HMMER known as MPI-HMMER is available on Big Red at Indiana University. It contains all the programs of HMMER, but only hmmpfam and hmmsearch have been parallelized.

MPI-HMMER is installed in the directory /N/soft/linux-sles9-ppc64/hmmer-2.3.2-MPI-0.9. Documentation for HMMER programs is available as man pages. You can also visit the MPI-HMMER page for more information.

Parallel hmmpfam and hmmsearch

You can run hmmpfam and hmmsearch using the hmmerjob script. Use the hmmpfam and hmmsearch options with hmmerjob just as you would with the serial versions of these programs. If you supply only the hmmpfam or hmmsearch options, hmmerjob submits a job that uses four processes for up to two hours in the medium (MED) queue on Big Red. You can use additional hmmerjob options to change those settings.

The form of the hmmerjob command using hmmpfam is:

hmmerjob hmmpfam options_to_hmmpfam -CPUS <count> -wallhours <n> -queue <queue_name>

For hmmsearch, it is:

hmmerjob hmmsearch options_to_hmmsearch -CPUS <count> -wallhours <n> -queue <queue_name>

Replace the items in angle brackets with your chosen values, and replace options_to_hmmpfam or options_to_hmmsearch with the options and arguments you would give the serial program. The -CPUS option specifies the number of processes to start, -wallhours the maximum length of time that the job may run, and -queue the name of the queue that is to receive the job. In the default queue (MED), you can request up to 128 processes for up to 336 hours (14 days). The BIG queue allows jobs of up to 1,024 processes for up to 120 hours (5 days). The FAST queue is available for debugging; it allows jobs of up to 16 processes for up to 2 hours.
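The queue limits above can be checked before a job is submitted. The following is a minimal sketch, not part of hmmerjob: the function name check_queue_limits is hypothetical, and the limits are copied from the paragraph above, so they may not match the current queue configuration.

```shell
#!/bin/bash
# Hypothetical helper (not part of hmmerjob): checks a request against the
# queue limits quoted in this document.
# Usage: check_queue_limits QUEUE CPUS WALLHOURS
# Returns 0 if the request fits, 1 if it exceeds a limit, 2 for an unknown queue.
check_queue_limits() {
    local queue=$1 cpus=$2 hours=$3
    local max_cpus max_hours

    case "$queue" in
        MED)  max_cpus=128;  max_hours=336 ;;
        BIG)  max_cpus=1024; max_hours=120 ;;
        FAST) max_cpus=16;   max_hours=2   ;;
        *)    echo "unknown queue: $queue" >&2; return 2 ;;
    esac

    if [ "$cpus" -gt "$max_cpus" ] || [ "$hours" -gt "$max_hours" ]; then
        echo "request exceeds $queue limits ($max_cpus processes, $max_hours hours)" >&2
        return 1
    fi
    return 0
}
```

For instance, check_queue_limits BIG 512 8 succeeds, while check_queue_limits FAST 32 1 fails because FAST allows at most 16 processes.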

For example, suppose you would like to compare all the sequences in a file named unknowns.fa with all the models in models.hmm and select matches that have an E-value of 1 or lower, using 4 processes for up to 2 hours (the defaults). The command would be:

hmmerjob hmmpfam -E 1 models.hmm unknowns.fa

To run the same job using 64 processes for up to 72 hours, you would use:

hmmerjob hmmpfam -E 1 models.hmm unknowns.fa -CPUS 64 -wallhours 72

To run a simple hmmsearch with models in models.hmm and sequences in experiment56.fa in the BIG queue using 512 processes for 8 hours, the command is:

hmmerjob hmmsearch models.hmm experiment56.fa -CPUS 512 -wallhours 8 -queue BIG
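If you have several sequence files to compare against the same models, it may help to print the hmmerjob command lines first and review them before submitting anything. The sketch below only prints commands in the form shown above; the function name print_hmmpfam_jobs and the file names in the usage note are hypothetical.

```shell
#!/bin/bash
# Hypothetical dry-run helper: prints one hmmerjob command line per sequence
# file, using the command form shown in this document. Nothing is submitted.
# Usage: print_hmmpfam_jobs MODELS CPUS WALLHOURS FILE...
print_hmmpfam_jobs() {
    local models=$1 cpus=$2 hours=$3
    shift 3

    local fa
    for fa in "$@"; do
        echo "hmmerjob hmmpfam -E 1 $models $fa -CPUS $cpus -wallhours $hours"
    done
}
```

Running print_hmmpfam_jobs models.hmm 64 72 unknowns.fa prints the same 64-process command given in the example above. Once the printed lines look right, you can run them individually at the prompt.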

When you run hmmerjob, you'll receive a message that your job has been submitted to the queue. You will receive mail when the job finishes. You can check the status of your job by using the llq command.

Output from the job is stored in a file with a name of the form hmmerjob.999999.0.out, where the nines are replaced by some other digits that represent the job ID. Errors and debugging output are stored in a separate file with a name of the form hmmerjob.999999.err.
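Given a job ID, the two file names follow directly from the naming pattern described above. This is a minimal sketch; the function name hmmerjob_output_files and the job ID 123456 are made up for illustration.

```shell
#!/bin/bash
# Hypothetical helper: prints the output and error file names that this
# document describes for a given job ID, one per line.
hmmerjob_output_files() {
    local job_id=$1
    printf '%s\n' "hmmerjob.${job_id}.0.out" "hmmerjob.${job_id}.err"
}

# Example with a made-up job ID: show the last lines of each file that exists.
for f in $(hmmerjob_output_files 123456); do
    if [ -f "$f" ]; then
        echo "== $f =="
        tail -n 5 "$f"
    fi
done
```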

Using non-parallel HMMER programs

The serial (single-process) programs of HMMER are also available on Big Red. The simplest way to use them is to put them on your path by using the +mpi-hmmer SoftEnv key. To permanently make HMMER available at the command prompt, run the commands:

echo +mpi-hmmer >> ~/.soft
resoft

You should then be able to run serial HMMER programs, and all HMMER manual pages should be available to you. If you need to run serial HMMER programs in batch jobs, the simplest way to do so is to use the serialjob script. A manual page for it is available on Big Red.

This document was developed with support from the National Science Foundation (NSF) under Grant No. 0503697 to the University of Chicago and subcontracted to Indiana University. Additional support was provided by IU through its participation in the TeraGrid, which is supported by the NSF under Grants No. 0833618, SCI451237, SCI535258, and SCI504075. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the NSF.

This is document awwb in domains all and tgrid-all.
Last modified on August 07, 2008.