Using Platform LSF at Durham

Submitting a batch job

In LSF you submit a batch job using the bsub command:

    % bsub my_program
    

This would submit a job to run my_program in the default queue. However, this command will not work in Durham: there is no default queue enabled, so all jobs must be explicitly assigned to one of the queues and also given a project code to charge the time against. DiRAC users should submit jobs to the cosma5 queue and charge to their project or sub-project. So the minimum command is in fact something like:

    % bsub -P dp004 -q cosma5 my_program
    

Users of COSMA4 should charge to the durham project:

    % bsub -P durham -q cosma my_program
    

The simplest way to handle all the options you generally need is to create a batch submission script, which is submitted using:

    % bsub < batchscript
    

where batchscript is a text file of the commands you want to execute. This can be written using C-shell or bash syntax. Notice the < sign, which directs bsub to read the file (so that it can interpret any #BSUB option lines, see below) rather than just execute it. This is equivalent to:

    % cat batchscript | bsub
    

Job options

There are many options to the bsub command, allowing you to specify things like the number of processors that a job requires. These options can be specified on the command-line as shown above (see the bsub man page for the very long list) or, more conveniently, as macro-like values in the preamble of a batch submission script. For bsub these are lines starting with:

    #BSUB
    

Commonly used options under LSF are:

    #BSUB -L /bin/sh             # script shell language (/bin/tcsh, /bin/ksh etc.)
    #BSUB -n ncores              # number of processors required
    #BSUB -J myjob               # name of job
    #BSUB -o myjob.out           # log file for standard output (appends)
    #BSUB -e myjob.err           # log file for standard error (appends)
    #BSUB -oo myjob.log          # log file for standard output (overwrites)
    #BSUB -eo myjob.err          # log file for standard error (overwrites)
    #BSUB -q queue               # target queue for job execution
    #BSUB -W HH:MM               # wall clock limit for job
    #BSUB -u user@durham         # email address for output
    #BSUB -N                     # send email even if writing a log file
    #BSUB -P project             # project to charge time
    #BSUB -x                     # give node exclusive access to job
    #BSUB -R "span[ptile=n]"     # number of processors to use per node
    #BSUB -M mem_limit           # upper memory limit in MB
    

Note that without -e or -eo standard output and error will be written to the same log file.
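
Putting these together, a minimal submission script might look something like the sketch below. The program name, core count, project code and other values are placeholders to adapt to your own job:

    #!/bin/bash -l
    #BSUB -P dp004                # project to charge (use your own project code)
    #BSUB -q cosma5               # target queue
    #BSUB -n 16                   # number of cores
    #BSUB -J myjob                # job name
    #BSUB -oo myjob.log           # standard output (overwrites)
    #BSUB -eo myjob.err           # standard error (overwrites)
    #BSUB -W 01:00                # one hour wall clock limit

    # Commands to run; my_program stands in for your own executable.
    ./my_program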

Example submission scripts can be found in the /cosma/home/sample-user directory.

Controlling jobs

You can see which jobs you have queued or have running using the command:

    % bjobs
    

If you want to see the jobs running for all users, then use:

    % bjobs -u all
    

The nodes in use, reasons why a job is pending, etc. are reported if you add the -l option flag. You can see all the jobs that you have run recently using the -a option flag.

To delete a job use the:

    % bkill <jobid>
    

command. The job id is reported by the bjobs command. Without a jobid your last job is killed; with care, you can use the jobid 0 to kill all of your jobs.

The queues that are available for submission are reported by the:

    % bqueues [-l]
    

command, although you should not use any queues other than the ones mentioned in these pages.

To see the output from a running job you can use the:

    % bpeek
    

command, or just look in the output logs - these should be updated as the job progresses.

Useful commands available only in Durham (not part of the official LSF distribution) are described on their own page.

These are very useful for getting a condensed view of the queues and seeing what resources are available and how to make use of them.

Useful environmental variables

When a job is running the following environment variables are defined.

    LSB_ERRORFILE:    Name of the error file
    LSB_JOBID:        Batch job ID assigned by LSF.
    LSB_JOBINDEX:     Index of the job that belongs to a job array.
    LSB_HOSTS:        The list of hosts that are used to run the batch job.
    LSB_QUEUE:        The name of the queue the job is dispatched from.
    LSB_JOBNAME:      Name of the job.
    LS_SUBCWD:        The directory where the job was submitted.
    LSB_DJOB_NUMPROC: The number of cores allocated to the job.
    

and can be used in your submission scripts etc.
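
For instance, a submission script might use them to report where and how the job is running, as in this minimal bash sketch:

    #!/bin/bash -l
    # Report some details about the running job using the LSF variables.
    echo "Job $LSB_JOBID ($LSB_JOBNAME) dispatched from queue $LSB_QUEUE"
    echo "Submitted from directory: $LS_SUBCWD"
    echo "Running on $LSB_DJOB_NUMPROC cores on hosts: $LSB_HOSTS"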

Array jobs

Array jobs are created by submitting a job whose name is suffixed by a range. A simple example is:

    % bsub -J "myjob[1-4]" myscript
    

That will create four different jobs with the same jobid, but different job indices. Any other job options, like the number of nodes etc., should also be entered on the command-line. In this case myscript will need to be executable and available on the $PATH (or you should replace myscript with the full path name).

The index of a running job is determined using the:

    LSB_JOBINDEX
    

environment variable. Job arrays can also be specified in the preamble of a submission script:

    #BSUB -J myjob[1-4]
    

To increment the index using a step greater than 1, you use the syntax:

    #BSUB -J myjob[1-10:2]
    

It is also possible to have discontiguous sequences of job indices:

    #BSUB -J myjob[1-10:2,20-30:3,40,50]
    

The job array index can also be used to change the name of the output files using the special character %I.

    #BSUB -J myjob[1-4]
    #BSUB -o myjob%I.log
    
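Putting this together, an array submission script might look something like the following sketch, where each element processes its own input file selected via LSB_JOBINDEX (the file names and project code are purely illustrative):

    #!/bin/bash -l
    #BSUB -P dp004                # project to charge (use your own project code)
    #BSUB -q cosma5               # target queue
    #BSUB -J myjob[1-4]           # an array of four jobs
    #BSUB -oo myjob%I.log         # one log file per array element

    # Each element selects its own input using the array index;
    # input_1.dat ... input_4.dat are placeholder names.
    ./my_program input_${LSB_JOBINDEX}.dat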

Passing command-line values to jobs

The LSF method for passing arguments is not to use a submission script, but to use a normal command instead. For instance, if I have an executable script myscript.sh present on my $PATH, I could run it using:

    % bsub -n 4 -oo myjob.log myscript.sh arg1 arg2
    

and then arg1 and arg2 will be available as $1 and $2 in myscript.sh. The same would be true if this were a C-shell script. (Again, note that you cannot use #BSUB options in the preamble of such scripts; all options will need to be put on the command-line.) If the command is not on your $PATH then you will need to supply the full path name.
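
As a minimal sketch, such a myscript.sh might just echo its arguments (the contents here are purely illustrative):

    #!/bin/bash -l
    # Arguments given on the bsub command-line arrive as the usual
    # positional parameters.
    echo "first argument:  $1"
    echo "second argument: $2"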

Another way to pass in arguments is to edit a submission script on-the-fly. That is easily done using sed:

    % sed 's/%args%/arg1 arg2/g' < batchfile | bsub
    

That would replace all occurrences of the string %args% with arg1 arg2, which could be handled in the following manner in a C-shell submission script:

    #  Edited args:
    set args = (%args%)
    echo "firstvalue = $args[1]"
    echo "secondvalue = $args[2]"
    

In bash that would be:

    args=(%args%)
    echo "firstvalue = ${args[0]}"
    echo "secondvalue = ${args[1]}"
    

For standard Bourne shell you'd need to use something like:

    args="%args%"
    arg1=`echo $args | awk '{print $1}'`
    arg2=`echo $args | awk '{print $2}'`
    echo "firstvalue = $arg1"
    echo "secondvalue = $arg2"
    

Using modules with bash submission scripts

When LSF accepts a batch job it attempts to retain the environment of the host process (i.e. the terminal you are typing in), which means that bash jobs will not source any of their initialisation scripts. Sourcing these is necessary to define the module command (for tcsh scripts the .cshrc/.tcshrc file is always read, so that works). To work around this, make sure your script is run as either an interactive or a login shell. To do this add the "-l" flag to the hash-bang:

     #!/bin/bash -l
    

You could use "-i" if your .bashrc script sets up the module command.
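
For example, a bash submission script using modules might start like this sketch (the module name is just an illustration; load whichever modules your code actually needs):

    #!/bin/bash -l
    #BSUB -q cosma5
    #BSUB -P dp004

    # The -l flag above makes this a login shell, so the module command
    # is defined before we try to use it.
    module load intel_comp        # placeholder module name
    ./my_program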

Submitting jobs to specific architectures

If you have optimised your code during compilation for a specific architecture then you will need to make sure that it is run on the appropriate machines. Note this should only be an issue for the shm4 queue; all the other queues only use nodes of the same type.

This is simply done by using the model resource requirement, which is defined as one of SandyBridge, Westmere or Opteron.

     bsub -R "model=SandyBridge"
     bsub -R "model=Westmere"
     bsub -R "model=Opteron"
    

or:

     #BSUB -R "model=SandyBridge"
     #BSUB -R "model=Westmere"
     #BSUB -R "model=Opteron"
    

from within a submission script. All of COSMA5 is SandyBridge and all of COSMA4 is Westmere, except for the leda node which is Opteron.

Submitting GPU jobs

The COSMA4 nodes m4245 through m4248 each have two NVIDIA Tesla M2090 GPU cards that can be used from the cordelia queue. To submit to these you need to use the GPU resource attributes as constraints. Here's an example from the command-line running the command nvidia-smi:

     bsub -n 1 -P durham -q cordelia -R 'ngpus>0' "nvidia-smi"
    

To get exclusive access to the GPUs on a machine you need to use:

     bsub -n 1 -P durham -q cordelia -R 'ngpus>0' -a gpuexclusive "nvidia-smi"
    

You can check the availability of time on the GPU hosting nodes using:

     lsload -R 'ngpus>0'
    

Naturally the bsub options can also be put into a submission script.
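
For instance, a GPU submission script mirroring the command-line options above might look like this sketch (nvidia-smi stands in for your own GPU program):

    #!/bin/bash -l
    #BSUB -n 1
    #BSUB -P durham
    #BSUB -q cordelia
    #BSUB -R "ngpus>0"            # request a node hosting GPUs
    #BSUB -a gpuexclusive         # optional: exclusive access to the GPUs

    nvidia-smi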