The Durham COSMA machines, COSMA4 and COSMA5, have an integrated set of batch queues that are managed by the PLATFORM LSF (Load Sharing Facility) job scheduler. Jobs are run by the scheduler after submission to the appropriate queue. Which queue to submit your jobs to depends on the projects you are working on. If in doubt, ask your project lead, supervisor or co-workers, or email firstname.lastname@example.org.
Currently we have three main queues:
These will be used by the majority of jobs, and all users have the right to use at least one of them. Access rights are controlled by membership of projects, which are the same as UNIX groups. COSMA5 projects have a DiRAC-assigned group code (usually dp followed by three digits) together with a quarterly allocation of time. COSMA4 users just need to be in the durham project (all Virgo consortium members and Durham locals should be in this group). You can check which projects you are in using the command id, which lists the UNIX groups you are a member of. A more authoritative list of group members known to the batch system can be found using the command:
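For the simpler id-based check, a minimal shell sketch is given here. The helper names (my_groups, dirac_projects, in_group) are hypothetical, and the "dp plus three digits" pattern is an assumption taken from the "usually" in the description above, so project codes of another shape would not be matched.

```shell
#!/bin/sh
# Print the UNIX groups of the current user, one per line.
# `id -Gn` is the standard form of `id` that prints group names.
my_groups() {
    id -Gn | tr ' ' '\n'
}

# Keep only names on stdin that look like DiRAC project codes:
# "dp" followed by exactly three digits (assumed pattern).
dirac_projects() {
    grep -E '^dp[0-9]{3}$' || true
}

# Succeed if the group named in $1 appears in the list on stdin.
in_group() {
    grep -qxF -- "$1"
}

# Show any DiRAC-style project groups for the current user.
my_groups | dirac_projects

# Report membership of the durham project.
if my_groups | in_group durham; then
    echo "member of the durham project"
else
    echo "not a member of the durham project"
fi
```

This only reflects what the login node knows about your account; the batch system's own list remains the authoritative one.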
In addition to the three main queues we also have:
The -prince queues are available only on request, for jobs that cannot run on the cosma5 or cosma queues. Usually this means that they need more time than the run-time limit on cosma5 or cosma and either cannot be restarted for technical reasons or are inefficient to restart. The latter typically applies to very large jobs that are expected to use a lot of run-time: each restart has to be fitted back into the machine, holding nodes idle in the process.
The cosma5-pauper queue is used to reduce the priority of COSMA5 projects that have exceeded their quarterly allocation. It has a reduced run-time limit as well as a reduced priority, which lets other projects get time preferentially without stopping progress on over-budget projects.
The shm4 and shm5 queues are available for large shared-memory jobs that cannot run on the COSMA4 or COSMA5 compute nodes. These queues share resources with the interactive login machines.
See the following links for details about the various queues. DiRAC users just need to read about the COSMA5 family.
PLATFORM LSF has a number of man pages available on the system, which should be consulted for detailed information about any command. A useful overview of LSF commands and working practices for COSMA is also available.
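As a rough illustration of the standard LSF workflow, the sketch below prints a minimal job script using common bsub directives (-q queue, -P project, -n cores, -W run-time limit, -J job name, -o/-e output files, with %J expanding to the job ID). The helper name make_job_script, the project code dp004, the core count, the run-time and the job name are all placeholders chosen for illustration, not COSMA recommendations; check the man pages and the queue descriptions for the values that apply to your project.

```shell
#!/bin/sh
# Print a minimal LSF job script to stdout.
# Arguments: queue, project, number of cores, run-time limit ([hours:]minutes).
make_job_script() {
    queue=$1
    project=$2
    cores=$3
    walltime=$4
    cat <<EOF
#!/bin/bash
#BSUB -q $queue
#BSUB -P $project
#BSUB -n $cores
#BSUB -W $walltime
#BSUB -J example_job
#BSUB -o example_job.%J.out
#BSUB -e example_job.%J.err

# Replace this line with the real work of the job.
echo "Running on \$(hostname)"
EOF
}

# Example: a one-hour, 16-core job on the cosma5 queue (placeholder values).
make_job_script cosma5 dp004 16 1:00
```

Saved to a file such as example_job.lsf (a hypothetical name), the script would be submitted with bsub < example_job.lsf and then followed with bjobs, while bqueues lists the queues and their current state.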
There are also a number of locally developed LSF and related commands available:
These concentrate on providing more condensed information than can be easily extracted from the standard commands. They also help you make more effective use of the queues.