Access to the DiRAC COSMA5 machine is provided through three queues: cosma5, cosma5-pauper and cosma5-prince. These queues share all 420 compute nodes (including the 30 bought by Durham), dispatching jobs according to the priorities assigned to each queue. All three queues enforce exclusive access to nodes, so no two jobs ever share a compute node.
The main use of these queues is for DiRAC projects that have been allocated time at Durham, and the expected job mix matches the machine's capabilities: MPI, OpenMP and hybrid jobs using up to 16 cores per node with a maximum of 126 GB of memory per node. Note that the nodes in COSMA5 are disk-less, so this represents a hard memory limit and exceeding it will cause your job to fail. Jobs that can run using fewer resources than a single node should be packed into a batch job with appropriate internal process control to scale up to this level.
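As an illustrative sketch of such packing (the queue name, core count and run time are assumptions; the background tasks here are stand-ins for your real executables), a batch script can launch several sub-node tasks and wait for all of them before the job ends:

```shell
#!/bin/bash
# Hypothetical LSF batch script: pack four independent tasks onto
# one exclusive 16-core COSMA5 node so the node is fully used.
#BSUB -q cosma5          # queue (cosma5, cosma5-pauper or cosma5-prince)
#BSUB -n 16              # request one full node: 16 cores
#BSUB -W 01:00           # run-time limit (also enables back-filling)

# Internal process control: start the tasks in the background...
for i in 1 2 3 4; do
    ( sleep 1 ) &        # stand-in for "./my_task $i" or similar
done
wait                     # ...and block until every task has exited
echo "all 4 tasks finished"
```

The `wait` at the end is essential: without it the batch job would exit as soon as the loop finished, killing the background tasks.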
In addition to the hardware limits the queues have the following limits and priorities:
|Name|Priority|Maximum run time|Maximum cores|
The three queues share the same resources, so the order in which jobs run is decided by a number of factors. Jobs in higher-priority queues always run before lower-priority jobs, although it may not superficially look that way, because jobs from lower-priority queues can run as back-fills. Back-filling is allowed when a lower-priority job will complete before the resources needed for a higher-priority one become available, so setting a run-time limit on your job may get it completed more quickly. See the Durham utilities descriptions for how to make use of back-filling.
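For example (the queue, core count and executable name below are placeholders, not a prescription), submitting with an explicit wall-clock limit via the `bsub -W` option tells the scheduler the job can fit into a back-fill window:

```shell
# Submit with an explicit 2-hour wall-clock limit so the scheduler
# can consider the job for back-filling; "./my_mpi_job" is hypothetical.
bsub -q cosma5-pauper -n 32 -W 02:00 ./my_mpi_job
```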
The cosma5 and cosma5-pauper queues are available to all users; the cosma5-prince queue can only be accessed by special arrangement. Projects that overrun their quarterly allocation are only allowed access to the cosma5-pauper queue until the start of the next quarter, and any of their jobs submitted to cosma5 are automatically demoted. This frees the resources for other projects; it does not stop over-budget projects from continuing when resources are available.
The quarterly allocations, and how much use each project has made of theirs, can be found in the:
pages; you will need your COSMA username and password to see these.
Jobs within the same queue are scheduled using a fairshare arrangement, so each user initially has the same priority. This is then weighted by a resources-used formula, the current values of which can be seen using the:
bqueues -l cosma5
command, in the PRIORITY column. Users not shown have a priority of 1/3. Note that the order of running can again be affected by back-filling (which only works if the job is given a run time) and by using fewer resources than other jobs.