132 lines
4.9 KiB
Plaintext
132 lines
4.9 KiB
Plaintext
This file contains information concerning the use of the new job array
|
|
features in TORQUE 2.5.
|
|
|
|
--- WARNING ---
|
|
TORQUE 2.5 uses a new format for job arrays. It in not backwards
|
|
compatible with job arrays from version 2.3 or 2.4. Therefore, it is
|
|
imperative that the system be drained of any job arrays BEFORE upgrading.
|
|
Upgrading with job arrays queued or running may cause data loss, crashes,
|
|
etc, and is not supported.
|
|
|
|
|
|
COMMAND UPDATES FOR ARRAYS
|
|
--------------------------
|
|
|
|
The commands qalter, qdel, qhold, and qrls now all support TORQUE
|
|
arrays and will have to be updated. The general command syntax is:
|
|
|
|
command <array_name> [-t array_range] [other command options]
|
|
|
|
The array ranges accepted by -t here are exactly the same as the array ranges
|
|
that can be specified in qsub.
|
|
(http://www.clusterresources.com/products/torque/docs/commands/qsub.shtml)
|
|
|
|
|
|
SLOT LIMITS
|
|
--------------------------
|
|
|
|
It is now possible to limit the number of jobs that can run concurrently in a
|
|
job array. This is called a slot limit, and the default is unlimited. The slot
|
|
limit can be set in two ways.
|
|
|
|
The first method can be done at job submission:
|
|
|
|
qsub script.sh -t 0-299%5
|
|
|
|
This sets the slot limit to 5, meaning only 5 jobs from this array can be
|
|
running at the same time.
|
|
|
|
The second method can be done on a server wide basis using the server
|
|
parameter max_slot_limit. Since administrators are more likely to
|
|
be concerned with limiting arrays than users in many cases the
|
|
max_slot_limit parameter is a convenient way to set a global policy.
|
|
If max_slot_limit is not set then the default limit is unlimited. To set
|
|
max_slot_limit you can use the following queue manager command.
|
|
|
|
qmgr -c 'set server max_slot_limit=10'
|
|
|
|
This means that no array can request a slot limit greater than 10, and
|
|
any array not requesting a slot limit will receive a slot limit of 10.
|
|
If a user requests a slot limit greater than 10, the job will be
|
|
rejected with the message:
|
|
|
|
Requested slot limit is too large, limit is X. In this case, X would be 10.
|
|
|
|
It is recommended that if you are using torque with a scheduler like Moab or Maui
|
|
that you also set the server parameter moab_array_compatible=true. Setting
|
|
moab_array_compatible will put all jobs over the slot limit on hold
|
|
so the scheduler will not try and schedule jobs above the slot limit.
|
|
|
|
|
|
JOB ARRAY DEPENDENCIES
|
|
--------------------------
|
|
|
|
The following dependencies can now be used for job arrays:
|
|
|
|
afterstartarray
|
|
afterokarray
|
|
afternotokarray
|
|
afteranyarray
|
|
beforestartarray
|
|
beforeokarray
|
|
beforenotokarray
|
|
beforeanyarray
|
|
|
|
The general syntax is:
|
|
|
|
qsub script.sh -W depend=dependtype:array_name[num_jobs]
|
|
|
|
The suffix [num_jobs] should appear exactly as above, although the number of
|
|
jobs is optional. If it isn't specified, the dependency will assume that it is
|
|
the entire array, for example:
|
|
|
|
qsub script.sh -W depend=afterokarray:427[]
|
|
|
|
will assume every job in array 427[] has to finish successfully for the
|
|
dependency to be satisfied. The submission:
|
|
|
|
qsub script.sh -W depend=afterokarray:427[][5]
|
|
|
|
means that 5 of the jobs in array 427 have to successfully finish in order for
|
|
the dependency to be satisfied.
|
|
|
|
NOTE: It is important to remember that the "[]" is part of the array name.
|
|
|
|
|
|
QSTAT FOR JOB ARRAYS
|
|
--------------------------
|
|
|
|
Normal qstat output will display a summary of the array instead of displaying
|
|
the entire array, job for job.
|
|
|
|
qstat -t will expand the output to display the entire array.
|
|
|
|
|
|
ARRAY NAMING CONVENTION
|
|
--------------------------
|
|
|
|
Arrays are now named with brackets following the array name, for example:
|
|
|
|
dbeer@napali:~/dev/torque/array_changes$ echo sleep 20 | qsub -t 0-299
|
|
189[].napali
|
|
|
|
Individual jobs in the array are now also noted using square brackets instead of
|
|
dashes, for example, here is part of the output of qstat -t for the above
|
|
array:
|
|
|
|
189[287].napali STDIN[287] dbeer 0 Q batch
|
|
189[288].napali STDIN[288] dbeer 0 Q batch
|
|
189[289].napali STDIN[289] dbeer 0 Q batch
|
|
189[290].napali STDIN[290] dbeer 0 Q batch
|
|
189[291].napali STDIN[291] dbeer 0 Q batch
|
|
189[292].napali STDIN[292] dbeer 0 Q batch
|
|
189[293].napali STDIN[293] dbeer 0 Q batch
|
|
189[294].napali STDIN[294] dbeer 0 Q batch
|
|
189[295].napali STDIN[295] dbeer 0 Q batch
|
|
189[296].napali STDIN[296] dbeer 0 Q batch
|
|
189[297].napali STDIN[297] dbeer 0 Q batch
|
|
189[298].napali STDIN[298] dbeer 0 Q batch
|
|
189[299].napali STDIN[299] dbeer 0 Q batch
|
|
|
|
|