The FLARECAST node is the computing server named cluster-r730-1.u-psud.fr,
which is part of the IAS (PSUD) cluster. This cluster is managed by the SLURM system, which should be used for launching any computing-intensive task (in particular a Docker container doing computations).
SLURM can be used as a queuing system, but at PSUD it is only used as a resource allocation system at the moment (as of 2016-04-22).
Please use your PSUD login/password to connect to cluster-head.ias.u-psud.fr. Please note that a connection from outside PSUD requires going through the ias-ssh.ias.u-psud.fr gateway.
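For example, with a recent OpenSSH client (ProxyJump support), the connection from outside PSUD could look like the following sketch, where mylogin stands for your PSUD login:

ssh -J mylogin@ias-ssh.ias.u-psud.fr mylogin@cluster-head.ias.u-psud.fr

or equivalently, with an entry in ~/.ssh/config:

Host cluster-head.ias.u-psud.fr
    User mylogin
    ProxyJump mylogin@ias-ssh.ias.u-psud.fr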
Useful SLURM commands:
sinfo: information about SLURM queues
squeue: list of jobs in the queues
The FLARECAST queue (or "partition" in SLURM) is called flarecast, and corresponds to the FLARECAST node of the cluster (cluster-r730-1). Other queues (and nodes) can be used if requirements exceed the FLARECAST node availability, but these are shared with other projects, so please be considerate in your usage of other nodes. Some of these queues have nodes with GPGPUs (Nvidia K20) or Xeon Phi processors.
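For example, to check the flarecast partition and its jobs:

sinfo -p flarecast     # state of the flarecast partition and its node(s)
squeue -p flarecast    # jobs queued or running in the flarecast partition
squeue -u $USER        # your own jobs, in all partitions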
flarecast is a high-priority queue, meaning that jobs (from other projects) running on nodes including cluster-r730-1 will be suspended if you start a job in the flarecast queue.
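To check the partition configuration (node list, limits, priority), scontrol can display it, for example:

scontrol show partition flarecast    # settings of the flarecast partition
scontrol show node cluster-r730-1    # state and resources of the FLARECAST node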
The salloc command allocates resources within a queue and opens a shell from which commands using these resources can be launched with the srun command. This is mainly used for testing. Exiting the shell will cancel the resource allocation. If you want to keep the resource allocation (and the job running) and log out from your terminal, you will need to run salloc within a screen session (then don't forget to cancel the resource allocation later!).
ebuchlin@cluster-head:~$ salloc -p flarecast -n 2    # 2 tasks in partition "flarecast"
salloc: Granted job allocation 2765
ebuchlin@cluster-head:~$ srun hostname
cluster-r730-1
cluster-r730-1
ebuchlin@cluster-head:~$ exit
salloc: Relinquishing job allocation 2765
salloc: Job allocation 2765 has been revoked.
ebuchlin@cluster-head:~$
(hostname is used as an example, to show that it is run on the FLARECAST node; in practice you will use your own command: a Python script, mpirun, ...)
srun can also be used without salloc, but you then need to specify the SLURM options for each srun call:
ebuchlin@cluster-head:~$ srun -p flarecast -n 2 hostname
cluster-r730-1
cluster-r730-1
ebuchlin@cluster-head:~$
Again, please use screen if you plan to log out after launching the job.
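A minimal sketch of this workflow with standard GNU screen (the session name is arbitrary):

screen -S flarecast          # open a detachable terminal session
salloc -p flarecast -n 2     # allocate resources inside the screen session
srun ./my_executable         # run your command on the allocated node
# detach with Ctrl-a d, log out, then later reattach with:
screen -r flarecast
# when done, exit the salloc shell to release the allocation, then exit screen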
srun and salloc have various options for selecting a queue, the desired number of nodes or tasks per node, and so on.
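For reference, a few commonly used options (they are shared by srun, salloc and sbatch, and can also be put on #SBATCH lines; see the man pages for the full list):

-p <partition>           queue/partition, e.g. -p flarecast
-n <ntasks>              total number of tasks
-N <nnodes>              number of nodes
--ntasks-per-node=<n>    number of tasks per node
-t <time>                time limit, e.g. -t 02:00:00
-J <name>                job name

For example: srun -p flarecast -N 1 --ntasks-per-node=4 hostname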
A batch job can be launched using
ebuchlin@cluster-head:~$ sbatch script.sh
where script.sh is a shell script including one or more lines starting with #SBATCH followed by SLURM options.
For example, for 10 independent tasks, script.sh
can be:
#!/bin/bash
#SBATCH -n 10 -p flarecast
cd some_directory
srun ./my_executable
For MPI parallelization on 12 processors:
#!/bin/bash
#SBATCH --job-name my_job_name
#SBATCH -n 12
echo "$SLURM_NNODES nodes: $SLURM_NODELIST"
cd my_directory
mpirun ./my_executable
For an IDL job (using the full node; otherwise, please change !cpu.tpool_nthreads accordingly):
#!/bin/bash
#SBATCH -N 1 -p flarecast
cat > idlscript.pro << EOF
my_idl_command1
my_idl_command2
EOF
idl idlscript.pro
rm idlscript.pro
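Once submitted, the job output goes by default to a file named slurm-<jobid>.out in the submission directory (unless changed with the --output option); the job ID is printed by sbatch. You can then monitor or cancel the job:

sbatch script.sh            # prints "Submitted batch job <jobid>"
squeue -u $USER             # check the state of your jobs
tail -f slurm-<jobid>.out   # follow the job output
scancel <jobid>             # cancel the job if needed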