
SLURM can be used as a queuing system, but at PSUD it is only used as a resource allocation system at the moment (as of 2016-04-22).


Connection

Please use your PSUD login/password to connect to cluster-head.ias.u-psud.fr

Please note that connecting from outside PSUD requires going through the ias-ssh.ias.u-psud.fr gateway.
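For example, with a recent OpenSSH client the gateway can be used as a jump host in a single command (a minimal sketch; mylogin is a placeholder for your PSUD login, and the -J option requires OpenSSH 7.3 or later):

Code Block
languagebash
# Connect to cluster-head through the ias-ssh gateway (mylogin is a placeholder):
ssh -J mylogin@ias-ssh.ias.u-psud.fr mylogin@cluster-head.ias.u-psud.fr

The same jump can be made permanent with a ProxyJump entry for cluster-head.ias.u-psud.fr in your ~/.ssh/config.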

Information about current resource usage

  • sinfo: information about SLURM queues
  • squeue: list of jobs in queues
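For example (standard SLURM commands, shown as a sketch):

Code Block
languagebash
ebuchlin@cluster-head:~$ sinfo             # list the partitions (queues) and the state of their nodes
ebuchlin@cluster-head:~$ squeue            # list all jobs currently in the queues
ebuchlin@cluster-head:~$ squeue -u $USER   # list only your own jobs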


flarecast is a high-priority queue, meaning that jobs from other projects running on its nodes (which include cluster-r730-1) will be suspended if you start a job in the flarecast queue.
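Before submitting to flarecast, you can therefore check which jobs would be affected (standard squeue options; a sketch):

Code Block
languagebash
ebuchlin@cluster-head:~$ squeue -w cluster-r730-1   # jobs currently running on cluster-r730-1
ebuchlin@cluster-head:~$ squeue -p flarecast        # jobs already in the flarecast queue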

Launching an interactive session within the SLURM system

The salloc command allocates resources within a queue and opens a shell from which commands using these resources can be launched with the srun command. This is mainly used for testing. Exiting the shell cancels the resource allocation. If you want to keep the resource allocation (and the job running) while logging out from your terminal, you will need to run salloc inside a screen session (see the example further below); then don't forget to cancel the resource allocation later!

Code Block
languagebash
ebuchlin@cluster-head:~$ salloc -p flarecast -n 2  # 2 tasks in partition "flarecast"
salloc: Granted job allocation 2765
ebuchlin@cluster-head:~$ srun hostname
cluster-r730-1
cluster-r730-1
ebuchlin@cluster-head:~$ exit
salloc: Relinquishing job allocation 2765
salloc: Job allocation 2765 has been revoked.
ebuchlin@cluster-head:~$

 

(hostname is used only as an example, to show that the command runs on the FLARECAST node; in practice you will use your own command: a Python script, mpirun, ...)

srun can also be used without salloc, but you then need to specify the SLURM options with each srun call:

Code Block
languagebash
ebuchlin@cluster-head:~$ srun -p flarecast -n 2 hostname
cluster-r730-1
cluster-r730-1
ebuchlin@cluster-head:~$

 

Again, please use screen if you plan to log out after launching the job.
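A possible workflow (a minimal sketch; the job ID shown by squeue is a placeholder):

Code Block
languagebash
ebuchlin@cluster-head:~$ screen                    # open a detachable session
ebuchlin@cluster-head:~$ salloc -p flarecast -n 2  # allocate resources inside screen
# ... run your commands with srun, then detach with Ctrl-a d and log out ...
ebuchlin@cluster-head:~$ screen -r                 # later: re-attach to the session
ebuchlin@cluster-head:~$ squeue -u $USER           # find the job ID of your allocation
ebuchlin@cluster-head:~$ scancel <jobid>           # cancel the allocation when you are done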

srun and salloc accept various options for selecting a queue, the number of nodes, the number of tasks per node, etc. (see the examples below).
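Some commonly useful combinations (standard SLURM options; a sketch, see man salloc and man srun for the full list):

Code Block
languagebash
ebuchlin@cluster-head:~$ salloc -p flarecast -N 1                      # request 1 node in the flarecast partition
ebuchlin@cluster-head:~$ salloc -p flarecast -n 8                      # request 8 tasks, placed by SLURM
ebuchlin@cluster-head:~$ salloc -p flarecast -N 2 --ntasks-per-node=4  # request 2 nodes with 4 tasks each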

Launching a batch job

A batch job can be launched using

Code Block
languagebash
ebuchlin@cluster-head:~$ sbatch script.sh


where script.sh is a shell script including one or more lines starting with #SBATCH followed by SLURM options.

For example, for 10 independent tasks, script.sh can be:

Code Block
languagebash
#!/bin/bash
#SBATCH -n 10 -p flarecast
cd some_directory
srun ./my_executable
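With -n 10, srun starts 10 copies of my_executable. If each task should work on different data, it can read the SLURM_PROCID environment variable (0 to 9 here), which srun sets for each task; a sketch, where input_${SLURM_PROCID}.dat is a hypothetical naming scheme:

Code Block
languagebash
#!/bin/bash
#SBATCH -n 10 -p flarecast
cd some_directory
# Each task picks its own (hypothetical) input file from its task index:
srun bash -c './my_executable input_${SLURM_PROCID}.dat'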

 

For MPI parallelization on 12 processes:

Code Block
languagebash
#!/bin/bash
#SBATCH --job-name=my_job_name
#SBATCH -n 12
echo "$SLURM_NNODES nodes: $SLURM_NODELIST"
cd my_directory
mpirun ./my_executable
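Depending on how the MPI library is installed on the cluster (an assumption, to be checked locally), mpirun usually detects the SLURM allocation automatically; if it does not, the number of tasks requested with #SBATCH -n can be passed explicitly:

Code Block
languagebash
# SLURM_NTASKS is set by sbatch to the value given with -n (12 here):
mpirun -np $SLURM_NTASKS ./my_executable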

 

For an IDL job (using the full node; otherwise please adjust !cpu.tpool_nthreads accordingly):

Code Block
languagebash
#!/bin/bash
#SBATCH -N 1 -p flarecast
cat > idlscript.pro << EOF
my_idl_command1
my_idl_command2
EOF
idl idlscript.pro
rm idlscript.pro
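If several such jobs may run in the same directory, the temporary script name should be made unique, for example with the job ID set by SLURM (a sketch of the same script with a job-specific file name):

Code Block
languagebash
#!/bin/bash
#SBATCH -N 1 -p flarecast
# SLURM_JOB_ID makes the temporary file name unique per job:
script=idlscript_${SLURM_JOB_ID}.pro
cat > $script << EOF
my_idl_command1
my_idl_command2
EOF
idl $script
rm $script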