The FLARECAST node is the computing server named cluster-r730-1.u-psud.fr, which is part of the IAS (PSUD) cluster. This cluster is managed by the SLURM system, which should be used for launching any computing-intensive task (in particular a Docker container doing computations).

SLURM can be used as a queuing system, but at PSUD it is only used as a resource allocation system at the moment (as of 2016-04-22).


Connection

Please use your PSUD login/password to connect to cluster-head.ias.u-psud.fr.

Please note that connecting from outside PSUD requires going through the ias-ssh.ias.u-psud.fr gateway.
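
For example, a connection from outside PSUD could look like the following (a sketch assuming OpenSSH 7.3 or later for the -J option; my_login is a placeholder for your PSUD login):

Code Block
languagebash
# Jump through the ias-ssh gateway to reach the cluster head node
ssh -J my_login@ias-ssh.ias.u-psud.fr my_login@cluster-head.ias.u-psud.fr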

Information about current resource usage

  • sinfo: information about SLURM queues (see the examples below)
  • squeue: list of jobs in the queues
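
For example (the -p and -u options restrict the output to a given partition or user):

Code Block
languagebash
sinfo -p flarecast     # state of the nodes in the flarecast partition
squeue -p flarecast    # jobs currently in the flarecast partition
squeue -u $USER        # your own jobs, in all partitions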

The FLARECAST queue (or "partition" in SLURM terminology) is called "flarecast" and corresponds to the FLARECAST node of the cluster (cluster-r730-1). Other queues (and nodes) can be used if your requirements exceed what the FLARECAST node can provide, but they are shared with other projects, so please be considerate in your usage of the other nodes. Some of these queues have nodes with GPGPUs (Nvidia K20) or Xeon Phi processors.

flarecast is a high-priority queue, meaning that jobs from other projects running on its nodes (in particular cluster-r730-1) will be suspended if you start a job in the flarecast queue.

Launching an interactive session within the SLURM system

The salloc command allocates resources within a queue and opens a shell from which commands using these resources can be launched with the srun command. This is mainly used for testing. Exiting the shell cancels the resource allocation. If you want to keep the resource allocation (and the job running) after logging out from your terminal, you will need to run salloc within a screen session (then don't forget to cancel the resource allocation later!).

Code Block
languagebash
ebuchlin@cluster-head:~$ salloc -p flarecast -n 2  # 2 tasks in partition "flarecast"
salloc: Granted job allocation 2765
ebuchlin@cluster-head:~$ srun hostname
cluster-r730-1
cluster-r730-1
ebuchlin@cluster-head:~$ exit
salloc: Relinquishing job allocation 2765
salloc: Job allocation 2765 has been revoked.
ebuchlin@cluster-head:~$

 

(hostname is used only as an example, to show that the command is run on the FLARECAST node; in practice you will use your own command: a Python script, mpirun, etc.)
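
For instance, still inside the salloc shell, a (hypothetical) Python script would be launched in the same way:

Code Block
languagebash
# my_script.py stands for your own program
srun python my_script.py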

srun can also be used without salloc, but then you need to specify the SLURM options for each srun call:

Code Block
languagebash
ebuchlin@cluster-head:~$ srun -p flarecast -n 2 hostname
cluster-r730-1
cluster-r730-1
ebuchlin@cluster-head:~$

 

Again, please use screen if you plan to log out after launching the job.
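
A minimal sketch of this workflow (the session name flarecast_alloc is arbitrary):

Code Block
languagebash
screen -S flarecast_alloc    # start a named screen session
salloc -p flarecast -n 2     # allocate resources from within it
srun hostname                # run commands on the allocated node
# detach with Ctrl-a d, log out, and reattach later with:
screen -r flarecast_alloc
# exit the salloc shell (or use scancel <jobid>) when done, to release the allocation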

srun and salloc accept various options for selecting a queue, the desired number of nodes, the number of tasks per node, and so on.
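
As a non-exhaustive sketch of such options (see the srun and salloc man pages for the complete list):

Code Block
languagebash
srun -p flarecast -N 1 --ntasks-per-node=4 hostname    # 1 node, 4 tasks on that node
salloc -p flarecast -n 8 --time=02:00:00               # 8 tasks, with a 2-hour time limit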

Launching a batch job

A batch job can be launched using

Code Block
languagebash
ebuchlin@cluster-head:~$ sbatch script.sh

 

where script.sh is a shell script that includes one or more lines starting with #SBATCH followed by SLURM options.

For example, for 10 independent tasks, script.sh can be:

Code Block
languagebash
#!/bin/bash
#SBATCH -n 10 -p flarecast
# 10 tasks in the flarecast partition; srun starts them on the allocated resources
cd some_directory
srun ./my_executable

 

For MPI parallelization on 12 processors:

Code Block
languagebash
#!/bin/bash
#SBATCH --job-name=my_job_name
#SBATCH -n 12
echo "$SLURM_NNODES nodes: $SLURM_NODELIST"
cd my_directory
mpirun ./my_executable

 

For an IDL job (using the full node; otherwise please adjust !cpu.tpool_nthreads accordingly):

Code Block
languagebash
#!/bin/bash
#SBATCH -N 1 -p flarecast
# Write the IDL commands to a temporary batch file
cat > idlscript.pro << EOF
my_idl_command1
my_idl_command2
EOF
# Run IDL on the batch file, then clean up
idl idlscript.pro
rm idlscript.pro