Slurm Usage
Slurm is used to submit jobs on the different partitions from the Monolithe frontend. The available partitions are listed in the description page (see the "Slurm Partition" column in the summary table).
Frontend Connection
It is recommended to add some lines to your ~/.ssh/config file, as explained in the SSH access section. Once that is done, a single ssh command is enough to connect to the frontend from your computer.
Basic Slurm Commands
Here are some useful commands to start using Slurm:
-
sinfo -l
lists the available partitions (= nodes in our case):

    $ sinfo -l
    Mon Mar 18 10:57:44 2024
    PARTITION AVAIL TIMELIMIT JOB_SIZE   ROOT OVERSUBS GROUPS NODES STATE NODELIST
    xu4       up    infinite  1-infinite no   NO       all    1     idle  vroum
    tx2       up    infinite  1-infinite no   NO       all    1     idle  tegrax2c
    xagx      up    infinite  1-infinite no   NO       all    1     idle  tegraagx
    brub      up    infinite  1-infinite no   NO       all    1     idle  brubeck
    xnano     up    infinite  1-infinite no   NO       all    1     idle  jetson-nano1
    rpi4      up    infinite  1-infinite no   NO       all    1     idle  selfix
    xnx       up    infinite  1-infinite no   NO       all    1     idle  tegranx-1
    m1u       up    infinite  1-infinite no   NO       all    1     idle  m1ultra
    onx       up    infinite  1-infinite no   NO       all    1     idle  orinnx
    oagx      up    infinite  1-infinite no   NO       all    1     mixed orinagx
    onano     up    infinite  1-infinite no   NO       all    1     idle  orinnano
    opi5      up    infinite  1-infinite no   NO       all    1     idle  orangepi5
-
srun -p [partition] command
runs a command on a partition. For example, this can submit a job that executes the hostname command on the m1u partition.
-
srun -p [partition] --pty bash -i
runs an interactive session on a partition. For example, an interactive job on the Orange Pi 5 can be given all 8 cores for the session (by default, if --cpus-per-task is not specified, only one core is allocated).
Note: An easier way to connect interactively to the nodes is to use a custom ~/.ssh/config file, as detailed in the SSH Access page.
-
sbatch [script]
runs a Slurm script on the cluster
-
squeue -l
allows you to view the jobs currently submitted on the cluster:

    $ squeue -l
    Mon Mar 18 10:58:20 2024
    JOBID PARTITION NAME USER     STATE   TIME  TIME_LIMI NODES NODELIST(REASON)
    1702  oagx      bash galveze- RUNNING 47:10 UNLIMITED 1     orinagx

For instance, here one job from the galveze user is running on the oagx partition.
-
scancel [jobid]
cancels a job
-
scancel -u [user]
cancels all the jobs for a given user
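
The two srun forms listed above can be illustrated with concrete values. This is only a minimal sketch: it assumes the m1u and opi5 partition names from the sinfo listing, and the run helper and the DRY_RUN variable are hypothetical conveniences (not part of Slurm) that let the commands be printed on a machine without Slurm.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the command when DRY_RUN=1, otherwise execute it.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# One-off job: execute the hostname command on the m1u partition.
batch_cmd() {
    run srun -p m1u hostname
}

# Interactive shell on the Orange Pi 5, requesting all 8 cores.
# Without --cpus-per-task, only one core would be allocated.
interactive_cmd() {
    run srun -p opi5 --cpus-per-task=8 --pty bash -i
}
```

On the frontend, calling batch_cmd or interactive_cmd runs the corresponding srun command; with DRY_RUN=1 exported, each function only prints the command line it would run.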
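
The sbatch entry above takes a script as argument, so a small example of such a script may help. The following is a sketch under assumptions: the file name hello_job.sh, the job name, and the output file pattern are illustrative choices, and the opi5 partition is taken from the sinfo listing. The submission step is guarded so that the snippet does nothing harmful on a machine without Slurm.

```shell
#!/usr/bin/env bash
# Write a minimal batch script (file name and job name are illustrative).
cat > hello_job.sh <<'EOF'
#!/bin/bash
# Name shown in the squeue listing.
#SBATCH --job-name=hello
# Target partition, taken from the sinfo listing.
#SBATCH --partition=opi5
# Request all 8 cores of the Orange Pi 5.
#SBATCH --cpus-per-task=8
# %j expands to the job ID.
#SBATCH --output=hello_%j.out

srun hostname
EOF

# Submit only where Slurm is installed (that is, on the frontend).
if command -v sbatch >/dev/null 2>&1; then
    sbatch hello_job.sh
else
    echo "sbatch not found: run this from the frontend" >&2
fi
```

Keeping the resource requests in #SBATCH lines rather than on the command line makes the job reproducible: the same script can be resubmitted later without remembering which options were used.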
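
The sinfo and squeue listings above are meant for reading, but both commands also accept -h (no header) and -o (output format), which makes their output easier to filter in scripts. The helpers below are a hedged sketch: the function names idle_partitions and jobs_on_partition are hypothetical, and they read the command output on standard input so they can be tried without Slurm.

```shell
#!/usr/bin/env bash
# Print the partitions whose state is "idle".
# Reads "PARTITION STATE" pairs on stdin, as produced by:
#   sinfo -h -o "%P %T"
# sinfo marks the default partition with a trailing "*", which is stripped.
idle_partitions() {
    awk '{ sub(/\*$/, "", $1); if ($2 == "idle") print $1 }'
}

# Print the IDs of the jobs that sit on a given partition.
# Reads "JOBID PARTITION STATE" triples on stdin, as produced by:
#   squeue -h -u "$USER" -o "%i %P %T"
jobs_on_partition() {
    awk -v part="$1" '$2 == part { print $1 }'
}
```

For example, sinfo -h -o "%P %T" | idle_partitions lists the nodes that are currently free, and each job ID printed by squeue -h -u "$USER" -o "%i %P %T" | jobs_on_partition oagx can then be passed to scancel.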