ssh <netid>@login.isaac.utk.edu
Or
<oit.utk.edu/hpsc> > Open OnDemand > Clusters > ISAAC NG
mkdir -p $SCRATCHDIR/projects/example
cd $SCRATCHDIR/projects/example
We're going to write scripts to run fastqc, but first we need some data.
mkdir -p data/raw
cd data/raw
wget <some data>
cd ../..   # back to the project root before writing the scripts
Get an interactive compute session. Best for testing.
salloc [...]
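For example, a minimal interactive request might look like this (a sketch -- the account, partition, qos, and time limit are placeholders; use the values for your own ISAAC allocation):
salloc --account <account> --partition <partition> --qos <qos> --nodes 1 --ntasks 1 --cpus-per-task 1 --time 01:00:00
When the allocation is granted you get an interactive shell inside it; type exit to release it.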
nano qc.sh
#!/usr/bin/env bash
module load fastqc
mkdir -p ./fastqc   # fastqc will not create the output directory for you
fastqc ./data/*.fastq.gz -o ./fastqc
Don't do this on a login node.
bash qc.sh
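Not sure whether you're on a compute node? One quick check (a sketch; it relies on Slurm setting SLURM_JOB_ID inside an allocation):
# SLURM_JOB_ID is only defined inside a Slurm allocation (e.g., after salloc),
# so an empty value means you're still on a login node.
if [[ -z "${SLURM_JOB_ID:-}" ]]; then
    echo "Not inside a Slurm allocation -- run salloc first." >&2
fi
For real runs, turn the same commands into a batch script with #SBATCH directives: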
#!/usr/bin/env bash
#SBATCH <account,partition,qos>
#SBATCH --ntasks 1
#SBATCH --cpus-per-task 1
module load fastqc
fastqc ./data/*.fastq.gz -o ./fastqc
Submit that script.
sbatch qc.sh
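You can watch the job in the queue with squeue, and by default sbatch writes the script's stdout/stderr to a file named slurm-<jobid>.out in the directory you submitted from:
squeue -u $USER         # is the job pending (PD) or running (R)?
cat slurm-<jobid>.out   # output from the batch script once it has run
The same skeleton works for any command, not just fastqc: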
#SBATCH <account,partition,qos>
#SBATCH --ntasks 1
#SBATCH --cpus-per-task 1
<your Python or R script> <args>
#SBATCH --cpus-per-task 1
#SBATCH --cpus-per-task 8
Default memory per CPU is usually ~3.8 GB, so you may want to request more CPUs (e.g., increase --cpus-per-task) or set --mem to request more memory.
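For example, to request memory explicitly rather than implying it through the CPU count (the 16G value is only an illustration -- size it for your data):
#SBATCH <account,partition,qos>
#SBATCH --ntasks 1
#SBATCH --cpus-per-task 1
#SBATCH --mem 16G   # total memory for the job, instead of the per-CPU default
The next version of qc.sh instead requests more CPUs and tells fastqc to use them: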
#SBATCH <account,partition,qos>
#SBATCH --ntasks 1
#SBATCH --cpus-per-task 8
module load fastqc
echo "SLURM_CPUS_PER_TASK=$SLURM_CPUS_PER_TASK"
fastqc ./data/*.fastq.gz -o ./fastqc --threads "$SLURM_CPUS_PER_TASK"
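Options passed on the sbatch command line override the #SBATCH directives inside the script, so you can rerun the same script with a different CPU count without editing it (16 here is just an example):
sbatch qc.sh                      # uses the in-script value (8 CPUs)
sbatch --cpus-per-task 16 qc.sh   # overrides the directive (16 CPUs)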
Alternatively, we can run fastqc in parallel using --ntasks and srun.
Why? FastQC can already run multi-threaded. But if you have a lot of fastq files, multi-threading is eventually limited by the maximum CPUs available on a single node; independent srun tasks can spread further.
Run fastqc up to 8 times, independently, in parallel:
#SBATCH <account,partition,qos>
#SBATCH --nodes 1 # This is the default
#SBATCH --ntasks 8
#SBATCH --cpus-per-task 1
module load fastqc
for fq in ./data/*.fastq.gz ; do
echo "$(date) Running $fq ..."
# Each srun launches one job step; the trailing & backgrounds it so the loop
# can move on to the next file immediately.
srun --ntasks 1 --cpus-per-task 1 fastqc "$fq" -o ./fastqc &
done
wait   # keep the batch script alive until every background step has finished
#SBATCH <account,partition,qos>
#SBATCH --nodes 1 # This is the default
#SBATCH --ntasks 8
#SBATCH --cpus-per-task 4 # More CPUs
module load fastqc
for fq in ./data/*.fastq.gz ; do
echo "$(date) Running $fq ..."
srun --ntasks 1 --cpus-per-task 4 fastqc "$fq" -o ./fastqc &
# ^ matches the 4 CPUs per task requested above
done
wait
#SBATCH <account,partition,qos>
#SBATCH --nodes 6 # More nodes for the job
#SBATCH --ntasks 8
#SBATCH --cpus-per-task 4
module load fastqc
for fq in ./data/*.fastq.gz ; do
echo "$(date) Running $fq ..."
srun --ntasks 1 --cpus-per-task 4 fastqc "$fq" -o ./fastqc &
# This ^ is still 1 task
done
wait
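Once a job like this finishes, you can see how the individual srun steps were placed and how long each took with sacct (replace <jobid> with the ID sbatch printed):
sacct -j <jobid> --format=JobID,JobName,NodeList,Elapsed,State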
With a job array, Slurm will run the same batch script many times, automatically.
Each instance of that script has access to a special counter variable.
When you need to run basically the same command on many different files independently, these array tasks can run in parallel.
Use the --array argument to specify a sequence of integers, e.g., --array 0-3. Then use $SLURM_ARRAY_TASK_ID in your script, which will take on the values in the array sequence. The following would be analogous:
sbatch: $ sbatch --array 0-3
bash:   $ seq 0 3
R:      > seq(0, 3)
Python: >>> range(0, 4)   # range() excludes the stop value, so 0-3 needs range(0, 4)
#!/usr/bin/env bash
#SBATCH <account,partition,qos>
#SBATCH --ntasks 1
#SBATCH --cpus-per-task 1
#SBATCH --array 0-3
i=$SLURM_ARRAY_TASK_ID
world=( "Charles" "Rosalind" "George" "Rachel" )
echo "Hello ${world[$i]}! Your SLURM_ARRAY_TASK_ID is $i"
# This script will be run four times, because the array is expanded to `0,1,2,3`.
#SBATCH --array 0-3
# For each instance of the script, Slurm sets the special variable
# $SLURM_ARRAY_TASK_ID to one of those values (0, 1, 2, 3). We save it in a new
# variable `i` because it's easier to type.
i=$SLURM_ARRAY_TASK_ID
# Here, we set the variable `world` to a list of names:
world=( "Charles" "Rosalind" "George" "Rachel" )
# Finally, use the special variable to get the `i`th item from the list `world`.
echo "Hello ${world[$i]}! Your SLURM_ARRAY_TASK_ID is $i"
Multiple values may be specified using a comma separated list and/or a range of values with a "-" separator. For example:
--array=0-15
--array=0,6,16-32
A step function can also be specified with a suffix containing a colon and number. For example:
--array=0-15:4 is equivalent to --array=0,4,8,12.
~ https://slurm.schedmd.com/sbatch.html#OPT_array
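Also from that same page: a "%" separator limits how many array tasks may run at the same time. For example:
#SBATCH --array 0-99%10   # 100 tasks in total, but at most 10 running at once
Putting it together for fastqc: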
#!/usr/bin/env bash
#SBATCH <your sbatch directives>
#SBATCH --ntasks 1
#SBATCH --cpus-per-task 1
#SBATCH --array 0-7   # bash array indices are zero-based: 0-7 covers 8 fastq files
module load fastqc
i=$SLURM_ARRAY_TASK_ID
fastq_files=( data/*.fastq.gz )
echo "Found ${#fastq_files[@]} fastq files."
fastqc "${fastq_files[$i]}" -o ./fastqc