Pipelines: Snakemake
The Snakemake workflow management system is a tool to create reproducible and scalable data analyses. Workflows are described via a human-readable, Python-based language. They can be seamlessly scaled to server, cluster, grid and cloud environments, without the need to modify the workflow definition. Snakemake workflows can entail a description of required software, which will be automatically deployed to any execution environment. Finally, workflow runs can be automatically turned into interactive, portable, browser-based reports, which can be shared with collaborators via email or the cloud and combine results with all used parameters, code, and software.
This section explains the basic concepts for using Snakemake in Garnatxa. For a more complete understanding, it is recommended to visit the official documentation.
To explain how to use Snakemake in Garnatxa, we will use the same examples as in previous sections: BWA indexing and alignment.
Installation
To use Snakemake, an initial installation is required for each user. You can create a new environment in conda where you can install the software. Afterward, you will need to load this environment every time you want to use snakemake.
Load the anaconda module, then create and activate a new conda environment for Snakemake:
$ module load anaconda
$ mamba create -n snakemake
$ conda activate snakemake
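Now install Snakemake and the SLURM executor plugin, which is needed for the executor: slurm setting used in the configuration file below. Both are installed from the conda-forge and bioconda channels:
$ mamba install -c conda-forge -c bioconda snakemake snakemake-executor-plugin-slurm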
Remember to activate the environment every time you use Snakemake in a new session. Alternatively, you can add the lines module load anaconda and conda activate snakemake to your .bashrc so that the environment is activated at login.
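After the installation, you can check from the activated environment that both packages are visible to Python. In Snakemake 8 and later, the executor: slurm setting used below is provided by a separate package, snakemake-executor-plugin-slurm. The following is a minimal sketch that uses only the standard library. The file name check_snakemake.py is a hypothetical choice; save the script under any name and run it with the environment's python.

```python
from importlib import metadata

# Package (distribution) names as published on bioconda/PyPI.
PACKAGES = ("snakemake", "snakemake-executor-plugin-slurm")


def installed_versions(packages=PACKAGES):
    """Map each package name to its installed version, or None if it is missing."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


if __name__ == "__main__":
    # e.g. python check_snakemake.py  (hypothetical file name)
    for name, version in installed_versions().items():
        print(f"{name}: {version or 'NOT INSTALLED'}")
```

If either line reports NOT INSTALLED, repeat the mamba install command inside the activated environment.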
Basic concepts about Snakemake
Snakemake follows the GNU Make paradigm: workflows are defined in terms of rules that define how to create output files from input files. Dependencies between the rules are determined automatically, creating a DAG (directed acyclic graph) of jobs that can be automatically parallelized. Snakemake sets itself apart from other text-based workflow systems in the following way. Hooking into the Python interpreter, Snakemake offers a definition language that is an extension of Python with syntax to define rules and workflow specific properties. This allows Snakemake to combine the flexibility of a plain scripting language with a pythonic workflow definition.
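To make the idea of automatic parallelization more concrete, the sketch below groups the jobs of a DAG into "waves". A job can only start after every job it depends on has finished, so all jobs in the same wave could run at the same time. This is a simplified model written for illustration, not Snakemake's scheduler. The real scheduler also respects limits such as jobs: 200, and it starts each job as soon as its own inputs exist rather than waiting for a whole wave. Job labels such as bwa_align:reads_00 are hypothetical names modelled on the BWA example used in this section.

```python
from collections import defaultdict


def job_waves(deps):
    """Group DAG jobs into waves; jobs in the same wave do not depend on each other.

    deps maps each job to the list of jobs whose outputs it needs. A job's wave
    is 0 if it needs nothing, otherwise one more than its deepest dependency.
    """
    wave = {}

    def depth(job, path=()):
        if job in wave:
            return wave[job]
        if job in path:
            raise ValueError(f"cycle detected at {job!r}: not a DAG")
        needs = deps.get(job, [])
        wave[job] = 1 + max(depth(d, path + (job,)) for d in needs) if needs else 0
        return wave[job]

    for job in deps:
        depth(job)
    groups = defaultdict(list)
    for job, level in wave.items():
        groups[level].append(job)
    return [sorted(groups[level]) for level in range(len(groups))]


# Hypothetical job names modelled on the BWA example, with three samples.
samples = ["reads_00", "reads_01", "reads_02"]
deps = {"all": [f"bwa_samse:{s}" for s in samples]}
for s in samples:
    deps[f"bwa_align:{s}"] = ["bwa_index"]
    deps[f"bwa_samse:{s}"] = [f"bwa_align:{s}"]

for i, jobs in enumerate(job_waves(deps)):
    print(f"wave {i}: {jobs}")
# wave 0: ['bwa_index']
# wave 1: ['bwa_align:reads_00', 'bwa_align:reads_01', 'bwa_align:reads_02']
# wave 2: ['bwa_samse:reads_00', 'bwa_samse:reads_01', 'bwa_samse:reads_02']
# wave 3: ['all']
```

The single indexing job forms the first wave, which is why it runs alone before the alignment jobs start in parallel.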
We will use three files to define three entities:
Rules: Each rule is one step of the pipeline and defines the commands to execute and the expected inputs and outputs of that step. There is one main rule, all, which defines the overall output expected from the pipeline. Snakemake works backwards from this rule and runs whichever other rules are needed to produce the files it requires, based on their declared inputs and outputs.
Configuration: For each stage of the pipeline (rule), we can define the resources in terms of CPU, memory, and time that it will require. These parameters will be passed to Slurm so that it can start a job for each step.
SLURM Launcher: An initial job will be defined to start the entire snakemake pipeline. This job acts as the pipeline master and will remain active in the queuing system until all pipeline stages are completed.
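The way Snakemake works backwards from the all rule can be illustrated with a toy model. The sketch below is a simplified, hypothetical reimplementation written for illustration, not Snakemake's own code. It reduces each rule to input and output patterns containing {wildcards}. For every requested file, it finds the rule whose output pattern matches, fills in the wildcard values, and recurses into that rule's inputs. Files that no rule produces, such as data/reads_01.fq, are treated as existing source files. As further simplifications, wildcards here never match "/" (Snakemake's default wildcard pattern is more permissive), and only the .bwt index file is modelled.

```python
import re

# Toy rules modelled on the BWA example: name -> (input patterns, output patterns).
RULES = {
    "bwa_index": (["ref/chr8.fa"], ["ref/chr8_ref.bwt"]),
    "bwa_align": (["data/{sample}.fq", "ref/chr8_ref.bwt"], ["out/{sample}.sai"]),
    "bwa_samse": (["out/{sample}.sai", "data/{sample}.fq"], ["out/{sample}.sam"]),
}


def pattern_to_regex(pattern):
    """Turn 'out/{sample}.sam' into a regex with one named group per wildcard."""
    parts = re.split(r"\{(\w+)\}", pattern)
    # re.split puts captured wildcard names at odd indices; even indices are literal text.
    return re.compile("".join(
        re.escape(p) if i % 2 == 0 else f"(?P<{p}>[^/]+)" for i, p in enumerate(parts)
    ))


def find_producer(target):
    """Return (rule, wildcards) for the rule whose output matches target, else None."""
    for rule, (_, outputs) in RULES.items():
        for out in outputs:
            match = pattern_to_regex(out).fullmatch(target)
            if match:
                return rule, match.groupdict()
    return None  # nothing produces it: treat it as an existing input file


def plan_jobs(targets):
    """Work backwards from the targets; return jobs with dependencies listed first."""
    ordered, seen = [], set()

    def visit(target):
        producer = find_producer(target)
        if producer is None:
            return
        rule, wildcards = producer
        job = (rule, tuple(sorted(wildcards.items())))
        if job in seen:
            return  # the same job is shared by several consumers (e.g. bwa_index)
        seen.add(job)
        inputs, _ = RULES[rule]
        for pattern in inputs:
            visit(pattern.format(**wildcards))  # fill in {sample}
        ordered.append(job)

    for target in targets:
        visit(target)
    return ordered


# Files that an "all" rule might request; everything else is derived from them.
for rule, wildcards in plan_jobs(["out/reads_01.sam", "out/reads_02.sam"]):
    print(rule, dict(wildcards))
# bwa_index {}
# bwa_align {'sample': 'reads_01'}
# bwa_samse {'sample': 'reads_01'}
# bwa_align {'sample': 'reads_02'}
# bwa_samse {'sample': 'reads_02'}
```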
Configuration file: Snakeconfig.yaml
executor: slurm
jobs: 200

default-resources:
  slurm_partition: "global"

set-threads:
  bwa_index: 1
  bwa_align: 8
  bwa_samse: 1

set-resources:
  bwa_index:
    slurm_partition: "global"
    mem_mb: 5000
    slurm_extra: "' -q short '"
    runtime: "10h"
  bwa_align:
    slurm_partition: "global"
    mem_mb: 2000
    slurm_extra: "' -q medium '"
    runtime: "5m"
  bwa_samse:
    slurm_partition: "global"
    mem_mb: 2000
    slurm_extra: "' -q short '"
    runtime: "10m"
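executor: slurm -> Each rule instance is submitted to SLURM as a separate job. In Snakemake 8 and later this requires the SLURM executor plugin (snakemake-executor-plugin-slurm).
jobs: 200 -> Maximum number of jobs that can be submitted to the queuing system at the same time. If the user exceeds the limits imposed by Slurm, the jobs remain queued until resources become available.
default-resources -> Resources applied to every rule unless overridden in set-resources (here, the global partition).
set-threads -> Number of cores (threads) requested by each rule. Inside a rule's shell command this value is available as {threads}.
set-resources -> Remaining resources required by each rule: partition (slurm_partition), memory in MB (mem_mb), extra sbatch options such as the QoS (slurm_extra), and maximum execution time (runtime).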
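The SLURM executor plugin turns these per-rule settings into options for the sbatch call it makes for each job. The following rough sketch reproduces that translation for the three rules in this file, only to make the mapping visible. The option names shown (--partition, --cpus-per-task, --mem, --time) are standard sbatch options. However, the exact command line the plugin builds is an assumption here and may differ between plugin versions, so treat the result as an approximation. The runtime parser is also simplified and accepts only day/hour/minute suffixes.

```python
import re
import shlex

# Values copied from set-threads and set-resources in Snakeconfig.yaml.
THREADS = {"bwa_index": 1, "bwa_align": 8, "bwa_samse": 1}
RESOURCES = {
    "bwa_index": {"slurm_partition": "global", "mem_mb": 5000,
                  "slurm_extra": "' -q short '", "runtime": "10h"},
    "bwa_align": {"slurm_partition": "global", "mem_mb": 2000,
                  "slurm_extra": "' -q medium '", "runtime": "5m"},
    "bwa_samse": {"slurm_partition": "global", "mem_mb": 2000,
                  "slurm_extra": "' -q short '", "runtime": "10m"},
}

MINUTES_PER_UNIT = {"d": 24 * 60, "h": 60, "m": 1}


def runtime_to_minutes(value):
    """Convert '10h', '5m' or '1d12h' (days/hours/minutes only) to whole minutes."""
    if isinstance(value, int):
        return value  # a bare number is already in minutes
    parts = re.findall(r"(\d+)([dhm])", value)
    if not parts or "".join(n + u for n, u in parts) != value:
        raise ValueError(f"unsupported runtime string: {value!r}")
    return sum(int(n) * MINUTES_PER_UNIT[u] for n, u in parts)


def sbatch_options(rule):
    """Approximate the sbatch options requested for one job of `rule`."""
    res = RESOURCES[rule]
    options = [
        f"--partition={res['slurm_partition']}",
        f"--cpus-per-task={THREADS[rule]}",
        f"--mem={res['mem_mb']}",  # sbatch treats a bare number as megabytes
        f"--time={runtime_to_minutes(res['runtime'])}",  # bare number = minutes
    ]
    # This sketch strips the outer single quotes and splits the rest into arguments.
    options += shlex.split(res["slurm_extra"].strip("'"))
    return options


for rule in RESOURCES:
    print(rule, " ".join(sbatch_options(rule)))
# bwa_index --partition=global --cpus-per-task=1 --mem=5000 --time=600 -q short
# bwa_align --partition=global --cpus-per-task=8 --mem=2000 --time=5 -q medium
# bwa_samse --partition=global --cpus-per-task=1 --mem=2000 --time=10 -q short
```

Reading the configuration this way makes it easier to check that each rule's QoS allows the requested run time.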
Pipeline implementation: Snakefile
By default, Snakemake looks for a file named Snakefile in the folder where you run it. You can use a different name or location, but then you must pass it on the snakemake command line in the SLURM launcher file with -s <path> (or --snakefile <path>). The file lists the rules (steps) of the pipeline that are needed to obtain the expected result, which is defined by the general rule "all". Snakemake runs each rule in the appropriate order, as soon as the inputs required for that stage of the pipeline are available.
 1  genome = "ref/chr8.fa"
 2  SAMPLES, = glob_wildcards("data/{sample}.fq")
 3
 4  rule all:
 5      input:
 6          expand("out/{sample}.sam", sample=SAMPLES)
 7
 8  rule bwa_index:
 9      input:
10          genome
11      output:
12          idx = multiext("ref/chr8_ref", ".amb", ".ann", ".bwt", ".pac", ".sa")
13      log:
14          "logs/bwa_index.log"
15      shell:
16          """
17          module load bwa
18          bwa index {input} -p ref/chr8_ref 2> {log}
19          """
20
21  rule bwa_align:
22      input:
23          read_sample = "data/{sample}.fq",
24          # Requesting one of the index files makes Snakemake trigger the bwa_index rule
25          idx = "ref/chr8_ref.bwt"
26      log:
27          "logs/bwa_align_{sample}.log"
28      output:
29          "out/{sample}.sai"
30      shell:
31          """
32          module load bwa
33          bwa aln -I -t {threads} ref/chr8_ref {input.read_sample} > {output} 2> {log}
34          """
35
36  rule bwa_samse:
37      input:
38          "out/{sample}.sai",
39          "data/{sample}.fq"
40      log:
41          "logs/bwa_samse_{sample}.log"
42      output:
43          "out/{sample}.sam"
44      shell:
45          """
46          module load bwa
47          bwa samse ref/chr8_ref {input} > {output} 2> {log}
48          """
Line 1: Defines the path to the reference genome file. This variable is used later by the rules.
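Line 2: glob_wildcards scans the data directory for files matching the pattern data/{sample}.fq and collects the values taken by the {sample} wildcard, that is, the file names without the .fq extension. The result holds one list per wildcard, so the trailing comma after SAMPLES unpacks it and SAMPLES becomes a plain list of sample names.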
Lines 4-6: Define the general rule for the entire pipeline. This rule indicates which file or files must exist at the end of the pipeline. In this example, one file with the .sam extension is expected in the out directory for each input read file. Based on this rule, Snakemake searches for ways to generate these files by examining the other rules and their input and output definitions. The wildcard {sample} stands for the file name without the extension; expand fills it in with each value stored in the variable SAMPLES, which was obtained from the input files.
Lines 8-19: Define the indexing rule. The input section states that indexing requires the reference genome file, held in the genome variable defined at the beginning. The output section lists the files produced by indexing; multiext builds one path per extension from the common prefix ref/chr8_ref. Keep in mind that every file needed by a later rule must be declared here, otherwise Snakemake cannot tell which rule produces it and the subsequent steps (rules) cannot run. The log section sets the file that receives the standard error of the command (redirected with 2> {log}). Finally, the shell section contains the commands executed by this rule: load the bwa module and run bwa index. {input} is replaced by all the files listed in the input section, in this case the path to the reference file, and -p ref/chr8_ref sets the prefix of the index files.
Lines 21-34: This is the alignment rule. Its input section uses the pattern data/{sample}.fq for the read files (which must have the .fq extension) and also requires the index file ref/chr8_ref.bwt. Because that file is an output of bwa_index, Snakemake waits for the indexing phase to finish before starting any alignment. In general, Snakemake does not start a rule until all of its input files exist. The output of each alignment is a file with the .sai extension. {input.read_sample} refers only to the read file named read_sample in the input section. {threads} takes the value set for bwa_align in set-threads. {output} is replaced by the path of the .sai file. One alignment job is created for each read file in the data directory, and these jobs run concurrently (in parallel). As each .sai file is produced, the samse rule for that sample can start.
Lines 36-48: The samse rule is defined similarly. Each instance of this rule requires the file with the .sai extension produced during the alignment stage and the matching read file (extension .fq), and produces a file with the .sam extension. {input} expands to both input files in the order they are listed (.sai first, then .fq), which is the order bwa samse expects after the index prefix. Remember that at runtime Snakemake replaces the wildcard {sample} with the name of the read file currently being processed.
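To see what lines 2 and 6 of the Snakefile compute, the sketch below imitates glob_wildcards and expand for the single-wildcard case used here. These are simplified stand-ins written for illustration (the names glob_wildcards_sketch and expand_sketch are hypothetical), not the real functions from Snakemake. The real glob_wildcards also searches subdirectories and handles several wildcards, and the real expand builds all combinations when it is given several lists. The example creates a temporary data directory with two read files and one unrelated file.

```python
import os
import re
import tempfile


def glob_wildcards_sketch(pattern):
    """Return the values of a single {wildcard} in pattern, from files in one directory."""
    directory, filename = os.path.split(pattern)
    prefix, _name, suffix = re.fullmatch(r"(.*)\{(\w+)\}(.*)", filename).groups()
    values = []
    for entry in sorted(os.listdir(directory)):
        stem_len = len(entry) - len(prefix) - len(suffix)
        if stem_len > 0 and entry.startswith(prefix) and entry.endswith(suffix):
            values.append(entry[len(prefix):len(prefix) + stem_len])
    return values


def expand_sketch(pattern, **wildcards):
    """Fill a pattern with each value of a single wildcard list, like expand()."""
    (name, values), = wildcards.items()  # exactly one wildcard is supported
    return [pattern.format(**{name: value}) for value in values]


with tempfile.TemporaryDirectory() as root:
    data = os.path.join(root, "data")
    os.makedirs(data)
    for name in ("reads_01.fq", "reads_00.fq", "notes.txt"):
        open(os.path.join(data, name), "w").close()  # empty placeholder files
    SAMPLES = glob_wildcards_sketch(os.path.join(data, "{sample}.fq"))

print(SAMPLES)  # ['reads_00', 'reads_01']
print(expand_sketch("out/{sample}.sam", sample=SAMPLES))
# ['out/reads_00.sam', 'out/reads_01.sam']
```

The file notes.txt is ignored because it does not match the .fq pattern, so no .sam target is requested for it.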
Launcher script: SnakemakeLauncher.sbatch
To run the pipeline, a job must be launched on the cluster to start Snakemake, which in turn submits the pipeline steps as jobs to the compute nodes. This master job must run continuously throughout the pipeline execution, so it is important to request enough time for the whole pipeline to complete.
 1  #!/bin/bash
 2  #SBATCH --job-name=snakemakeLauncher  # Job name (shown by squeue)
 3  #SBATCH --output=snakemake_%j.out     # Standard output and error log
 4  #SBATCH --ntasks=1                    # Only 1 task is required
 5  #SBATCH --cpus-per-task=1             # Only 1 CPU is required
 6  #SBATCH --mem=1G                      # 1 GB of memory
 7  #SBATCH --time=2-00:00:00             # Time limit days-hrs:min:sec -> requested: 2 days
 8  #SBATCH --qos=medium                  # QoS: short, medium, long, long-mem
 9
10  # Load the anaconda module
11  module load anaconda
12
13  # Activate the Snakemake environment
14  conda activate snakemake
15
16  # Start the Snakemake master process
17  snakemake --slurm-jobname-prefix snakemake --profile ./Snakeconfig.yaml
18
19  exit 0
Lines 11 and 14: To make the snakemake command available, first load the anaconda module and then activate the snakemake environment that contains the command.
Line 17: Launch the pipeline with Snakemake. You only need to specify the prefix used to name the jobs that Snakemake submits to SLURM (--slurm-jobname-prefix) and the configuration file shown earlier (--profile). By default, Snakemake expects a file called "Snakefile" in the same directory where it is launched.
Submit the launcher script and monitor the execution of the pipeline:
$ sbatch SnakemakeLauncher.sbatch
The master job is then submitted to the queue system. You can check that the indexing, alignment and samse jobs run in the expected order: first a single indexing job runs, then the alignment jobs run concurrently, followed by the samse jobs.
$ squeue -u user1
JOBID NAME PARTITION QOS USER ACCOUNT START_TIME TIME TIME_LEFT NODES CPU MIN_M NODELIST ST REASON
2693425 snakemake_47464 global medium user1 cpd 2026-04-01T10:10 0:20 4:40 1 8 2000M CG None
2693426 snakemake_47464 global medium user1 cpd 2026-04-01T10:10 0:18 4:42 1 8 2000M CG None
2693428 snakemake_47464 global medium user1 cpd 2026-04-01T10:10 0:13 4:47 1 8 2000M CG None
2693423 snakemake global medium user1 cpd 2026-04-01T10:07 3:05 1-23:56:55 1 2 1G cn10 R None
2693427 snakemake_47464 global medium user1 cpd 2026-04-01T10:10 0:18 4:42 1 8 2000M cn12 R None
2693429 snakemake_47464 global medium user1 cpd 2026-04-01T10:10 0:14 4:46 1 8 2000M cn12 R None
2693430 snakemake_47464 global medium user1 cpd 2026-04-01T10:10 0:14 4:46 1 8 2000M cn12 R None
2693431 snakemake_47464 global medium user1 cpd 2026-04-01T10:10 0:09 4:51 1 8 2000M cn12 R None
2693432 snakemake_47464 global medium user1 cpd 2026-04-01T10:10 0:09 4:51 1 8 2000M cn11 R None
2693433 snakemake_47464 global medium user1 cpd 2026-04-01T10:10 0:04 4:56 1 8 2000M cn10 R None
2693434 snakemake_47464 global medium user1 cpd 2026-04-01T10:10 0:04 4:56 1 8 2000M cn12 R None
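In the output above, some alignment jobs (5-minute time limit and 8 CPUs, as set for bwa_align in Snakeconfig.yaml) are already in state CG (completing) while the remaining alignment jobs run concurrently on the compute nodes. Note that the Snakemake master job 2693423 (the one with the 2-day time limit) remains in the queue until all jobs in the pipeline have finished. Use the job ID of the master job to follow the Snakemake output during execution; in this example the assigned job ID is 2693423. The warning at the top of the log comes from mamba activate, which requires mamba init; as the message itself suggests, conda activate can be used instead.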
$ cat snakemake_2693423.out
Run 'mamba init' to be able to run mamba activate/deactivate
and start a new shell session. Or use conda to activate/deactivate.

Using profile ./Snakeconfig.yaml for setting default command line arguments.
host: osd11
Building DAG of jobs...
You are running snakemake in a SLURM job context. This is not recommended, as it may lead to unexpected behavior. If possible, please run Snakemake directly on the login node.
SLURM run ID: snakemake_f1213144-87b2-43fb-961f-2f14e463d0e1
MinJobAge 300s (>= 120s). 'squeue' should work reliably for status queries.
Using shell: /usr/bin/bash
Provided remote nodes: 200
Job stats:
job count
--------- -------
bwa_index 1
bwa_align 20
bwa_samse 20
all 1
total 42

Select jobs to execute...
Execute 1 jobs...
No SLURM account given, trying to guess.
No account was given, not able to get a SLURM account via sacct: sacct: invalid option -- '1'

Unable to guess SLURM account. Trying to proceed without.
Job 3 has been submitted with SLURM jobid 2693548 (log: /storage/cpd/home_members/user1/test_prueba/.snakemake/slurm_logs/rule_bwa_index/2693548.log).
[Wed Apr 1 12:13:16 2026]
Finished jobid: 3 (Rule: bwa_index)
1 of 42 steps (2%) done
Select jobs to execute...
Execute 20 jobs...
Job 7 has been submitted with SLURM jobid 2693550 (log: /storage/cpd/home_members/user1/test_prueba/.snakemake/slurm_logs/rule_bwa_align/reads_02/2693550.log).
Job 37 has been submitted with SLURM jobid 2693551 (log: /storage/cpd/home_members/user1/test_prueba/.snakemake/slurm_logs/rule_bwa_align/reads_01/2693551.log).
Job 23 has been submitted with SLURM jobid 2693552 (log: /storage/cpd/home_members/user1/test_prueba/.snakemake/slurm_logs/rule_bwa_align/reads_04/2693552.log).
Job 39 has been submitted with SLURM jobid 2693553 (log: /storage/cpd/home_members/user1/test_prueba/.snakemake/slurm_logs/rule_bwa_align/reads_05/2693553.log).
Job 9 has been submitted with SLURM jobid 2693554 (log: /storage/cpd/home_members/user1/test_prueba/.snakemake/slurm_logs/rule_bwa_align/reads_17/2693554.log).
Job 25 has been submitted with SLURM jobid 2693555 (log: /storage/cpd/home_members/user1/test_prueba/.snakemake/slurm_logs/rule_bwa_align/reads_14/2693555.log).
Job 41 has been submitted with SLURM jobid 2693556 (log: /storage/cpd/home_members/user1/test_prueba/.snakemake/slurm_logs/rule_bwa_align/reads_19/2693556.log).
Job 11 has been submitted with SLURM jobid 2693557 (log: /storage/cpd/home_members/user1/test_prueba/.snakemake/slurm_logs/rule_bwa_align/reads_12/2693557.log).
Finished jobid: 11 (Rule: bwa_align)
9 of 42 steps (21%) done
Select jobs to execute...
Execute 2 jobs...
Job 19 has been submitted with SLURM jobid 2693566 (log: /storage/cpd/home_members/user1/test_prueba/.snakemake/slurm_logs/rule_bwa_align/reads_07/2693566.log).
Job 35 has been submitted with SLURM jobid 2693567 (log: /storage/cpd/home_members/user1/test_prueba/.snakemake/slurm_logs/rule_bwa_align/reads_18/2693567.log).
Job 5 has been submitted with SLURM jobid 2693568 (log: /storage/cpd/home_members/user1/test_prueba/.snakemake/slurm_logs/rule_bwa_align/reads_06/2693568.log).
Job 21 has been submitted with SLURM jobid 2693569 (log: /storage/cpd/home_members/user1/test_prueba/.snakemake/slurm_logs/rule_bwa_align/reads_08/2693569.log).
Job 6 has been submitted with SLURM jobid 2693570 (log: /storage/cpd/home_members/user1/test_prueba/.snakemake/slurm_logs/rule_bwa_samse/reads_02/2693570.log).
Job 36 has been submitted with SLURM jobid 2693571 (log: /storage/cpd/home_members/user1/test_prueba/.snakemake/slurm_logs/rule_bwa_samse/reads_01/2693571.log).
Job 22 has been submitted with SLURM jobid 2693572 (log: /storage/cpd/home_members/user1/test_prueba/.snakemake/slurm_logs/rule_bwa_samse/reads_04/2693572.log).
Job 38 has been submitted with SLURM jobid 2693573 (log: /storage/cpd/home_members/user1/test_prueba/.snakemake/slurm_logs/rule_bwa_samse/reads_05/2693573.log).
Job 8 has been submitted with SLURM jobid 2693574 (log: /storage/cpd/home_members/user1/test_prueba/.snakemake/slurm_logs/rule_bwa_samse/reads_17/2693574.log).
Job 24 has been submitted with SLURM jobid 2693575 (log: /storage/cpd/home_members/user1/test_prueba/.snakemake/slurm_logs/rule_bwa_samse/reads_14/2693575.log).
Job 10 has been submitted with SLURM jobid 2693576 (log: /storage/cpd/home_members/user1/test_prueba/.snakemake/slurm_logs/rule_bwa_samse/reads_12/2693576.log).
Job 40 has been submitted with SLURM jobid 2693577 (log: /storage/cpd/home_members/user1/test_prueba/.snakemake/slurm_logs/rule_bwa_samse/reads_19/2693577.log).
[Wed Apr 1 12:15:20 2026]
Finished jobid: 4 (Rule: bwa_samse)
41 of 42 steps (98%) done
Select jobs to execute...
Execute 1 jobs...
[Wed Apr 1 12:15:21 2026]
localrule all:
input: out/reads_16.sam, out/reads_06.sam, out/reads_02.sam, out/reads_17.sam, out/reads_12.sam, out/reads_11.sam, out/reads_15.sam, out/reads_09.sam, out/reads_07.sam, out/reads_08.sam, out/reads_04.sam, out/reads_14.sam, out/reads_10.sam, out/reads_13.sam, out/reads_00.sam, out/reads_03.sam, out/reads_18.sam, out/reads_01.sam, out/reads_05.sam, out/reads_19.sam
jobid: 0
reason: Input files updated by another job: out/reads_19.sam, out/reads_02.sam, out/reads_11.sam, out/reads_06.sam, out/reads_09.sam, out/reads_01.sam, out/reads_16.sam, out/reads_05.sam, out/reads_18.sam, out/reads_00.sam, out/reads_08.sam, out/reads_04.sam, out/reads_07.sam, out/reads_13.sam, out/reads_17.sam, out/reads_03.sam, out/reads_12.sam, out/reads_15.sam, out/reads_14.sam, out/reads_10.sam
resources: tmpdir=/scr/user1/tmp, disk_mb=34685, disk=34.69 GB, disk_mib=33079, mem_mb=8000, mem=8 GB, mem_mib=7630, slurm_partition=global
[Wed Apr 1 12:15:21 2026]
Finished jobid: 0 (Rule: all)
42 of 42 steps (100%) done
Cleaning up SLURM log files older than 10 day(s).
Complete log(s): /storage/cpd/home_members/user1/test_prueba/.snakemake/log/2026-04-01T121045.858825.snakemake.log
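For long pipelines it can be handy to extract the progress from the master job's output file instead of reading it whole. The sketch below is a hypothetical helper; the script name summarize_log.py is only an example. It relies on two line formats visible in the log above: "N of M steps (P%) done" and "Job N has been submitted with SLURM jobid X". These formats are taken from this example run and may change between Snakemake and plugin versions.

```python
import re
import sys

# Line formats as they appear in the example log (they may vary between versions).
PROGRESS = re.compile(r"(\d+) of (\d+) steps \((\d+)%\) done")
SUBMITTED = re.compile(r"Job (\d+) has been submitted with SLURM jobid (\d+)")


def summarize_log(lines):
    """Return (steps done, total steps, {Snakemake job id: SLURM job id})."""
    done, total, slurm_ids = None, None, {}
    for line in lines:
        line = line.strip()
        progress = PROGRESS.fullmatch(line)
        if progress:
            # Keep the most recent progress line seen so far.
            done, total = int(progress.group(1)), int(progress.group(2))
            continue
        submitted = SUBMITTED.match(line)
        if submitted:
            slurm_ids[int(submitted.group(1))] = int(submitted.group(2))
    return done, total, slurm_ids


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python summarize_log.py snakemake_2693423.out
    with open(sys.argv[1]) as handle:
        done, total, ids = summarize_log(handle)
    print(f"{done} of {total} steps done; {len(ids)} jobs submitted to SLURM")
```

The SLURM job IDs collected this way can be passed to squeue or sacct to inspect individual steps.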
Graphs
Snakemake can generate a graph of the dependencies between the rules defined in the Snakefile.
To do this, add the --dag option (complete DAG, with one node per job, i.e. per instantiated rule) or the --rulegraph option (dependencies between rules only) and pipe the output to a Graphviz tool such as dot:
$ snakemake --profile ./Snakeconfig.yaml --dag | dot -Tpng > snakemakedag.png
The generated plot can be seen in the following figure
Figure 1. Example of workflow diagram generated by Snakemake.
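The output of --dag and --rulegraph is plain Graphviz DOT text, which is why it can be piped into dot. For illustration only, the hypothetical helper below writes a minimal rule graph for this example in the same format. Snakemake's own output also includes node colours and styling, so it will not match exactly. The file names rulegraph.py and rulegraph.dot are placeholders.

```python
def rulegraph_dot(edges):
    """Build a minimal Graphviz DOT digraph from (upstream, downstream) rule pairs."""
    rules = []
    for upstream, downstream in edges:
        for rule in (upstream, downstream):
            if rule not in rules:
                rules.append(rule)  # keep first-seen order for stable node ids
    ids = {rule: i for i, rule in enumerate(rules)}
    lines = ["digraph snakemake_rulegraph {", "    node [shape=box, style=rounded];"]
    lines += [f'    {ids[r]} [label="{r}"];' for r in rules]
    lines += [f"    {ids[a]} -> {ids[b]};" for a, b in edges]
    lines.append("}")
    return "\n".join(lines)


# Rule dependencies of the BWA example Snakefile (upstream -> downstream).
EDGES = [("bwa_index", "bwa_align"), ("bwa_align", "bwa_samse"), ("bwa_samse", "all")]

if __name__ == "__main__":
    # python rulegraph.py > rulegraph.dot && dot -Tpng rulegraph.dot > rulegraph.png
    print(rulegraph_dot(EDGES))
```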