Basic concepts about Garnatxa
Before describing how to access the Garnatxa file system, it is important to understand some basic concepts about HPC clusters.
Research problems that use computing can sometimes outgrow the desktop or laptop computer where they started.
There might not be enough memory, not enough disk space or it may take too long to run computations.
These days, a typical desktop or laptop computer has up to 16GB of memory, a small number of CPU cores and several TB of disk space. When you run your computations on a computer, the program and data get loaded from the disk to the memory and the program’s commands are executed on the CPU cores. Results of the computations are usually written back to the disk. If too much data needs to be loaded to memory, the computer will become unresponsive and may even crash.
Typical cases of when it may be beneficial to request access to an HPC cluster:
Computations need much more memory than what is available on your computer
The same program needs to be run multiple times (usually on different input data)
The program that you use takes too long to run, but it can be run faster in parallel (usually using MPI or OpenMP)
One should not expect that any program will automatically run faster on a cluster. In fact, a serial program that does not need much memory may run slower on an HPC cluster than on a newer laptop or desktop computer. However, on a cluster you can take advantage of many cores by running several instances of a program at once or by using a parallel version of the program.
An HPC cluster such as Garnatxa is a collection of many separate servers (computers), called nodes, connected via a dedicated fast network.
The cluster has different types of nodes depending on the function they provide; from the user's point of view, only the login node is accessed directly. Garnatxa comprises:
A head node or login node, where users log in.
A set of compute nodes, where the majority of computations run.
A set of storage nodes; each node exports a set of disks to the global storage, and a distributed file system (CEPH) joins all these disks so that users can access their files as a single volume.
A set of switches connecting all nodes.
All cluster nodes have the same components as a laptop or desktop: CPU cores, memory and disk space. The difference between a personal computer and a cluster node is in quantity, quality and power of the components.
The next diagram describes the main components of the Garnatxa cluster. Below are some of the steps a user might carry out when connecting to Garnatxa.
A user needs to connect to Garnatxa in order to process some files. Since the user is connecting from home, a VPN connection must be established before connecting to Garnatxa. The VPN encrypts all the data the user sends to or receives from the cluster. After that, the user runs ssh garnatxa.uv.es and the system asks for a username and password.
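A minimal connection sequence from a Linux or macOS machine might look like this (assuming your institutional VPN client is already connected; USERNAME is a placeholder for your Garnatxa account):

[LOCALUSER@my_computer ~]$ ssh USERNAME@garnatxa.uv.es
# The system will then prompt for your Garnatxa password.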
All users can access three data pools to store and retrieve data. The /home directory is a personal folder for code, sources, and general files used to work in the cluster. /storage contains files, sources, applications, or databases that are shared by all members of the same project (there is one folder per research group). /scr is a directory for temporary data used to stage input and output for jobs in the cluster; data produced there should be deleted after a job completes.
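For orientation, the three areas can be inspected from a login node as follows (MYGROUP is a placeholder for your research group's folder name):

[USERNAME@master ~]$ ls /home/USERNAME    # personal code, sources, general files
[USERNAME@master ~]$ ls /storage/MYGROUP  # data shared by your research group
[USERNAME@master ~]$ ls /scr              # transient job data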
The cluster provides a large amount of computational resources, shared by all users. A resource manager (SLURM) is in charge of distributing these resources (CPUs, memory) among all users. Each internal node exports a set of disks, and a distributed file system (CEPH) abstracts access to the data. The cluster is presented to the user as a single disk plus a pool of CPUs and memory on which to submit jobs.
Garnatxa implements 4 types of queues to which jobs can be submitted. Each queue (partition) is limited to a set of resources (number of CPUs, memory, or execution time). A user must write a job script in which the parameters of the job are specified. SLURM reads this script and allocates a suitable set of resources on which to execute the job. Jobs in Garnatxa are queued and served in order of receipt.
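As an illustration, a minimal job script might look like the sketch below; the resource values and the program name are placeholders, not recommendations for any particular Garnatxa partition:

#!/bin/bash
#SBATCH --job-name=example        # name shown in the queue
#SBATCH --ntasks=1                # number of tasks (processes)
#SBATCH --cpus-per-task=4         # CPU cores for the task
#SBATCH --mem=8G                  # memory for the whole job
#SBATCH --time=01:00:00           # wall-clock limit (hh:mm:ss)

./my_program input.dat            # hypothetical program and input

The script would then be submitted with sbatch:

[USERNAME@master ~]$ sbatch job.sh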
Jobs are assigned to internal nodes depending on the amount of CPUs and memory requested by the user. Once a job has been submitted to the queue system, the user can check its status, and when the job finishes the user can access the output data.
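For example, with the standard SLURM commands (JOBID is a placeholder for the job id returned at submission):

[USERNAME@master ~]$ squeue -u USERNAME   # jobs still queued or running
[USERNAME@master ~]$ sacct -j JOBID       # accounting info once the job ends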
Figure 1. Global scheme showing the components of Garnatxa.
Access to the file system
Garnatxa is equipped with several data volumes that can be used to store files. These disk spaces have different properties, and each of them is designed to best fit a different usage intent.
/home : Upon login on a login node or front-end, you will end up in your home directory. This directory is available on the front-end and on all compute nodes of the cluster. The home filesystem is dedicated to your own software, configuration files, and small datasets.
/storage : Used when you need to process large files and the code, input, or output files have to be shared by all members of your group. Each group has a directory under /storage.
/scr : The /scr filesystem should be used to store the files generated by your batch jobs; you can also copy large input files there if required. It is customary in the job script to create a subdirectory named after the job id, where all temporary data is written, and to clean that directory when the job finishes, after copying the results to another location: either on the same filesystem, if the results are to be consumed by a later job, or on another filesystem such as home or a remote long-term storage.
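A common job-script pattern for /scr is sketched below; the program, file names, and the exact /scr layout are placeholders to adapt to your own workflow:

# Stage all temporary work in a job-specific subdirectory of /scr
SCRDIR=/scr/$USER/$SLURM_JOB_ID       # assumed layout; $SLURM_JOB_ID is set by SLURM
mkdir -p "$SCRDIR"
cd "$SCRDIR"

./my_program input.dat > results.out  # hypothetical computation

# Copy results to permanent storage, then clean the transient space
cp results.out /home/$USER/results/
cd && rm -rf "$SCRDIR"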
Note
All the content in /scr will be deleted after a reasonable amount of time. You should use /scr only as a transient storage space while your jobs are running.
Warning
There is no backup of the data stored on the cluster. Any accidentally removed file is lost forever. It is the user's responsibility to keep a copy of the contents of their personal storage in a safe place.
Storage Quotas
All file systems need a certain amount of free space to perform self-maintenance tasks. The minimum percentage of free space depends on the type of file system and ranges from 5% for local file systems (on the computer itself) to 20% for globally distributed file systems (such as CEPH). The file system must also have sufficient free space to handle unexpected errors or failures. In the case of Garnatxa, this means that when file system occupancy reaches approximately 75%, a saturation alarm (soft limit) is generated, and when it reaches 80% (hard limit), drastic measures must be taken, including putting the file system in read-only mode. In that situation, users cannot write any more data, and work in progress is cancelled due to write errors.
Due to the need to keep storage resources below the saturation point (hard limit), it is necessary to implement a system that regulates the use that research groups make of primary storage. This system must ensure that the overall capacity used in the system does not exceed the saturation limit, but at the same time:
- Basic operation:
It must maximize the total capacity available to all groups, allowing data to be written without restriction as long as the overall capacity used in the system does not exceed the hard security limit.
In the event that the overall capacity used on disk exceeds the soft limit, the system will notify the group(s) with the highest storage usage that they must reduce their usage to a dynamically calculated limit (quota).
If the total disk capacity used exceeds the hard limit, the system will automatically apply a write restriction to the group or groups in order of highest storage usage, in addition to other measures, such as restricting the execution of jobs on the system.
This restriction will prevent group members from writing to the group folder and will remain in place until they delete an amount equal to or greater than that estimated by the system to preserve the safety margin.
The rest of the groups will be able to continue writing/reading from the system as normal.
The system will take into account the storage resources that each group has contributed to the overall system. This allows a percentage of the contributed resources to be reserved for each group, guaranteeing the use of a certain amount of storage without restrictions. Any storage used beyond the contributed amount is counted when applying restrictions.
- Operational aspects:
Some groups, due to the size of their projects, need to make voluntary contributions (by purchasing hard drives) in order to scale the system’s storage capacity. These additional amounts are taken into account by the quota system when calculating the storage capacity used by each group. Thus, the space stored by a group at a given time is calculated as follows:
Effective size = Used size - Contributed size
Effective size: This is the capacity used by a group that will actually be taken into account when ranking the overall disk usage of each group.
Used size: The system monitors the disk capacity actually used by each group on a daily basis in the path: /storage/<group_name>
Contributed size: The system stores the disk capacity voluntarily contributed by each group in a database. It should be noted that the raw capacity of each disk is divided by the system replication factor: 3. Afterwards, only 80% of the remaining amount will be available, as the other 20% is managed by the system to comply with safety margins. In the case of contributions by groups external to I2SysBio, the usable disk percentage will be 70%.
Example for a group with a 100TB disk contribution and 50TB of used capacity. Net capacity after removing replication: 100/3 = 33.33TB.
I2SysBio group:
Used size = 50 TB
Contributed size = 33.33 * 0.8 = 26.66 TB
Effective size = 50 - 26.66 = 23.34 TB
External group:
Used size = 50 TB
Contributed size = 33.33 * 0.7 = 23.33 TB
Effective size = 50 - 23.33 = 26.67 TB
Modeling the quota system with formulas:
- Definitions:
S is the common/general storage (TB), provided by I2SysBio as an institution through FEDER, FAS, and other funds. (We write it S to distinguish it from the dynamic quota Q introduced below.)
In Garnatxa there are n groups, G, each of which uses a certain amount of storage \(g_{i}\) at a given time.
The groups may eventually contribute a certain amount of storage, \(a_{i}\). Of that amount, a fraction γ remains available to the group. The term γ is the same for all I2SysBio groups but different for external groups.
The total amount of available storage, M, will then be given by:
(1)\[M=S+\sum \limits_{∀i ∈ G} a_{i}\]
The file system is considered at risk when the soft limit is reached, i.e., when usage reaches L = 0.75 · M:
(2)\[\sum \limits_{∀i ∈ G} g_{i} > L\]
The storage system reaches the critical saturation point (hard limit) when usage reaches H = 0.80 · M:
(3)\[\sum \limits_{∀i ∈ G} g_{i} > H\]
Each group has an effective occupation, which takes into account the weighted contribution made by the group:
(4)\[ge_{i}=g_{i}-γa_{i}, ∀i ∈ G\]
- Algorithm:
When the system reaches the saturation alarm, quotas must be applied to the group or groups that are consuming the most effective resources. The excess occupancy of the file system, E, is:
\[E=\sum \limits_{∀i ∈ G} g_{i}-H\]
and therefore each group must reduce its effective disk usage by a certain amount \(k_{i}\) so that the following holds:
(5)\[\sum \limits_{∀i,k_{i}>0} (ge_{i}-k_{i})=H,\]
subject to the restriction:
(6)\[\sum \limits_{∀i,k_{i}>0} k_{i}=E,\]
On the other hand, all affected groups will now have a common, dynamically calculated quota, Q, which complies with
(7)\[∀i,k_{i}>0|(ge_{i}-k_{i})=Q\]
from which it can be derived that
(8)\[∀i,k_{i}>0|k_{i} = ge_{i} − Q\]
Using expression (8) in (6), we have
\[\sum \limits_{∀i,k_{i}>0} (ge_{i}-Q)=E\]
\[\sum \limits_{∀i,k_{i}>0} ge_{i} - \sum \limits_{∀i,k_{i}>0} Q=E\]
\[\sum \limits_{∀i,k_{i}>0} ge_{i} - nQ=E\]
where n is here the number of groups with \(k_{i}>0\). From this expression Q can be calculated:
(9)\[Q = \frac{\sum \limits_{∀i,k_{i}>0} ge_{i} - E}{n}\]
All groups {K} whose effective occupancy is above Q must therefore reduce their storage occupancy by the amount
(10)\[r_{k} = ge_{k} - Q = (g_{k}-γa_{k}) - Q\]
The administrator will then notify the group leaders in this situation so that they reduce their disk usage by the value \(r_{k}\). If file system usage reaches the saturation point (hard limit), the administrator will impose the quota Q + \(γa_{k}\) on each group k.
Use cases:
We have a total storage capacity in the system of 1000TB and the hard limit is set at 80%, i.e., H = 800TB. For each group, there is a history of the disk contributions they have made to the cluster. For example, suppose that group G1 made a contribution of 600 gross TB to the cluster storage system. In practice, due to the ×3 replication factor used by the system, the net storage contributed is 600/3 = 200TB. Since only 80% of the net storage is usable, the final contribution of group G1 is 0.80 · 200 = 160TB. The following is an example of the dynamic quota mechanism being applied, taking into account the contributions made by the groups. Suppose that at a given moment the following situation arises:
Case 1: Quota set at the maximum 800TB and the contribution of 160TB made by group G1 is taken into account.
| Group | Disk usage | Contribution | Eff. usage |
|---|---|---|---|
| G1 | 400TB | 160TB | 400-160=240TB |
| G2 | 200TB | 0 | 200TB |
| G3 | 100TB | 0 | 100TB |
| G4 | 50TB | 0 | 50TB |
| Total | 750TB | | |
Since the storage used by all groups is 750TB, below the 800TB hard limit, the system allows unrestricted data storage for all groups.
Case 2: G2, G3, and G4 increase the amount of disk storage used.
| Group | Disk usage | Contribution | Eff. usage |
|---|---|---|---|
| G2 | 250TB | 0 | 250TB |
| G1 | 400TB | 160TB | 240TB |
| G3 | 150TB | 0 | 150TB |
| G4 | 150TB | 0 | 150TB |
| Total | 950TB | | |
Since the total capacity used by all groups is 950 > 800TB, disk usage must be reduced by: E = 950 - 800 = 150TB.
To calculate the new quota, we use expression (9) over all n = 4 groups, which gives:
\[Q = \frac{(250+240+150+150) - 150}{4} = 160TB\]
This value of Q is very restrictive and would lead to negative \(k_{i}\) values for G3 and G4. To refine the calculation, we calculate an adjusted quota, Q', taking into account only those groups whose \(ge_{i}\) >= Q (in our example, G1 and G2). The value of Q' is:
\[Q' = \frac{(250+240) - 150}{2} = 170TB\]
Accordingly, the dynamically calculated quotas will be as follows:
| Group | Disk usage | Contribution | Eff. usage | Quota | Size to delete |
|---|---|---|---|---|---|
| G2 | 250TB | 0 | 250TB | 170TB | 250-170=80TB |
| G1 | 400TB | 160TB | 240TB | 170+160=330TB | 400-330=70TB |
| G3 | 150TB | 0 | 150TB | | |
| G4 | 150TB | 0 | 150TB | | |
| Total | 950TB | | | | |
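For reference, the Case 2 figures can be re-derived with the short shell sketch below, which implements expression (9) and the Q' refinement. It is purely illustrative (group numbers hard-coded, all sizes in TB), not the actual quota tool running on Garnatxa:

#!/bin/bash
E=150                                                 # excess: 950 - 800
declare -A ge=([G1]=240 [G2]=250 [G3]=150 [G4]=150)   # effective usage per group

# First pass: expression (9) over all n groups
sum=0; n=0
for g in "${!ge[@]}"; do sum=$((sum + ge[$g])); n=$((n + 1)); done
Q=$(( (sum - E) / n ))                                # (790 - 150) / 4 = 160

# Refinement: keep only the groups whose effective usage is >= Q
sum=0; n=0
for g in "${!ge[@]}"; do
  (( ge[$g] >= Q )) && { sum=$((sum + ge[$g])); n=$((n + 1)); }
done
Qp=$(( (sum - E) / n ))                               # (250 + 240 - 150) / 2 = 170

echo "Q=${Q}TB  Q'=${Qp}TB"
for g in "${!ge[@]}"; do
  (( ge[$g] > Qp )) && echo "$g must free $(( ge[$g] - Qp ))TB (effective)"
done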
Case 3: Let’s assume that storage usage is as follows:
| Group | Disk usage | Contribution | Eff. usage |
|---|---|---|---|
| G2 | 250TB | 0 | 250TB |
| G1 | 400TB | 160TB | 240TB |
| G3 | 210TB | 0 | 210TB |
| G4 | 80TB | 0 | 80TB |
| Total | 940TB | | |
Since the total capacity used by all groups is 940 > 800TB, disk usage must be reduced by: E = 940 - 800 = 140TB. To calculate the new quota, we use expression (9) over all n = 4 groups, which gives:
\[Q = \frac{(250+240+210+80) - 140}{4} = 160TB\]
As in the previous case, this value of Q is very restrictive and leads to negative \(k_{i}\) values. To refine the calculation, we calculate an adjusted quota, Q', taking into account only those groups whose \(ge_{i}\) >= Q (in our example, G1, G2, and G3). The value of Q' is:
\[Q' = \frac{(250+240+210) - 140}{3} ≈ 187TB\]
Accordingly, the dynamically calculated quotas will be as follows:
| Group | Disk usage | Contribution | Eff. usage | Quota | Size to delete |
|---|---|---|---|---|---|
| G2 | 250TB | 0 | 250TB | 187TB | 250-187=63TB |
| G1 | 400TB | 160TB | 240TB | 187+160=347TB | 400-347=53TB |
| G3 | 210TB | 0 | 210TB | 187TB | 210-187=23TB |
| G4 | 80TB | 0 | 80TB | | |
| Total | 940TB | | | | |
Transfer files to/from Garnatxa
This section is about transferring data between your computer and Garnatxa (and back). These examples are for Linux and macOS computers only. For Windows systems we recommend using:
The simplest way to copy a file from your computer to Garnatxa is the scp command. For compatibility reasons, it is recommended to add the -O option (capital letter O) to scp.
[LOCALUSER@my_computer ~]$ scp -O ./file.txt USERNAME@garnatxa.uv.es:./my_garnatxa_dir/
The above command copies file.txt from your computer to the path /home/USERNAME/my_garnatxa_dir on Garnatxa. Replace USERNAME with your user name on Garnatxa.
Copying it back is done with
[LOCALUSER@my_computer ~]$ scp -O USERNAME@garnatxa.uv.es:./my_garnatxa_dir/file.txt .
This copies file.txt from Garnatxa to your local host.
If you want to copy a directory and its contents, use the -r option, just as with cp.
[LOCALUSER@my_computer ~]$ scp -O -r USERNAME@garnatxa.uv.es:./my_garnatxa_dir .
There is an alternative for synchronizing a directory or individual files with Garnatxa: use the rsync command if you want to copy only the new files and modifications from your source directory to the destination folder in Garnatxa.
[LOCALUSER@my_computer ~]$ rsync --inplace --progress --partial --append -av ./my_local_directory USERNAME@garnatxa.uv.es:.
This copies only new and modified files from your local directory to the remote directory on Garnatxa. With rsync you can resume an interrupted transfer from the last copied file.
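rsync also works in the reverse direction. A sketch for pulling results back from Garnatxa (directory names are placeholders):

[LOCALUSER@my_computer ~]$ rsync --progress --partial -av USERNAME@garnatxa.uv.es:./my_garnatxa_dir ./results/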
Split your files
If you have a very large file and you need to split it into smaller portions (in order to parallelize processing of the data), use:
split --numeric-suffixes -l number_of_lines_to_split input_file output_prefix
Example: we have a fastq file with 14010000 lines and we want to split it into chunks of 1000000 lines each, resulting in 15 files:
[USERNAME@master data_red]$ split --numeric-suffixes -l 1000000 reads_00.fq subreads00_
[USERNAME@master data_red]$ ls subreads00_*
subreads00_00 subreads00_01 subreads00_02 subreads00_03 subreads00_04 subreads00_05 subreads00_06 subreads00_07 subreads00_08 subreads00_09 subreads00_10 subreads00_11 subreads00_12 subreads00_13 subreads00_14
Now you can run concurrent executions, one per file, as shown below.
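For instance, one hypothetical way to launch an independent SLURM job per chunk (my_align.sh stands for a job script that takes a chunk file as its argument):

[USERNAME@master data_red]$ for chunk in subreads00_*; do sbatch my_align.sh "$chunk"; done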
Tape library
Garnatxa provides a tape storage service through a robotic tape library. The device can keep up to 40 LTO-9 tapes of 16TB capacity online. The shelf life of an LTO tape is up to 30 years.
The main function of this device is to move data that must be kept long-term from the shared storage in Garnatxa to tape. Please note that if you exceed a certain disk quota in Garnatxa, administrators may ask you to free disk space or move data to the tape device.
We also provide tape storage service for the storage of large databases, genomic datasets, etc.
How to request access to the tape system?
To use the tape device your group should:
Request the purchase of one or more tapes. Each tape has an estimated price of about €100, which must be covered by the group or external collaboration. Use the ticket platform (Garnatxa HPC item) to initiate the tape purchase: https://garnatxadoc.uv.es/support
Once the tape is available (we will inform you), the administrator will prepare the system so that only the group that owns the tape can access it. The tape appears in the system as just another directory, e.g.: /tape2/GRP0000L9/
The tape will be available for access for a certain time. After a group finishes storing/retrieving data, the tape will be removed from the device and kept by the owning group or by the Garnatxa administrators.
What operations can be performed on the tape?
You must keep in mind that tape devices are sequential access storage media. This means that:
Use the rsync command to copy data to or read data from the tape. Delete the data on Garnatxa only once you have verified that the transfer completed correctly (see the example after this list).
Although you see the tape mounted on the system as just another data directory, the information you transfer is stored sequentially until the capacity is reached. If you delete files on the tape, the used size will not decrease, although the deleted files will no longer appear. The only way to get full capacity back is to format the tape.
Although it is possible to modify files directly in the tape directory, it is not advisable, since this introduces inefficiency in subsequent read operations.
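As an illustration, reading data back from a tape with rsync might look like this (the tape code and paths are placeholders):

[USERNAME@master ~]$ rsync --progress -av /tape2/GRP0000L9/home/user/test/out ./restored/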
Copying files to tape
Once you receive permission to use the tape system, you can use the tapecopy command to transfer files and directories to the tape.
You can obtain more help about the syntax with: tapecopy -h
tapecopy: It copies the files to the first available tape with enough free space. If there isn’t a tape with enough free space, the process won’t perform any file copies.
To use the tapecopy command you must be logged in to the server merlot.
Before copying a directory, check its size with the checkdiskspace command. Tapes have a usable capacity of 16TB. Keep in mind that tapecopy will search among the available tapes for one with enough free space to hold all the directory data.
If you have more than one tape installed in /tape2 and the total size of the files to be copied is larger than a single tape (16TB), the tapecopy command will automatically spread your files across all available tapes. If a file to be copied already exists on the tape, that file will not be copied again. If a tape runs out of space, the process continues storing data on the next tape.
To check the available space for each tape use: tapecopy -l
You can list the contents of a tape using the usual Linux command ls in path: /tape2
ls /tape2/<tape_code>
Examples:
[USERNAME@master ~]$ ssh merlot
Each time the tapecopy command is executed, a tape copy job is sent to the queue system. The progress of the copy task can be reviewed through a text file generated when the command is executed.
Thus, it is possible to log out of Garnatxa and leave the copy process running on the cluster. Below you can see some examples of use:
Example: copy a single file to the tape.
[USERNAME@master ~]$ tapecopy test/a.out
Submitted batch job 1481875
A new job has been submitted to the queue system (transferring 2 GB. to tape). You can check the progress of the copy checking the file: jobtape_20250507144350.out
Example: copy a complete directory to the tape.
[USERNAME@master ~]$ tapecopy test/output_dir
Submitted batch job 1481876
A new job has been submitted to the queue system (transferring 28 GB. to tape). You can check the progress of the copy checking the file: jobtape_20250507144360.out
Example: Using wildcards in the source path.
[USERNAME@master ~]$ tapecopy "test/out/*.sai"
Submitted batch job 1481877
A new job has been submitted to the queue system (transferring 28 GB. to tape). You can check the progress of the copy checking the file: jobtape_20250507144370.out
Warning
When you enter wildcards in the path you need to use quotes to enclose them.
Example: Show available space on tapes.
[USERNAME@master ~]$ tapecopy -l
------------------------------------------------------------
Available tapes:
Barcode Capacity Avail Use%
------- -------- -------- ----
XXX006L9 16344GiB 6411GiB 60%
XXX007L9 16344GiB 16344GiB 0%
XXX008L9 16344GiB 16344GiB 0%
XXX009L9 16344GiB 16344GiB 0%
XXX010L9 16344GiB 16344GiB 0%
XXX011L9 16344GiB 16344GiB 0%
XXX012L9 16344GiB 16344GiB 0%
XXX013L9 16344GiB 16344GiB 0%
XXX014L9 16344GiB 16344GiB 0%
XXX015L9 16344GiB 16344GiB 0%
XXX016L9 16344GiB 16344GiB 0%
XXX017L9 16344GiB 16344GiB 0%
XXX018L9 16344GiB 16344GiB 0%
------------------------------------------------------------
Finally, you can check the contents of the tapes using the usual Linux commands:
[USERNAME@master ~]$ ls -l /tape2/XXX006L9/
drwxrwx--- 3 user grp 0 Oct 2 10:56 /home/user/test/out/file0.sai
drwxrwx--- 3 user grp 0 Oct 2 10:56 /home/user/test/out/file1.sai
drwxrwx--- 3 user grp 0 Oct 2 10:56 /home/user/test/out/file2.sai
Or restore files from the tapes to your account in Garnatxa, using the cp command:
[USERNAME@master ~]$ cp /tape2/XXX006L9/home/user/test/out/file0.sai /home/user/test/out