Basic concepts about Garnatxa

Before describing how to access the Garnatxa file system, it is important to know some basic concepts about HPC clusters.

Research problems that use computing can sometimes outgrow the desktop or laptop computer where they started.

There might not be enough memory, not enough disk space or it may take too long to run computations.

These days, a typical desktop or laptop computer has up to 16GB of memory, a small number of CPU cores and several TB of disk space. When you run your computations on a computer, the program and data get loaded from the disk to the memory and the program’s commands are executed on the CPU cores. Results of the computations are usually written back to the disk. If too much data needs to be loaded to memory, the computer will become unresponsive and may even crash.

Typical cases of when it may be beneficial to request access to an HPC cluster:

  • Computations need much more memory than what is available on your computer

  • The same program needs to be run multiple times (usually on different input data)

  • The program that you use takes too long to run, but it can be run faster in parallel (usually using MPI or OpenMP)

One should not expect that any program will automatically run faster on a cluster. In fact a serial program that does not need much memory may run slower on an HPC cluster than on a newer laptop or desktop computer. However, on a cluster one can take advantage of multiple cores by running several instances of a program at once or using a parallel version of the program.

An HPC cluster such as Garnatxa is a collection of many separate servers (computers), called nodes, which are connected via a dedicated fast network.

The cluster contains different types of nodes depending on the function they provide; from the user's point of view, only the login node is accessed directly.

In general, an HPC cluster has:

  • A headnode or login node, where users log in

  • A set of compute nodes (where the majority of computations run).

  • A set of storage nodes (each node exports a set of disks to the global storage; a distributed file system such as CEPH ‘joins’ all these disks so that users can access their files as a single disk).

  • A set of switches that connect all the nodes.

All cluster nodes have the same components as a laptop or desktop: CPU cores, memory and disk space. The difference between a personal computer and a cluster node is in quantity, quality and power of the components.

The next diagram describes the main components of the Garnatxa cluster. Below are some of the steps a user might carry out when connecting to Garnatxa.

  1. A user needs to connect to Garnatxa in order to process some files. Since the user is connecting from home, a VPN connection must be established before connecting to Garnatxa. The VPN connection encrypts all the data that the user sends to and receives from the cluster. After that, the user should execute: ssh garnatxa.uv.es and the system will ask for a username and password.

  2. All users can access three pools of data to store and retrieve files. The /home directory is a personal folder for code, sources and general files needed to work on the cluster; /storage contains files, sources, applications or databases that are shared by all members of the same project (there is one folder per research group); /scr is a directory for temporary data used to stage and retrieve data for jobs running on the cluster. Data produced there should be deleted once the job has completed.

  3. The cluster provides a total of 864 CPUs, 14 TB of RAM and 2.6 PB of disk space. These resources are shared by all users of the cluster. A resource manager (SLURM) is in charge of distributing the resources (CPUs, memory) among all users. Each internal node exports a set of disks, and a distributed file system (CEPH) abstracts the access to the data. The cluster is presented to the user as a single hard disk plus a set of CPUs and memory to which jobs can be submitted.

  4. Garnatxa implements 4 types of queues to which jobs can be submitted. Each queue (partition) is limited to a set of resources (number of CPUs, memory or execution time). A user must write a template in which some parameters of the job are specified. SLURM reads this template and allocates a suitable set of resources on which to execute the job. Jobs in Garnatxa are queued and served in order of receipt.

  5. Jobs are assigned to internal nodes (depending on the amount of CPUs and memory requested by the user). Once a job has been submitted to the queue system, the user can check its status. When the job finishes, the user can access the output data.

../_images/global_vision.jpg

Figure 1. Global scheme showing the components of Garnatxa.
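The job template mentioned in step 4 is a small shell script containing SLURM directives. The sketch below is only illustrative: the partition name, resource values and program are assumptions, not Garnatxa defaults; check the queue documentation for the real partition names and limits.

```shell
#!/bin/bash
# Illustrative SLURM job template. The partition name, resource values
# and program below are assumptions, not Garnatxa defaults.
#SBATCH --job-name=my_job
#SBATCH --partition=short        # hypothetical queue/partition name
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=4
#SBATCH --mem=8G
#SBATCH --time=01:00:00
#SBATCH --output=my_job_%j.out   # %j expands to the job id

srun ./my_program input.dat
```

The script would then be submitted with sbatch my_job.sh and its state inspected with squeue, as in step 5.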

Access to the file system

Garnatxa is equipped with several file systems that can be used to store files. These disk spaces have different properties, and each is designed to fit a different usage intent.

  • /home : Upon login on the login node (front-end), you will end up in your home directory. This directory is available on the front-end and on all compute nodes of the cluster. The home file system is dedicated to source code (programs, scripts), configuration files and small datasets (such as input or output files). Remember to copy the files you need from /home to /scr when submitting a job.

  • /storage : Used when you need to process large files and the code, input or output files have to be shared by all members of your group.

  • /scr : The /scr file system should be used to store the files generated by your batch jobs. You can also copy large input files there if required. It is customary for the job script to create a subdirectory named after the job id, where all temporary data will be written, and to clean that directory when the job finishes, after having copied the results to another location: either on the same file system, if the results are to be consumed by a later job, or on another file system such as /home or a remote long-term storage.

Note

All the content in /scr will be deleted after a reasonable amount of time. You should use /scr only as a transient storage space while your jobs are running.
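The /scr workflow described above (create a per-job subdirectory, compute there, copy the results out, clean up) can be sketched as follows. The paths default to demo values so the sketch runs outside SLURM; in a real job script SCR_ROOT would be /scr and the job id would come from $SLURM_JOB_ID.

```shell
#!/bin/sh
# Sketch of the /scr job workflow. SCR_ROOT and JOB_ID default to demo
# values so the sketch runs anywhere; in a real job script SCR_ROOT would
# be /scr and JOB_ID would be $SLURM_JOB_ID.
SCR_ROOT="${SCR_ROOT:-/tmp/scr-demo}"
JOB_ID="${SLURM_JOB_ID:-demo}"
WORKDIR="$SCR_ROOT/$JOB_ID"
RESULTS="${RESULTS:-$HOME/results}"   # long-term location (e.g. in /home)

# 1. create a per-job subdirectory and stage the input data
mkdir -p "$WORKDIR"
echo "sample input" > "$WORKDIR/input.dat"   # stands in for copying real input from /home

# 2. run the computation inside the scratch directory (a trivial stand-in here)
tr 'a-z' 'A-Z' < "$WORKDIR/input.dat" > "$WORKDIR/output.dat"

# 3. copy results to long-term storage, then clean the scratch directory
mkdir -p "$RESULTS"
cp "$WORKDIR/output.dat" "$RESULTS/output.$JOB_ID.dat"
rm -rf "$WORKDIR"
```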

Warning

There is no backup of the data stored on the cluster. Any accidentally removed file is lost for ever. It is the user’s responsibility to keep a copy of the contents of their personal storage device in a safe place.

Transfer files to/from Garnatxa

This section is about transferring data between your computer and Garnatxa (in both directions). These examples are only for Linux and macOS computers. For Windows systems we recommend using:

The simplest way to copy a file from your computer to Garnatxa is to use the scp command. For compatibility reasons, it is recommended to add the -O parameter (capital letter) to scp.

$ [LOCALUSER@my_computer ~]$ scp -O ./file.txt USERNAME@garnatxa.uv.es:./my_garnatxa_dir/

The above command copies file.txt from your computer to the path /home/USERNAME/my_garnatxa_dir on Garnatxa. Replace USERNAME with your user name on Garnatxa.

Copying it back is done with

$ [LOCALUSER@my_computer ~]$ scp -O USERNAME@garnatxa.uv.es:./my_garnatxa_dir/file.txt .

This copies file.txt from Garnatxa to your local machine.

If you want to copy a directory and its contents, add the -r option to scp.

$ [LOCALUSER@my_computer ~]$ scp -O -r USERNAME@garnatxa.uv.es:./my_garnatxa_dir .

There is an alternative for synchronizing a directory or individual files with Garnatxa. Use the rsync command if you want to copy only the new files and modifications from your source directory to the destination folder on Garnatxa.

$ [LOCALUSER@my_computer ~]$ rsync --inplace --progress --partial --append -av ./my_local_directory USERNAME@garnatxa.uv.es:.

This copies only new and modified files from your local directory to the remote directory on Garnatxa. With rsync you can resume an interrupted transfer from the last copied file.

Split your files

If you have a very large file and need to split it into smaller portions (in order to parallelize the processing of the data), use:

split --numeric-suffixes -l number_of_lines_to_split input_file output_prefix

Example: we have a fastq file with 14010000 lines and want to split it into files of 1000000 lines each, resulting in 15 files:

$ [USERNAME@master data_red]$ split --numeric-suffixes -l 1000000 reads_00.fq subreads00_
$ [USERNAME@master data_red]$ ls subreads00_*
subreads00_00  subreads00_01  subreads00_02  subreads00_03  subreads00_04  subreads00_05  subreads00_06  subreads00_07  subreads00_08  subreads00_09  subreads00_10  subreads00_11  subreads00_12  subreads00_13  subreads00_14

Now you can do concurrent executions per file.
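The idea can be sketched locally with background shell jobs, as below; on Garnatxa each chunk would normally become its own SLURM job rather than a local background process. process_chunk is a hypothetical stand-in for your real command, and the demo data reproduces the split step above at a small scale.

```shell
#!/bin/sh
# Run one task per split chunk concurrently. process_chunk is a
# hypothetical placeholder for the real per-chunk work; on the cluster
# each chunk would normally be submitted as its own SLURM job instead.
process_chunk() {
    wc -l < "$1" > "$1.count"    # stand-in: count the lines of the chunk
}

# demo data: 30 lines split into chunks of 10 lines (3 chunks), as with
# the split command shown above
mkdir -p /tmp/split-demo
cd /tmp/split-demo || exit 1
seq 1 30 > reads_00.fq
split --numeric-suffixes -l 10 reads_00.fq subreads00_

for chunk in subreads00_*; do
    case "$chunk" in *.count) continue ;; esac   # skip previous outputs
    process_chunk "$chunk" &     # launch each chunk in parallel
done
wait                             # block until every chunk is processed
```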

Tape library

Garnatxa provides a tape storage service through a robotic tape library. The device is capable of keeping up to 40 LTO-9 tapes of 16 TB capacity online. The life of an LTO tape is up to 30 years.

The main function of this device is to move data that must be kept long-term from the shared storage in Garnatxa to tape. Please note that if you exceed a certain disk quota in Garnatxa, administrators may ask you to free disk space or move data to the tape device.

We also provide tape storage service for the storage of large databases, genomic datasets, etc.

How to request access to the tape system?

To use the tape device your group should:

  1. Request the purchase of one or more tapes. Each tape has an estimated price of about €100, which must be assumed by the group or external collaboration. Use the ticket platform (Garnatxa HPC item) to initiate the tape purchase: https://garnatxadoc.uv.es/support

  2. Once the tape is available (we will inform you), the administrator will prepare the system so that only the group that owns the tape can access it. The tape is shown in the system as just another directory, e.g.: /tape2/GRP0000L9/

  3. The tape will be available for access for a certain time. After a group finishes storing and retrieving data on the tape, it will be removed from the device and kept by the owning group or by the Garnatxa administrators.

What operations can be performed on the tape?

You must keep in mind that tape devices are sequential access storage media. This means that:

  1. Use the rsync command to copy data to or read data from the tape. Delete the data on Garnatxa only once you have verified that the transfer completed correctly.

  2. Although you see the tape mounted on the system as just another data directory, the information you transfer is written sequentially until the tape's capacity is reached. If you delete files on the tape, its occupied space will not decrease, although the deleted files will no longer appear on the tape. The only way to recover the full capacity is to format the tape.

  3. Although it is possible to modify files directly in the tape directory, it is not advisable, since this introduces inefficiency in subsequent read operations.

Copying files to tape

Once you receive permission to use the tape system, you must use the copytotape command to transfer files and directories to tape. You can obtain more help about the syntax with: copytotape -h

To use copytotape you must be logged in on the server merlot:

$ [USERNAME@master ~]$ ssh merlot

Before copying a directory, check its size with the checkdiskspace command. Tapes have a usable capacity of 16 TB. Keep in mind that copytotape will search among the available tapes for a free tape on which all the directory data can be stored. If there is not enough space on a single tape, the process will fail.

To check the available space on each tape use: copytotape -l

Each time the command copytotape is executed, a tape copy job is sent to the queue system. The progress of the copy task can be reviewed through a text file generated when the command is executed. Thus, it is possible to exit from Garnatxa and leave the copy process running on the cluster. Below you can see some examples of use:

Example: copy a single file to the tape.

$ [USERNAME@master ~]$ copytotape test/a.out
Submitted batch job 1481875
A new job has been submitted to the queue system (transferring 2 GB. to tape). You can check the progress of the copy checking the file: jobtape_20250507144350.out

Example: copy a complete directory to the tape.

$ [USERNAME@master ~]$ copytotape test/output_dir
Submitted batch job 1481876
A new job has been submitted to the queue system (transferring 28 GB. to tape). You can check the progress of the copy checking the file: jobtape_20250507144360.out

Example: Using wildcards in the source path.

$ [USERNAME@master ~]$ copytotape "test/out/*.sai"
Submitted batch job 1481877
A new job has been submitted to the queue system (transferring 28 GB. to tape). You can check the progress of the copy checking the file: jobtape_20250507144370.out

Warning

When you use wildcards in the path, you need to enclose them in quotes.