Hardware Specifications
The Garnatxa HPC cluster is installed in the I2SysBio Data Center. The Data Center has a refrigerated room with a maximum capacity of 12 racks. The racks have redundant components (PDUs, InRows, power supplies, etc.), and the Data Center has its own cooling plant, independent of the one provided by the I2SysBio building. The UPS equipment is installed in an annexed room and feeds the Data Center through a pair of redundant power lines. It can supply about 125 kVA, and a set of batteries provides about 2 hours of autonomy under high load.
The network infrastructure is built on multiple Cisco devices (Nexus 5696Q, Nexus 2348UPQ). All the racks are interconnected through redundant 40 Gb/s links. The storage and computing networks are isolated from the management and service networks. Each network is served by redundant, dedicated equipment, yielding an aggregate bandwidth of 50 Gb/s (storage) and 20 Gb/s (computing).
The Data Center currently houses six racks (48U) and six InRow cooling units. A hot-aisle containment system confines the hot exhaust air, further reducing any chance of hot and cold air streams mixing.
The Garnatxa HPC cluster comprises nine computing nodes used for calculation and storage tasks and one head node that performs cluster management functions. Additional services such as VPN, LDAP, DHCP, and DNS are provided by two redundant nodes, and two more nodes provide a private cloud infrastructure.
All nodes are interconnected by a 2 x 25 Gb/s = 50 Gb/s network for storage and a 2 x 10 GbE = 20 Gb/s network for computing tasks (both networks are configured in aggregated bonding mode). Each compute node is equipped with two x86_64 processors and 1.5 TB of RAM. Together, the computing nodes in Garnatxa provide about 608 CPU cores (1216 hyper-threads) and 15 TB of RAM, and the cluster delivers a peak performance of ~7 TFLOPS. A network-attached storage system holds data in a distributed file system: it has a capacity of 3.5 PB, is provided by Ceph, and is organized into several data pools. Three nodes act as MDS, and six disk cabins are configured to provide redundant access to the data. In total, around 332 disks are used to store data; for tasks with intensive I/O, a set of dedicated NVMe and SSD disks is available.
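For orientation, these per-node and storage figures can be verified from inside the cluster with standard Linux tools. The sketch below assumes a shared-storage mount point (/data) and a bond interface name (bond0); the actual paths and interface names on Garnatxa may differ.

```bash
# Inspect per-node resources with standard Linux tools.
lscpu | grep -E '^(CPU\(s\)|Thread)'   # cores and threads on this node
free -h                                # RAM available on this node
df -h /data                            # usage of the Ceph-backed shared storage
                                       # (/data is an assumed mount point)
cat /proc/net/bonding/bond0            # state of the aggregated (bonded) links
                                       # (bond0 is an assumed interface name)
```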
Garnatxa also provides a robotic tape library. The device can keep up to 40 LTO-9 tapes of 18 TB capacity online, and the shelf life of an LTO tape is up to 30 years. Its main function is to move data that must be retained long-term from the shared storage in Garnatxa to tape. We also provide a tape storage service for large databases, genomic datasets, etc.
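Purely as an illustrative sketch of how such a library is driven at the system level (an administrator-side view, not a documented Garnatxa user workflow), the standard mtx and tar tools operate a SCSI tape changer as shown below; the device paths are placeholders.

```bash
# Illustrative only: device paths /dev/sg4 (changer) and /dev/st0 (drive)
# are placeholders, not confirmed Garnatxa paths.
mtx -f /dev/sg4 status         # list drives and the 40 LTO-9 slots
mtx -f /dev/sg4 load 3 0       # load the tape in slot 3 into drive 0
tar -cvf /dev/st0 project/     # write a directory to the tape drive
mtx -f /dev/sg4 unload 3 0     # return the tape to its slot
```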
Users connect to a separate login node that provides a work environment (through the Lmod module management system) and several tools for running jobs on the cluster. Job management is handled by the Slurm workload manager.
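A typical session loads software through Lmod and submits a batch job to Slurm, along the lines of the minimal sketch below; the module name, partition defaults, and resource values are illustrative, not a list of what Garnatxa actually installs.

```bash
# Discover and load software with Lmod (module name is illustrative).
module avail             # list software available on the login node
module load samtools     # example module; actual names depend on the site

# Write a minimal Slurm batch script (resource values are illustrative).
cat > submit.sh <<'EOF'
#!/bin/bash
#SBATCH --job-name=example
#SBATCH --cpus-per-task=4
#SBATCH --mem=8G
#SBATCH --time=01:00:00
samtools --version
EOF

sbatch submit.sh         # queue the job
squeue -u "$USER"        # check its state in the queue
```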
General System Parameters
HPC Cluster Configuration |
---|---
OS | Rocky Linux release 8.10 (Green Obsidian)
Nodes | 23 (13 computing + 2 front-end + 3 storage management + 2 service + 3 VM)
Peak performance | 7 TFLOPS (HPLinpack 2.3)
CPU for computing | 608 CPU cores (1216 hyper-threads)
CPU for services | 96 CPU cores (192 hyper-threads)
CPU for virtualization | 144 CPU cores (288 hyper-threads)
Memory for computing | 15 TB
Memory for services | 4 TB
Memory for virtualization | 1.5 TB
Network for computing | 20 Gb/s
Network for storage | 50 Gb/s
Storage system | Ceph Reef 18.2.1
Storage nodes | 8 JBOD storage cabins (multipath access)
Storage capacity | 3.5 PB (332 disks = 228 HDD + 104 SSD/NVMe)
Storage tape capacity | 720 TB, IBM TS4300 with 2 drives (40 online LTO-9 18 TB tapes)
UPS & Cooling System |
---|---
UPS model | APC Symmetra PX 125 kW, scalable to 500 kW
Cooling system | 6 x APC ACRC301S Chilled Water InRow