Pitagora


Pitagora is the new EUROfusion supercomputer hosted by CINECA and currently installed at its headquarters in Casalecchio di Reno, Bologna, Italy. The cluster is supplied by Lenovo and is composed of two partitions: a CPU-based general purpose partition named DCGP and an accelerated partition based on NVIDIA H100 GPUs named Booster.
This guide contains Pitagora-specific information that deviates from the general behavior described in the HPC Clusters sections.
Access to the System
The machine is reachable via the SSH (Secure Shell) protocol at the hostname login.pitagora.cineca.it.
The connection is automatically established to one of the available login nodes. It is also possible to connect to Pitagora using one of the specific login hostnames:
login01-ext.pitagora.cineca.it
login02-ext.pitagora.cineca.it
login03-ext.pitagora.cineca.it
login04-ext.pitagora.cineca.it
login05-ext.pitagora.cineca.it
login06-ext.pitagora.cineca.it
Warning
Access to Pitagora requires two-factor authentication (2FA). More information is available in the section Access to the Systems.
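As an illustration, the minimal sketch below opens an interactive session by calling the system ssh client from Python; the username is a placeholder for your own CINECA account, and the 2FA prompt is handled interactively by ssh itself.

```python
import subprocess

# Placeholder: replace with your own CINECA username.
user = "your_cineca_username"
host = "login.pitagora.cineca.it"

# The system ssh client prompts interactively for the 2FA credentials.
subprocess.run(["ssh", f"{user}@{host}"])
```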
System Architecture
The system, supplied by Lenovo, is based on two new, specifically designed compute blades, which are available through two distinct SLURM partitions on the cluster:
GPU blade based on NVIDIA H100 accelerators - Booster partition.
CPU-only blade based on AMD Turin 128c processors - Data Centric General Purpose (DCGP) partition.
The overall system architecture uses NVIDIA Mellanox InfiniBand High Data Rate (HDR) connectivity with smart in-network computing acceleration engines, providing low latency and high data throughput for AI and HPC applications at scale.
Hardware Details
Booster compute nodes:

| Type | Specific |
|---|---|
| Model | Lenovo SD650-N V3 |
| Racks | 7 |
| Nodes | 168 |
| Processors/node | 2x Intel Xeon 6548Y (Emerald Rapids), 32 cores, 2.5 GHz |
| Cores/node | 64 |
| Accelerators/node | 4x NVIDIA H100 SXM 80 GB HBM2e |
| Local storage/node (tmpfs) | |
| RAM/node | 512 GiB DDR5 5600 MHz |
| Rmax | 27.27 PFlop/s (Top500) |
| Internal network | NVIDIA ConnectX-6 Dx 100GbE |
| Storage (raw capacity) | 2x 7.68 TB SSD (HW RAID 1) |
DCGP compute nodes:

| Type | Specific |
|---|---|
| Model | Lenovo SD665 V3 |
| Racks | 14 |
| Nodes | 1008 |
| Processors/node | 2x AMD Turin (Zen 5), 128 cores, 2.3 GHz |
| Cores/node | 256 |
| Accelerators/node | (none) |
| Local storage/node (tmpfs) | |
| RAM/node | 768 GiB DDR5 6400 MHz |
| Rmax | 17 PFlop/s (Top500) |
| Internal network | NVIDIA ConnectX-6 Dx 100GbE |
| Storage (raw capacity) | |
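The per-node resources listed above can be cross-checked from a login node with the standard SLURM client tools; the sketch below is a minimal example (the node name is a hypothetical placeholder, use any node reported by sinfo -N).

```python
import subprocess

# Hypothetical node name: pick any node listed by `sinfo -N` on Pitagora.
node = "node001"

# `scontrol show node` reports sockets, cores, real memory and GRES,
# which can be compared with the hardware tables above.
info = subprocess.run(
    ["scontrol", "show", "node", node],
    capture_output=True,
    text=True,
).stdout
print(info)
```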
Job Management and SLURM Partitions
The following tables report information about the SLURM partitions and QOS levels for the Booster and DCGP partitions.
See also
Further information about job submission is reported in the general section Scheduler and Job Submission.
Booster partition:

| Partition | QOS | # Nodes per job | Walltime | Max # nodes per user | Priority | Notes |
|---|---|---|---|---|---|---|
| boost_fua_prod | normal | max = 16 | 24:00:00 | 32 | 40 | |
| | boost_qos_fuadbg | max = 2 | 00:10:00 | 2 | 80 | |
| | boost_qos_fuaprod | min = 17 (full nodes), max = 32 | 24:00:00 | 32 | 60 | runs on 96 nodes (GrpTRES) |
| | boost_qos_fualprod | max = 3 | 4-00:00:00 | 3 | 40 | |
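As an illustration, a minimal single-node job for the boost_fua_prod partition could look like the sketch below. The account name is a placeholder, and the exact GPU request syntax should be checked against the general Scheduler and Job Submission section; SLURM reads the #SBATCH directives from any script with a shebang, so a Python body is used here for consistency with the other examples.

```python
#!/usr/bin/env python3
#SBATCH --job-name=booster_test
#SBATCH --partition=boost_fua_prod
#SBATCH --qos=normal
#SBATCH --nodes=1
#SBATCH --gres=gpu:1
#SBATCH --time=01:00:00
#SBATCH --account=FUA_project_name

# FUA_project_name is a placeholder: use your own project account.
import os
import socket

print(f"Job {os.environ.get('SLURM_JOB_ID')} running on {socket.gethostname()}")
```

The script is submitted with sbatch and can be monitored with squeue --me.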
DCGP partition:

| Partition | QOS | # Nodes per job | Walltime | Max # nodes per user | Priority | Notes |
|---|---|---|---|---|---|---|
| dcgp_fua_prod | normal | max = 64 | 24:00:00 | 64 | 40 | |
| | dcgp_qos_fuadbg | max = 2 | 00:10:00 | 2 | 80 | |
| | dcgp_qos_fuaprod | min = 65 (full nodes), max = 128 | 24:00:00 | 128 | 60 | runs on 640 nodes (GrpTRES) |
| | dcgp_qos_fualprod | max = 6 | 4-00:00:00 | 12 | 40 | |
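Equivalently, the partition and QOS can be passed directly on the sbatch command line; the sketch below wraps such a submission from Python and targets the debug QOS limits shown above (the job script name and account are placeholders).

```python
import subprocess

# Placeholders: replace with your own job script and project account.
script = "my_job.sh"
account = "FUA_project_name"

cmd = [
    "sbatch",
    "--partition=dcgp_fua_prod",
    "--qos=dcgp_qos_fuadbg",  # debug QOS: max 2 nodes, 10 minutes walltime
    "--nodes=2",
    "--time=00:10:00",
    f"--account={account}",
    script,
]

# On success sbatch prints the ID of the submitted job.
result = subprocess.run(cmd, capture_output=True, text=True)
print(result.stdout.strip() or result.stderr.strip())
```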