Pitagora
Pitagora is the new EUROfusion supercomputer hosted by CINECA and installed at its headquarters in Casalecchio di Reno, Bologna, Italy. The cluster is supplied by Lenovo and is composed of two partitions: a CPU-based general-purpose partition named DCGP and an accelerated partition based on NVIDIA H100 GPUs named Booster.
This guide for the Pitagora cluster contains cluster-specific information that deviates from the general behavior described in the HPC Clusters sections.
Access to the System
The machine is reachable via the SSH (Secure Shell) protocol at the hostname login.pitagora.cineca.it.
The connection is established automatically to one of the available login nodes. It is also possible to connect to Pitagora using one of the specific login hostnames:
login01-ext.pitagora.cineca.it
login02-ext.pitagora.cineca.it
login03-ext.pitagora.cineca.it
login04-ext.pitagora.cineca.it
login05-ext.pitagora.cineca.it
login06-ext.pitagora.cineca.it
Warning
Access to Pitagora requires two-factor authentication (2FA). More information is available in the section Access to the Systems.
Note
Even-numbered login nodes have the same architecture as the Booster partition's compute nodes, while odd-numbered login nodes have the same architecture as the DCGP partition's compute nodes.
login-boost.pitagora.cineca.it allows users to log in to one of the even-numbered login nodes in a round-robin fashion.
login-dcgp.pitagora.cineca.it allows users to log in to one of the odd-numbered login nodes in a round-robin fashion.
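For example, from a local terminal (the username below is a placeholder for your CINECA account name):

```bash
# Generic entry point: one of the available login nodes is assigned automatically
ssh <username>@login.pitagora.cineca.it

# Login node with the same architecture as the Booster compute nodes
ssh <username>@login-boost.pitagora.cineca.it

# Login node with the same architecture as the DCGP compute nodes
ssh <username>@login-dcgp.pitagora.cineca.it
```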
System Architecture
The system, supplied by Lenovo, is based on two new specifically designed compute blades, which are available through two distinct SLURM partitions on the cluster:
GPU blade based on NVIDIA H100 accelerators - Booster partition.
CPU-only blade based on AMD Turin 128c processors - Data Centric General Purpose (DCGP) partition.
The overall system architecture uses NVIDIA Mellanox InfiniBand NDR connectivity, with smart in-network computing acceleration engines that enable extremely low latency and high data throughput to provide the highest AI and HPC application performance and scalability.
Hardware Details
Booster

| Type | Specific |
|---|---|
| Models | Lenovo SD650-N V3 |
| Racks | 7 |
| Nodes | 168 |
| Processors/node | 2x Intel Emerald Rapids 6548Y 32c 2.4 GHz |
| CPUs/node | 64 |
| Accelerators/node | 4x NVIDIA H100 SXM 80GB HBM2e |
| Local Storage/node (tmpfs) | |
| RAM/node | 512 GiB DDR5 5600 MHz |
| Rmax | 27.27 PFlop/s (Top500) |
| Internal Network | NVIDIA ConnectX-7 NDR200 |
| Storage (raw capacity) | 2x 7.68 TB SSDs (HW RAID 1) |
DCGP

| Type | Specific |
|---|---|
| Models | Lenovo SD665 V3 |
| Racks | 14 |
| Nodes | 1008 |
| Processors/node | 2x AMD Turin 128c - Zen5 microarch 2.4 GHz |
| CPUs/node | 256 |
| Accelerators/node | (none) |
| Local Storage/node (tmpfs) | |
| RAM/node | 768 GiB DDR5 6400 MHz |
| Rmax | 17 PFlop/s (Top500) |
| Internal Network | NVIDIA ConnectX-7 NDR SharedIO 200Gbit/s |
| Storage (raw capacity) | Diskless nodes |
Job Management and SLURM Partitions
In the following tables you can find information about the SLURM partitions for the Booster and DCGP partitions of the production environment. Please note that the SLURM email service is not active yet. A minimal job script example is shown after the tables below.
See also
Further information about job submission is reported in the general section Scheduler and Job Submission.
Booster

| Partition | QOS | # Nodes per job | Walltime | Max # nodes per user | Priority | Notes |
|---|---|---|---|---|---|---|
| boost_fua_prod | normal | max = 16 | 24:00:00 | 32 | 40 | |
| boost_fua_prod | boost_qos_fuabprod | min = 17 (full nodes), max = 32 | 24:00:00 | 32 | 60 | runs on 96 nodes (GrpTRES) |
| boost_fua_dbg | normal | max = 2 | 00:30:00 | 2 | 40 | runs on 8 nodes (GrpTRES) |
DCGP

| Partition | QOS | # Nodes per job | Walltime | Max # nodes per user | Priority | Notes |
|---|---|---|---|---|---|---|
| dcgp_fua_prod | normal | max = 64 | 24:00:00 | 64 | 40 | |
| dcgp_fua_prod | dcgp_qos_fuabprod | min = 65 (full nodes), max = 128 | 24:00:00 | 128 | 60 | runs on 640 nodes (GrpTRES) |
| dcgp_fua_prod | dcgp_qos_fualprod | max = 3 | 4-00:00:00 | 3 | 40 | |
| dcgp_fua_dbg | normal | max = 2 | 00:30:00 | 2 | 40 | runs on 8 nodes (GrpTRES) |
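As an illustrative sketch only (account name, resource amounts, and the executable are placeholders to adapt to your project), a minimal batch script for the DCGP production partition could look like this:

```bash
#!/bin/bash
#SBATCH --job-name=test_job
#SBATCH --partition=dcgp_fua_prod    # use boost_fua_prod for the GPU partition
#SBATCH --qos=normal
#SBATCH --nodes=2
#SBATCH --ntasks-per-node=16
#SBATCH --cpus-per-task=8
#SBATCH --time=01:00:00
#SBATCH --account=<project_account>  # placeholder: your project budget
#SBATCH --output=job.out
#SBATCH --error=job.err

# On the Booster partition you would also request GPUs,
# e.g. "#SBATCH --gres=gpu:4" (assumption: generic SLURM GRES syntax).

srun ./my_application                # placeholder executable
```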
Processes/Threads Binding/Affinity
Processes Binding
By default, srun (the SLURM launcher) performs an automatic binding. For multi-threaded applications, request the proper --cpus-per-task and bind the processes to cores (srun --cpu-bind=cores).
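A minimal sketch of a hybrid MPI/OpenMP launch with srun inside a batch script, assuming 8 cores per task (my_hybrid_app is a placeholder):

```bash
#SBATCH --ntasks-per-node=16
#SBATCH --cpus-per-task=8

export OMP_NUM_THREADS=${SLURM_CPUS_PER_TASK}
# Request the cpus-per-task explicitly and bind each task to its set of cores
srun --cpus-per-task=${SLURM_CPUS_PER_TASK} --cpu-bind=cores ./my_hybrid_app
```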
By default, OpenMPI libraries (mpirun launcher) bind processes to cores. For multi-threaded applications this can cause CPU over-allocation. Ensure that the processes are either not bound at all (by specifying --bind-to none) or bound to multiple cores, using an appropriate binding level or a specific number of processing elements per application process (--map-by socket:PE=$SLURM_CPUS_PER_TASK).
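With OpenMPI's mpirun the same hybrid job could be launched along these lines (my_hybrid_app is a placeholder):

```bash
export OMP_NUM_THREADS=${SLURM_CPUS_PER_TASK}
# Reserve SLURM_CPUS_PER_TASK cores for each MPI process instead of the default per-core binding
mpirun --map-by socket:PE=${SLURM_CPUS_PER_TASK} ./my_hybrid_app

# Alternatively, disable binding entirely and let the threads float
mpirun --bind-to none ./my_hybrid_app
```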
By default, IntelMPI libraries (mpirun launcher with the Hydra process manager) perform a correct binding. If you opt for the IntelMPI mpirun as launcher, unset the I_MPI_PMI_LIBRARY variable (meant for using IntelMPI with srun) defined when loading the module, to avoid verbose warnings.
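For example, assuming the loaded Intel MPI module defines I_MPI_PMI_LIBRARY, a launch with Intel MPI's mpirun could look like:

```bash
module load intelmpi        # module name is an assumption; check "module avail" for the exact one
unset I_MPI_PMI_LIBRARY     # only needed when launching with mpirun; keep it set when using srun
mpirun -np ${SLURM_NTASKS} ./my_app
```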
Threads Affinity
None of the available compilers (gcc, nvhpc, aocc, intel) bind threads to cores by default. You can control thread affinity with the standard OMP_PLACES/OMP_PROC_BIND environment variables.
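For example, to pin each OpenMP thread to a separate core you could set:

```bash
export OMP_PLACES=cores
export OMP_PROC_BIND=close   # or "spread" to distribute threads across the available places
```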
Known Issues
This section collects currently known issues affecting PITAGORA.
The list below is intended as a quick reference for users who may experience problems on the system. We strongly encourage all users to report any issues they encounter - whether listed here or not - to the user support team.