EFGW Gateway

EFGW is the new EUROfusion Gateway system hosted by CINECA at its headquarters in Casalecchio di Reno, Bologna, Italy. The cluster is supplied by Lenovo and is equipped with 15 AMD nodes, including 4 fat-memory nodes (1.5 TB) and 1 node with 4 H100 GPUs and local SSD storage.

How to get a User Account

Users of the old EUROfusion Gateway were migrated to the new system with the same username.

For new users, access to EFGW requires the following steps:

  • Register on the UserDB portal

  • Complete the registration by filling in your affiliation on the Institution page and uploading a valid identity document on the Documents for HPC page.

  • Download the Gateway User Agreement (GUA)

  • Fill in and sign the GUA, then send it via email to the EUROfusion Coordination Officer Denis Kalupin (Denis.Kalupin-at-euro-fusion.org)

Access to the System

The machine is reachable via the SSH (Secure Shell) protocol at the hostname login.eufus.eu.

The connection is established automatically to one of the available login nodes. It is also possible to connect to EFGW using one of the specific login hostnames:

  • viz05-ext.efgw.cineca.it

  • viz06-ext.efgw.cineca.it

  • viz07-ext.efgw.cineca.it

  • viz08-ext.efgw.cineca.it

Each login node is equipped with two AMD EPYC 9254 24-Core Processors. An alias hostname pointing to all the login nodes in a round-robin fashion will be set up in the coming weeks.
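
For example, once you have obtained the SSH certificate required by the 2FA procedure described in the Warning below, a typical connection looks like this ('username' is a placeholder for your EFGW account name):

$ ssh username@login.eufus.eu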

Warning

Access to EFGW requires two-factor authentication (2FA) via the dedicated provisioner efgw. More information is available in the section Access to the Systems.

Please note: EFGW users have to obtain their SSH certificate from the efgw provisioner.

How to access EFGW with NX

  1. Get an SSH key with step and put it in your $HOME/.ssh folder:

    step ssh certificate 'username' --provisioner efgw ~/.ssh/gw_key
    
  2. Configure an NX session in the NX client using the key generated above.
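
If the NX session does not connect, you can first verify that the certificate obtained in step 1 works by opening a plain SSH connection with it (the hostname is one of the login nodes listed above; 'username' is a placeholder for your account):

$ ssh -i ~/.ssh/gw_key username@viz05-ext.efgw.cineca.it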

You can freely install software on your NX desktop using the “flatpak” package manager. Please refer to the official documentation for instructions on how to use it.

Please keep in mind that you need to use it with the --user flag.
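
As an illustration, a typical per-user installation from the Flathub remote looks like the following (the application ID is only an example; refer to the official Flatpak documentation for details):

$ flatpak --user remote-add --if-not-exists flathub https://flathub.org/repo/flathub.flatpakrepo
$ flatpak --user install flathub org.gnome.gedit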

System Architecture

The cluster, supplied by Lenovo, is based on AMD processors:

  • 10 nodes with two AMD EPYC 9745 128-Core Processors and 738 GB DDR5 RAM per node

  • 4 nodes with two AMD EPYC 9745 128-Core Processors and 1511 GB DDR5 RAM per node

  • 1 node with two AMD EPYC 9354 32-Core Processors, 4 H100 GPUs, and 738 GB DDR5 RAM

File Systems and Data Management

The storage organization conforms to the CINECA infrastructure. General information is reported in the File Systems and Data Management section. In the following, only the differences with respect to the general behavior are listed and explained.

The storage is organized as a replica of the previous Gateway cluster, with the data of the /afs and /pfs areas copied to the new Lustre storage system (no AFS is available; only the data were copied). Please note that the path /gss_efgw_work, linked to the /pfs areas on the old Gateway, does not exist on the new Gateway.

The TMPDIR is defined as follows:

  • on the local SSD disks of the login nodes (2.5 TB of capacity), mounted as /scratch_local (TMPDIR=/scratch_local). This is a shared area with no quota; remove your files as soon as they are no longer needed. A cleaning procedure will be enforced in case of improper use of the area.

  • on the local SSD disk of the GPU node (850 GB of capacity, default size 10 GB)

  • in RAM on all 14 CPU-only, diskless compute nodes (with a fixed size of 10 GB)

On the GPU node, a larger local TMPDIR area can be requested, if needed, with the Slurm directive:

#SBATCH --gres=tmpfs:XXG

up to a maximum of 212.5 GB.
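
As a minimal sketch (the requested size, walltime, application, and input file are placeholders; GPU and CPU requests are omitted), a batch script asking for a 100 GB local TMPDIR on the GPU node could look like:

#!/bin/bash
#SBATCH --partition=gwgpu
#SBATCH --gres=tmpfs:100G
#SBATCH --time=01:00:00

cp input.dat "$TMPDIR"/          # stage data on the fast local SSD
./my_app "$TMPDIR"/input.dat     # my_app and input.dat are placeholders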

Environment and Customization

The main tools and compilers are available through the module command when logging into the cluster:

$ module av

To see all the modules of the aocc, gcc, and oneAPI stacks installed by CINECA staff, load the “cineca-modules” module and then run the “module av” command:

$ module load cineca-modules
$ module av

For information about “module” usage, compilers, and MPI libraries, you can consult the sections The module command and Compilers.
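
For example (the module name below is illustrative; check the actual names and versions reported by “module av” on the cluster):

$ module load cineca-modules
$ module av gcc
$ module load gcc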

You can install any additional software you may need with flatpak or Spack.
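
A minimal user-side sketch with Spack, assuming you set it up in your own space (the package name is only an example):

$ git clone https://github.com/spack/spack.git
$ source spack/share/spack/setup-env.sh
$ spack install hdf5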

How to make your $HOME/public open to all users

To configure your $HOME/public on the new EFGW as it was on the old EFGW AFS filesystem, add the proper ACL to your $HOME directory as follows:

$ setfacl -m g:g2:x $HOME
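
You can then verify the resulting permissions with standard tools, for example:

$ getfacl $HOME
$ ls -ld $HOME/public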

Job Managing and Slurm Partitions

In the following table you can find information about the Slurm partitions on the EFGW cluster.

Partition | QOS        | # Cores per job                    | Walltime  | Max jobs/res. per user        | Max memory per node | Priority | Notes
----------|------------|------------------------------------|-----------|-------------------------------|---------------------|----------|--------------------------
gw        | noQOS      | max = 768 cores                    | 48:00:00  | 2000 submitted jobs           | 735 GB / 1511 GB    | 40       | Four fat memory nodes
          | qos_dbg    | max = 128 cores                    | 00:30:00  | Max 128 cores                 | 735 GB / 1511 GB    | 80       | Can run on max 128 cores
          | qos_gwlong | max = 256 cores                    | 144:00:00 | Max 128 cores, 2 running jobs | 735 GB / 1511 GB    | 40       | Four fat memory nodes
gwgpu     | noQOS      | max = 16 cores / 1 GPU / 188250 MB | 08:00:00  | 1 running job                 | 735 GB              | 40       |

Note

In the new Gateway, the debug partition has been replaced by a QoS.
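
As an example (job name, task count, and executable are placeholders), a short debug run on the gw partition with the qos_dbg QoS could be submitted with a script like the following:

#!/bin/bash
#SBATCH --partition=gw
#SBATCH --qos=qos_dbg
#SBATCH --ntasks=64
#SBATCH --time=00:30:00
#SBATCH --job-name=dbg_test

srun ./my_app

Submit it with “sbatch job.sh”, where job.sh is the name of the script.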

How to request support

Please write an email to superc@cineca.it, specifying EFGW in the subject.