FAQ
This page collects answers to requests received by the HPC Helpdesk. Please check it before sending a new request.
General
I still haven’t received the username and the link for the 2FA configuration
You have to complete the registration on the UserDB page and be associated with a project (the PI has to add you). Once you have inserted all the necessary information and you are associated with a project, a new access button will appear; click on it and you will receive two emails with the username and the link for the 2FA configuration.
I have finished my budget but my project is still active, what can I do?
Non-expired projects with exhausted budgets may be allowed to keep using the computational resources at the cost of minimal priority. Write to superc@cineca.it motivating your request; in case of a positive evaluation, you will be enabled to use the qos_lowprio QOS.
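Once the QOS has been granted, it can be requested in the job script. A minimal sketch, where the account name, partition name, and executable are placeholders:

```shell
#!/bin/bash
#SBATCH --job-name=lowprio_job
#SBATCH --account=<my_account>       # placeholder: your project account
#SBATCH --partition=<partition>      # placeholder: see the cluster guide
#SBATCH --qos=qos_lowprio            # minimal-priority QOS for exhausted budgets
#SBATCH --time=00:30:00
#SBATCH --ntasks=1

srun ./my_program                    # placeholder executable
```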
How can I get information about my project on CINECA clusters (end date, total and monthly amount of hours, usage)?
You can list all the Accounts attached to your username on the current cluster, together with the “budget” and the consumed resources, with the command:
saldo -b

Find more information in the Data Occupancy Monitoring Tools section.
Access and Login
My new password isn’t accepted, with error “Could not execute the password modify extended operation for DN”
The error message can be difficult to interpret, but it means that the new password you have chosen does not respect our password policies. Please check the Users and Accounts section and choose your new password accordingly.
I receive the error message “WARNING: REMOTE HOST IDENTIFICATION HAS CHANGED!” when trying to login
The problem may happen because we have reinstalled the login node, changing its fingerprint. We should have informed you through an HPC-news. If this is the case, you can remove the old fingerprint from your known_hosts file with the command:
ssh-keygen -f ~/.ssh/known_hosts -R login.<cluster_name>.cineca.it

I keep receiving the error message “WARNING: REMOTE HOST IDENTIFICATION HAS CHANGED!” even if I modify the known_hosts file
Please, follow the procedure described below to solve the problem.
On Linux/macOS (bash):

# LEONARDO
mkdir -p ~/.ssh
ssh-keygen -f ~/.ssh/known_hosts -R login.leonardo.cineca.it
for address in login01-ext.leonardo.cineca.it login02-ext.leonardo.cineca.it login05-ext.leonardo.cineca.it login07-ext.leonardo.cineca.it; do ssh-keygen -f ~/.ssh/known_hosts -R $address; done
for keyal in rsa ecdsa ed25519 dsa; do for address in login01-ext.leonardo.cineca.it login02-ext.leonardo.cineca.it login05-ext.leonardo.cineca.it login07-ext.leonardo.cineca.it; do ssh-keyscan -t ${keyal} ${address} | sed "s/${address}/login*.leonardo.cineca.it/g" >> ~/.ssh/known_hosts; done; done

# G100
mkdir -p ~/.ssh
ssh-keygen -f ~/.ssh/known_hosts -R login.g100.cineca.it
for address in login01-ext.g100.cineca.it login02-ext.g100.cineca.it login03-ext.g100.cineca.it; do ssh-keygen -f ~/.ssh/known_hosts -R $address; done
for keyal in rsa ecdsa ed25519 dsa; do for address in login01-ext.g100.cineca.it login02-ext.g100.cineca.it login03-ext.g100.cineca.it; do ssh-keyscan -t ${keyal} ${address} | sed "s/${address}/login*.g100.cineca.it/g" >> ~/.ssh/known_hosts; done; done

# PITAGORA
mkdir -p ~/.ssh
ssh-keygen -f ~/.ssh/known_hosts -R login.pitagora.cineca.it
for address in login01-ext.pitagora.cineca.it login02-ext.pitagora.cineca.it login03-ext.pitagora.cineca.it login04-ext.pitagora.cineca.it login05-ext.pitagora.cineca.it login06-ext.pitagora.cineca.it; do ssh-keygen -f ~/.ssh/known_hosts -R $address; done
for keyal in rsa ecdsa ed25519 dsa; do for address in login01-ext.pitagora.cineca.it login02-ext.pitagora.cineca.it login03-ext.pitagora.cineca.it login04-ext.pitagora.cineca.it login05-ext.pitagora.cineca.it login06-ext.pitagora.cineca.it; do ssh-keyscan -t ${keyal} ${address} | sed "s/${address}/login*.pitagora.cineca.it/g" >> ~/.ssh/known_hosts; done; done

On Windows (PowerShell):

# LEONARDO
mkdir -Force $HOME\.ssh
ssh-keygen -f "$HOME\.ssh\known_hosts" -R "login.leonardo.cineca.it"
foreach ($address in "login01-ext.leonardo.cineca.it", "login02-ext.leonardo.cineca.it", "login05-ext.leonardo.cineca.it", "login07-ext.leonardo.cineca.it") { ssh-keygen -f "$HOME\.ssh\known_hosts" -R $address }
foreach ($keyal in "rsa", "ecdsa", "ed25519", "dsa") { foreach ($address in "login01-ext.leonardo.cineca.it", "login02-ext.leonardo.cineca.it", "login05-ext.leonardo.cineca.it", "login07-ext.leonardo.cineca.it") { ssh-keyscan -t $keyal $address | ForEach-Object { $_ -replace "$address", "login*.leonardo.cineca.it" } | Out-File -Encoding UTF8 -Append "$HOME\.ssh\known_hosts" } }

# G100
mkdir -Force $HOME\.ssh
ssh-keygen -f "$HOME\.ssh\known_hosts" -R "login.g100.cineca.it"
foreach ($address in "login01-ext.g100.cineca.it", "login02-ext.g100.cineca.it", "login03-ext.g100.cineca.it") { ssh-keygen -f "$HOME\.ssh\known_hosts" -R $address }
foreach ($keyal in "rsa", "ecdsa", "ed25519", "dsa") { foreach ($address in "login01-ext.g100.cineca.it", "login02-ext.g100.cineca.it", "login03-ext.g100.cineca.it") { ssh-keyscan -t $keyal $address | ForEach-Object { $_ -replace "$address", "login*.g100.cineca.it" } | Out-File -Encoding UTF8 -Append "$HOME\.ssh\known_hosts" } }

# PITAGORA
mkdir -Force $HOME\.ssh
ssh-keygen -f "$HOME\.ssh\known_hosts" -R "login.pitagora.cineca.it"
foreach ($address in "login01-ext.pitagora.cineca.it", "login02-ext.pitagora.cineca.it", "login03-ext.pitagora.cineca.it", "login04-ext.pitagora.cineca.it", "login05-ext.pitagora.cineca.it", "login06-ext.pitagora.cineca.it") { ssh-keygen -f "$HOME\.ssh\known_hosts" -R $address }
foreach ($keyal in "rsa", "ecdsa", "ed25519", "dsa") { foreach ($address in "login01-ext.pitagora.cineca.it", "login02-ext.pitagora.cineca.it", "login03-ext.pitagora.cineca.it", "login04-ext.pitagora.cineca.it", "login05-ext.pitagora.cineca.it", "login06-ext.pitagora.cineca.it") { ssh-keyscan -t $keyal $address | ForEach-Object { $_ -replace "$address", "login*.pitagora.cineca.it" } | Out-File -Encoding UTF8 -Append "$HOME\.ssh\known_hosts" } }

Windows WSL issue: DNS resolution failing
If the DNS resolution fails with “Temporary failure in name resolution”, or the resolution times out, an automatic change in /etc/resolv.conf occurred. You can change it manually by replacing the nameserver value with 8.8.8.8. This file is automatically generated by WSL; to stop the automatic generation, add the following entry to /etc/wsl.conf:

[network]
generateResolvConf = false

Then, add the following commands to your .bashrc to recreate the nameserver entry in the resolv.conf file automatically:

if [ ! -f /etc/resolv.conf ]; then
    echo "nameserver 8.8.8.8" > /etc/resolv.conf
fi

The message “perl: warning: Setting locale failed” appears when I login. How do I solve it?
This warning is typical of Mac users (but can happen on other operating systems too). It is innocuous and can be safely ignored, but if you want to get rid of it you can add these lines to the .bashrc of your workstation, or in general execute them before logging in to our systems:

export LANGUAGE=en_US.UTF-8
export LANG=en_US.UTF-8
export LC_ALL=en_US.UTF-8

If you log in afterwards, the warnings should have disappeared.
Can I login with ssh inside a compute node?
In general, it is forbidden to log in via ssh to a compute node, as the network is restricted to local access. It is always possible to access a compute node by submitting an interactive job with srun or salloc (check the Interactive Job Submission with SLURM section); apart from that, ssh is only possible if the user who wants to log in to the compute node has a job running on it.

Specifically, on Leonardo, due to recent system updates, the connection via ssh can be denied even in the specific case described above. To avoid this limitation, it is now mandatory to have a pair of SSH keys generated inside the Leonardo cluster. You can create the keys with the command ssh-keygen and copy the output, i.e. the public key, into your configuration file ~/.ssh/authorized_keys. After the generation of the SSH keys, the connection via ssh to a compute node will be possible (provided that the user who wants to connect to the compute node has a job running on it).
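As a sketch, the key-generation step could look like this when run on a login node (the ed25519 key type and the empty passphrase are assumptions made for brevity; other supported key types work as well):

```shell
# Create the .ssh directory with the permissions sshd expects
mkdir -p ~/.ssh && chmod 700 ~/.ssh
# Generate a key pair inside the cluster, unless one already exists
[ -f ~/.ssh/id_ed25519 ] || ssh-keygen -t ed25519 -f ~/.ssh/id_ed25519 -N "" -q
# Append the public key to authorized_keys and restrict its permissions
cat ~/.ssh/id_ed25519.pub >> ~/.ssh/authorized_keys
chmod 600 ~/.ssh/authorized_keys
```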
2FA
ERROR: The term ‘step’ is not recognized as the name of a cmdlet (Powershell)
If, running the command step to verify the installation of smallstep, you incur in the following error:
PS C:\Users\user > step
step : The term 'step' is not recognized as the name of a cmdlet, function, script file, or operable program. Check the spelling of the name, or if a path was included, verify that the path is correct and try again.
At line:1 char:1
+ step
+ ~~~~
    + CategoryInfo          : ObjectNotFound: (step:String) [], ParentContainsErrorRecordException
    + FullyQualifiedErrorId : CommandNotFoundException

check if you find the executable step.exe inside the folder:

C:\Users\user\scoop\shims

The installation command should have placed it there. If you don’t find it, run in your PowerShell the command:

scoop install smallstep/step

ERROR associated to X11 execution
install Xming (https://sourceforge.net/projects/xming/); it will open a window in the background that you won’t be able to see, but you can verify it is running by looking among the icons in the Windows taskbar (bottom right)
follow the steps reported at https://x410.dev/cookbook/built-in-ssh-x11-forwarding-in-powershell-or-windows-command-prompt/ for PowerShell
then you can run the command ssh to login on the cluster
undefined method ‘cellar’ when installing step on MacOS
You may encounter an error that looks like this:
Error: step: undefined method `cellar' for #<BottleSpecification:0x000000012e579660>

In this case, the problem is in your Homebrew installation. It may refer to the directories for a different architecture, e.g. Intel, while you need it to refer to ARM (Apple Silicon). You can reinstall Homebrew:

brew tap homebrew/core

and then set the proper paths with simple shell commands:

(echo; echo 'eval "$(/opt/homebrew/bin/brew shellenv)"') >> /Users/<my_user>/.zprofile
eval "$(/opt/homebrew/bin/brew shellenv)"

If this does not solve your error, the command brew doctor should give you a hint about how to proceed in your specific case.
Scheduler and Job Execution
My job has been waiting for a long time
The priority of a queued job is composed of several factors, and its value may change due to the presence of other jobs, the resources requested, and your own priority. You can see the state of your job in the queue with the squeue command. If the state is PD, the job is pending. Some reasons that could be displayed for the pending state:

Priority = The job is waiting for free resources.
Dependency = The job is waiting for the end of another job.
QOSMaxJobsPerUserLimit = The user has reached the maximum number of jobs that can run at a given time.
You can also consult the estimated start time with the SLURM command scontrol:

scontrol show job #JOBID

or you can see the priority of your job with the sprio SLURM command:

sprio -j #JOBID

Can I modify SLURM settings of a waiting job?
Some Slurm settings of a pending job can be modified using the command scontrol update. For example, setting the new job name and time limit of the pending job:
scontrol update JobId=2543 Name=newtest TimeLimit=00:10:00

How can I place and release a job from hold state?
In order to place a job on hold, type:

scontrol hold JOB_ID

To release the job from the hold state, issue:

scontrol release JOB_ID

Error invalid account when submitting a job: “Invalid account or account/partition combination specified”
The error “Invalid account” might depend on the lack of resources associated with your project, or on an error in the account name in your batch script. Just check with the saldo command. If the account is correct and valid, are you launching the job on the right partition? To see which partition is right for your case and account, please consult the Scheduler and Job Submission section.

I get the following message: “srun: Warning: can’t honor --ntasks-per-node set to xx which doesn’t match the requested tasks yy with the number of requested nodes yy. Ignoring --ntasks-per-node.” What does it mean?
This message can appear when using mpirun with the Intel MPI parallel environment. It is a known problem that can be safely ignored, since mpirun does not read the proper Slurm variables and thinks that the environment is not set properly, thus generating the warning: in reality, the instance of srun behind it will respect the settings you requested with your Slurm directives. While there are workarounds for this, the best solution (apart from ignoring the message) is to use srun instead of mpirun: with this command, the Slurm environment is read properly and the warning does not appear.
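A minimal job-script sketch launching directly with srun (the account, partition, and executable names are placeholders):

```shell
#!/bin/bash
#SBATCH --nodes=2
#SBATCH --ntasks-per-node=4
#SBATCH --account=<my_account>      # placeholder: your project account
#SBATCH --partition=<partition>     # placeholder: see the cluster guide
#SBATCH --time=00:10:00

# srun reads the Slurm environment directly, so the task layout
# requested above is honored and no warning is printed
srun ./my_mpi_program               # placeholder executable
```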
HPC Cloud
I can’t SSH to my virtual machine
The most common reasons for not being able to log in to your VM are related to:
SSH command: be sure to use the correct username, address and access key
Floating IP: If no FIP is associated with your VM, it is not possible to reach it. (see Allocate a floating IP)
Security Rules: In the “Overview” tab, under the “Security Group” section, check that the rule for port 22 is defined, and which IP ranges are allowed access. (see Security groups: create)
Network issues: Check that Network, Subnet and Router are set up properly.
Machine Boot: Check if the VM booted correctly. On the Horizon Dashboard, enter the VM page and check the “Console” tab. If an error message appears related to “kernel panic” or “no bootable device”, then the problem lies in either the specific image or the bootable device used. If no error appears, check the full boot log in the “Log” tab. Depending on the error found, it could be necessary to perform a rescue of the VM (see Instance: rescue)
I can’t create an OpenStack resource
The main reason a user is blocked when creating new resources (virtual machines, volumes, etc.) is that they have reached their project quota. If you need to increase your project quota please contact our user support.
If I resize my virtual machine, will I lose my data?
No, you won’t lose your data, but you will have to re-partition your disk to use the space you added with the resize operation.
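The re-partitioning step depends on your VM’s layout; for a typical Linux VM with an ext4 filesystem it could look like the sketch below (the device names /dev/vda and /dev/vda1 and the growpart tool from the cloud-utils package are assumptions; inspect your actual layout with lsblk first):

```shell
# Inspect block devices: after the resize, the disk shows the new size
# while the partition still shows the old one
lsblk
# Grow partition 1 of /dev/vda to fill the enlarged disk (cloud-utils)
sudo growpart /dev/vda 1
# Grow the ext4 filesystem to fill the partition
sudo resize2fs /dev/vda1
```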
Can I create a virtual machine using a Windows image?
No, on CINECA HPC Cloud systems users are not allowed to upload and/or use Windows images, even if they own a personal license.