Get started today

The NICA LHEP offline cluster

1 – User registration

A user must be registered to access the computing resources of the offline NICA cluster. To get an account on the cluster, fill out the form on the registration page. The username and password will be sent to your e-mail address. Help with registration can be obtained from the experiment software development coordinators:

  • BM@N – Konstantin Gertsenberger <gertsen@jinr.ru>
  • MPD – Andrey Moshkin <amoshkin@jinr.ru>
  • SPD – Alexey Zhemchugov <zhemchugov@jinr.ru>
  • General cluster questions – Ivan Slepov <slepov@jinr.ru>

2 – The rules of using the cluster

  • Accounts whose users have not worked on the cluster for more than 3 years are deleted.
  • Logging in to the cluster is allowed only via the SSH protocol.
  • Interactive work is allowed only on the ncx machines (ncx[101-106].jinr.ru).
  • Interactive work means that the launched processes consume no more than 100% of one processor core for no more than 15 minutes continuously; otherwise, all processes launched by the user in interactive mode will be stopped (see the monitoring example after this list).
  • Long-term calculations are allowed only in BATCH mode, launched via SGE (Sun Grid Engine), the batch processing system that manages the resources of the high-performance computing cluster.
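
To check that your interactive processes stay within these limits, you can monitor them with standard Linux tools, e.g.:

top -u [username]  # live view of your processes and their CPU usage

ps -u [username] -o pid,etime,pcpu,comm  # elapsed time and CPU share per process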

3 – Interactive work on the cluster

To log in, use your account on the ncx machines (ncx[101-106].jinr.ru), e.g.:

ssh -X [username]@ncx.jinr.ru  # regular login

ssh -X [username]@ncx102.jinr.ru  # login to a specific machine
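
To avoid retyping the options on every login, you can add a host entry to your local ~/.ssh/config (a minimal sketch; substitute your own username):

Host ncx
    HostName ncx.jinr.ru
    User [username]
    ForwardX11 yes

After that, ssh ncx is equivalent to ssh -X [username]@ncx.jinr.ru.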

4 – Software installed on cluster nodes

  • CentOS Linux operating system
  • Graphical shell: GNOME on Xorg
  • GCC: gcc (version 4.8.5)
  • C++: g++ (version 4.8.5)
  • FC: g77 and gfortran – GNU Fortran (version 4.8.5)
  • PERL: perl (version 5.10.1)
  • PYTHON: python (version 2.6.6)
  • JAVA: java (version 1.8.0)
  • Text editors: vi / vim, emacs, nano
  • Debuggers: ddd, gdb, jdb
  • Build tools: make, gmake, cmake, imake
  • Internet utilities: firefox, lynx, pine, ssh, ftp, etc.
  • SQL clients for working with MySQL and PostgreSQL databases
  • FairSoft and FairRoot software are installed locally on each NCX machine. The current working version of FairSoft is located in the /opt/fairsoft/<exprmnt>/pro directory, and FairRoot in /opt/fairroot/<exprmnt>/pro (where <exprmnt> is one of bmn, mpd, spd). Please use these officially supported versions on the cluster; a minimal environment setup sketch follows this list.
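
FairSoft and FairRoot builds are conventionally referenced through the SIMPATH and FAIRROOTPATH environment variables; assuming that convention applies to these installations, a minimal setup for MPD might look like:

export SIMPATH=/opt/fairsoft/mpd/pro        # FairSoft installation
export FAIRROOTPATH=/opt/fairroot/mpd/pro   # FairRoot installation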

5 – The cluster storage system

  • The home directory space /lhep/users/[username] must not exceed 50 GB!
  • To work with large amounts of data, users can use the EOS storage (EOS documentation).
  • User data on the EOS file system resides in /eos/nica/<exprmnt>/users/[username] (where <exprmnt> = bmn, mpd or spd). To get the required EOS disk space, contact the experiment software coordinator.
  • Experimental and simulated data on the EOS file system reside in /eos/nica/<exprmnt>/exp and /eos/nica/<exprmnt>/sim (where <exprmnt> = bmn, mpd or spd). To work with EOS data, it is preferable to use the XRootD protocol from any remote place: e.g. the xrdcp command to copy EOS files, or the root://ncm.jinr.ru/ prefix to open EOS files in the ROOT environment (like root://ncm.jinr.ru//eos/nica/<exprmnt>/exp/...).
  • /tmp – an ultra-fast storage system for intermediate calculation data. Do not forget to delete your files after the end of the task.

Examples of copying files:

eoscp /tmp/<path>/<file> /eos/nica/<some_pth>/<file>  # copy from /tmp to EOS with eoscp

xrdcp -f <file> "root://ncm.jinr.ru//eos/nica/<exprmnt>/<file>?xrd.wantprot=gsi,unix"  # copy to EOS via the XRootD protocol

Shell command for help with the EOS file system: eos help
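
For example, a few common EOS shell commands (the paths are placeholders):

eos ls /eos/nica/<exprmnt>/users/[username]  # list your EOS directory

eos mkdir /eos/nica/<exprmnt>/users/[username]/<dir>  # create an EOS directory

eos quota  # show your current quota and usage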

6 – SGE batch processing system

If you have time-consuming tasks, many simple tasks or a lot of files to process, you can use the cluster's Sun Grid Engine (SGE) batch system. If you know how to work with SGE (see the SGE man pages), you can use the qsub command on the cluster to distribute data processing. You can also download the SGE user manual.

Starting a task in batch mode:

cd <workdir>

qsub -cwd script.sh
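
A minimal sketch of such a script.sh (the job name and output file names are illustrative):

#!/bin/bash
#$ -cwd            # run the task in the submission directory
#$ -N myjob        # job name (illustrative)
#$ -o myjob.out    # file for standard output
#$ -e myjob.err    # file for standard error
echo "Running on $(hostname)"
# ... your processing commands here ...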

Examples of basic commands for viewing the status of all cluster nodes, running tasks and errors in their execution:

qstat -f  # shows ALL nodes

qstat -f -u [username]  # shows all tasks of the given user

qstat -j {jobid} | grep error  # shows errors
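
Other useful SGE commands, for reference:

qdel {jobid}  # remove a queued or running task

qacct -j {jobid}  # accounting information for a finished task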

Examples of SGE start scripts for the cluster can be viewed here.

Include in the executable script a line (or part of it) that explicitly specifies the nodes on which the tasks will be executed:

#$ -l h=(ncx20[5-8]|ncx21[1-8]|ncx22[5-8]|ncx23[1-8]|ncx11[1-7]|ncx12[1-7]|ncx13[1-9]|ncx14[1-9]|ncx15[1-9]|ncx16[1-9])

If possible, DO NOT write directly to EOS from batch tasks. Write to scratch (/weekly) instead, and then copy the data in a separate step with the eoscp or xrdcp commands, as sketched below.
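
A sketch of this pattern inside a batch script (the paths are placeholders; JOB_ID is set by SGE):

OUT=/weekly/[username]/job_${JOB_ID}
mkdir -p $OUT
# ... write intermediate and final results to $OUT ...
xrdcp -f $OUT/<file> root://ncm.jinr.ru//eos/nica/<exprmnt>/users/[username]/<file>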

Access to data and programs in batch mode:

  • /tmp – local to each batch machine: data must be copied to each batch machine, and files must be deleted at the end of the task.
  • Scratch (/weekly) – mounted on all machines.
  • GlusterFS disks (/mpd19, /mpd20, /mpd21, /mpd22, /bmn1, /spd1) – mounted on all machines.
  • EOS – mounted on all machines.

Shell command for help with the SGE batch system: man sge_intro

NICA-Scheduler has been developed to simplify running user tasks without knowledge of SGE. You can find out how to use NICA-Scheduler here.

7 – Support

For experiment software questions:

  • BM@N – Konstantin Gertsenberger <gertsen@jinr.ru>
  • MPD – Andrey Moshkin <amoshkin@jinr.ru>
  • SPD – Alexey Zhemchugov <zhemchugov@jinr.ru>, Danila Oleynik <danila@jinr.ru>

For general cluster software questions:

  • NCX cluster support – <ncx@jinr.ru>
  • Ivan Slepov <slepov@jinr.ru>

Also, if you have a problem, you can use the Q&A Forum or Submit ticket.