Here is a list of frequently asked, or otherwise useful, questions. If you have a question you'd like to see here, please let us know. For COSMA8-specific questions, please see the COSMA8 FAQ.
Please direct questions to cosma-support@durham.ac.uk. Explain what you are trying to do or would like to do, where you are trying to do it, and provide your COSMA username.
For technical problems, please email cosma-support@durham.ac.uk. If we cannot solve your problems, we will then try to find people who can.
Any STFC theory community member, or any ICC collaborator. To request DiRAC-funded time, either submit to the standard call for proposals, or submit a seedcorn application. See the DiRAC website for further details. For non-DiRAC time, speak to your collaborator at Durham, CCing cosma-support@durham.ac.uk.
Follow these instructions.
Yes, provided that they are a collaborator with the ICC.
Follow these instructions. This will work both from within and outside Durham University.
VNC is not recommended for security reasons. However, x2go offers a faster graphical alternative.
Please see here.
If you are working on a DiRAC project, you should use COSMA6, COSMA7 or COSMA8, depending on which project you are working on. You need to be a member of the relevant operating system (Unix) group (check with the "id" command on a login node) to submit to the corresponding COSMA.
If you are not part of a DiRAC project, you should use COSMA5. You will need to be part of the "durham" group to do this.
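For example, on a login node (the group names shown here are illustrative):

id
# e.g. uid=12345(username) gid=1001(durham) groups=1001(durham),2001(dp004)
# Your DiRAC project group (or "durham") should appear in the groups list.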
We have a Globus Online endpoint, cosma#data. Please create an account, and use this.
For intermediate-sized files, you can always use scp or rsync.
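For example (the username, paths and login node shown are illustrative):

# Copy a single file to your COSMA7 data space
scp results.hdf5 USERNAME@login7.cosma.dur.ac.uk:/cosma7/data/PROJECT/USERNAME/
# Synchronise a whole directory; this can be resumed if interrupted
rsync -av --progress results/ USERNAME@login7.cosma.dur.ac.uk:/cosma7/data/PROJECT/USERNAME/results/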
/cosma/home/PROJECT/USERNAME (where PROJECT will be durham or a DiRAC project identifier)
You will also have data storage space, typically in /cosma[5,6,7,8]/data/PROJECT/USERNAME
Yes, these should be set. You have a quota on both total storage capacity (typically 10TB for data and 10GB for homespace) and the total number of files.
On a login node, use the "quota" command. This should report all your quotas on the different file systems (/cosma5, /cosma6, /cosma7 and homespace). This is part of the cosma module, so you may need to "module load cosma" first.
Alternatively, on the appropriate login node, use the c6quota or c7quota commands (c5quota no longer works). You will need the cosma module loaded. To see your homespace quota, use the "quota" command.
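For example, on a COSMA7 login node:

module load cosma
quota          # homespace and data quotas across the file systems
c7quota        # COSMA7 data quota (use c6quota on COSMA6)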
Homespace is backed up every few days. Data space can be archived to tape media upon request.
This is not currently possible, though if you are in Durham, you can email a PDF to the CIS printers using the "mail" command.
Modules are used to set up your work environment, changing your environment variables (e.g. PATH) according to your requirements.
Use the "modules available" command to see available modules. These are the same for each COSMA system, and if optimised versions exist for particular architectures, these may be loaded automatically.
Please see the codes section of the site for specific code details.
Well-written Makefiles should just work. However, badly written Makefiles may need changing to make use of the environment variables set by loading modules. The "export" command will show you the currently set environment variables.
Put "module load MODULE" commands within your .bash_profile script (or .login if you are an older user)
It is recommended to speak to cosma-support@durham.ac.uk first, so that we are aware of your requirements. We may also be able to make recommendations.
See here.
COSMA nodes are exclusive - you will not share nodes with other users. However, this does mean you should make good use of all the available cores.
If you are running single-core jobs, consider running several of them at the same time using SLURM arrays or a parallel MPI launch; a sketch of an array job is shown below.
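For example (partition, account and program names are illustrative):

#!/bin/bash
#SBATCH --job-name=serial_array
#SBATCH --array=0-15                 # 16 independent single-core tasks
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=1
#SBATCH --time=01:00:00
#SBATCH -p cosma7                    # partition (illustrative)
#SBATCH -A dp004                     # project account (illustrative)

# Each array element runs the same executable on a different input file
./my_serial_code input_${SLURM_ARRAY_TASK_ID}.dat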
COSMA5 has about 5,000 cores.
COSMA6 has about 9,000 cores.
COSMA7 has 12,656 cores.
COSMA8 has 46,080 cores.
The SLURM job submission system has a fair scheduler, based on how many jobs you have previously submitted, how busy the queues are, and how large your job is. If you specify a job with a shorter run-time, it is more likely to be scheduled quickly to fill in gaps.
For job control, please see here.
You will have space in /cosma[5,6,7,8]/data/PROJECT/USERNAME
This data is not backed up, but is held on a maintained parallel storage system (i.e. individual disk failures do not lead to data loss), and is archived periodically.
Each storage location is optimally connected to its own system. For example, if you store data in /cosma6, please access it from the cosma6 queue; it will not be available to COSMA5 or COSMA7 compute nodes.
Yes. These should be stored on the data storage connected to the compute nodes you are using. e.g. for cosma6, use /cosma6/data/
If initial conditions are small and only used by a small number of nodes at a time, they may be better stored in your home space.
Investigate using SLURM arrays (see the array job sketch above).
You can generate ssh keys on a COSMA login node and copy the public part to your .ssh/authorized_keys file. This will then allow you to ssh directly to a compute node. However, unless you have a good reason for doing this, please don't.
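For example (the compute node name is illustrative, and you should have a job running on it):

# On a COSMA login node
ssh-keygen -t ed25519                                # generate a key pair
cat ~/.ssh/id_ed25519.pub >> ~/.ssh/authorized_keys  # authorise it for your own account
chmod 600 ~/.ssh/authorized_keys
ssh m7001                                            # then ssh to the compute node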
SLURM commands such as squeue and sview will tell you this.
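For example:

squeue -u $USER        # list your queued and running jobs
squeue -j JOBID        # show a specific job
sview &                # graphical overview (requires X forwarding)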
If a large (many-node) job is waiting to run, it cannot start until enough nodes are available. This means that some nodes become idle and can be used as "back-fill" for short jobs. To use this, submit your job as usual; if the system deems it appropriate, it will schedule your job quickly. To make best use of this, specify a short time period for your job.
The c[5,6,7,8]backfill command (e.g. c5backfill) will show available nodes for back-filling.
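For example, on COSMA7 (the script name and time limit are illustrative):

c7backfill                                  # show nodes currently available for back-filling
sbatch --time=00:30:00 my_short_job.sh      # a short requested run-time helps the scheduler fit the job in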
COSMA5 and COSMA6 login nodes have 512GB of memory, COSMA7 login nodes have 1.5TB, and COSMA8 login nodes have 2TB. This is shared with other logged-in users, so please check for a quiet node first; some commands for doing so are shown below.
The mad01 system has 3TB memory.
The mad02 system has 1.5TB RAM.
The mad03 system has 6TB RAM (using Apache Pass non-volatile memory, so performance is somewhat reduced).
The mad04 system has 4TB RAM.
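To check how busy a login node is before starting memory-hungry work, something like the following is usually enough:

uptime       # load average - compare against the number of cores
free -g      # memory currently in use, in GB
who          # other logged-in users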
Standard visualisation tools are available. If you need an answer to this question, please ask us to put more details here. Several GPU servers are also available.
Allinea (Arm Forge), gdb, perf, etc. are all installed.
Specify a short time period in your batch script file.
The srun command will allow you to do this.
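For example, to get an interactive shell on a compute node (partition and account are illustrative):

srun -p cosma7 -A dp004 --ntasks=1 --time=01:00:00 --pty /bin/bash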
To compile in parallel, use the -j N argument when you run make (e.g. make -j 4 to use 4 parallel jobs during compilation).
Code compiled on COSMA5 or COSMA6 will run on COSMA7 and COSMA8. Heavily optimised code compiled on COSMA7 may not run on COSMA5 or COSMA6, as these have older CPU architectures.
If you begin moving large files about, you may notice poor performance.
With the cosma module loaded ("module load cosma"), use the "quota" command.
To increase your quota, please email cosma-support@durham.ac.uk.
Probably you are trying to access storage not attached to that particular COSMA, e.g. writing to /cosma5 from the cosma7 queue. This is not allowed (it is inefficient for large jobs), so you need to make sure that all reading and writing is done to locally attached storage.
The "newgrp" command can be used to do this. If the "id" command shows you being a member of several groups, you can change between them using "newgrp GROUPNAME". This will only take effect for your current shell. However, it can be used within batch scripts if you wish to write your data as a particular group.
First, check that you have write permission on this directory:
ls -ld /path/to/directory
This should show something like drwxr-xr-x 1234 USERNAME GROUP ..., meaning that you have read, write and execute permission, members of the group have read and execute permission, and everyone else has read and execute permission. If the w (write) permission is not present (e.g. dr-xr-xr-x), you should add it: chmod u+w /path/to/directory
Check that the directory is owned by you, i.e. that the USERNAME returned from the ls -ld command is your username!
If that is all okay, check that you have sufficient quota left to write data there: "module load cosma" and then "quota". This should be done on a login node (login5, login6 or login7).
Finally, if you are still not able to write, check that the group you are writing as still has quota left. To identify your group, use the "id" command. This will show gid=XXXX(group), where XXXX is a number. You can then use e.g. c5quota -g XXXX (or c6quota or c7quota). Also, please note that your data may not be written as your default group if your directory has a setgid bit set (sometimes called a group "sticky" bit). In this case, the "ls -ld" command will show something like drwxr-sr-x; note the "s". You will be writing as the directory's group, regardless of what your default group is, so check that this group is not over quota.
To change the group that you are writing as: if a setgid bit is set, either remove it (chmod g-s /path/to/directory) or re-group the directory (chown USER:NEWGROUP /path/to/directory). If a setgid bit is not set and your default group is over quota, so you wish to write as a different group, use the "newgrp" command: "newgrp GROUP". This changes your group to the one specified (which should be in the list given by the "id" command) for the current terminal only, so you will need to run it again if you log in again, and also put it into your Slurm batch scripts. Alternatively, if you believe that your default group is wrong and wish it to be changed, please contact cosma-support.
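As a sketch, the group-related checks above can be combined as follows (the directory, gid and group name are illustrative):

ls -ld /path/to/directory    # an "s" in the group permissions (drwxr-sr-x) means the setgid bit is set
id                           # note your gid, e.g. gid=2001(dp004)
c7quota -g 2001              # check that group's quota (c5quota/c6quota for other systems)
newgrp dp004                 # if needed, write as a different group in this shell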