Information Technology Division
Computing & Communications Center

Cluster FAQ

General Cluster Information

Compiling On The Cluster

Using the Cluster

Possible Error Messages


General Cluster Information

What is the CCC Compute Cluster?

The CCC Compute Cluster is a group of six systems, each with two quad-core 2.7GHz Opteron processors, running the Rocks Cluster Distribution. Rocks is based on Red Hat Enterprise Linux 5.1, with a variety of clustering and high-performance computing packages installed on top of it.

The cluster is connected to the outside world through ccc-cluster.wpi.edu, and internally connected via a private gigabit ethernet switch.


What software is available on the cluster?

The following languages and compilers are available:

The following math libraries are available:

The following other libraries and packages are available:


How do I request that new software be installed?

Send email to ccc-hpc@wpi.edu or mtaylor@wpi.edu.


Is there a mailing list for cluster discussion?

There is: ccc-hpc@wpi.edu. Mail mtaylor@wpi.edu to request that you be added to the list.


Compiling On The Cluster

How do I use the GNU Fortran and C compilers?

without mpich:

To compile software with the GNU compilers without the MPI libraries, simply run gcc or g77 as you would on any non-clustered system.

with mpich:

There are MPI wrappers for the GNU compilers that automatically provide most or all of the flags required to compile software with the MPI libraries.

To compile software with the MPI libraries using the GNU compilers, use these commands instead of plain gcc or g77:
/opt/mpich/gnu/bin/mpicc
/opt/mpich/gnu/bin/mpif77

(Note that there is no mpif90. The GNU Fortran compiler installed here, g77, supports only Fortran 77 code, not Fortran 90.)
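As an illustrative sketch (not part of the cluster's installed tooling), the shell function below picks the GNU MPICH wrapper from a source file's extension and refuses Fortran 90 sources. The function name gnu_mpi_wrapper and the file names are hypothetical; the wrapper paths are the ones listed above.

```shell
# Hypothetical helper: print the GNU MPICH wrapper for a source file.
gnu_mpi_wrapper() {
    case "$1" in
        *.c)            echo /opt/mpich/gnu/bin/mpicc ;;
        *.f|*.F|*.f77)  echo /opt/mpich/gnu/bin/mpif77 ;;
        *.f90|*.F90)    echo "no GNU wrapper for Fortran 90: $1" >&2; return 1 ;;
        *)              echo "unrecognized source type: $1" >&2; return 1 ;;
    esac
}

# Show the compile command that would be used for a C source file.
echo "$(gnu_mpi_wrapper program.c) program.c -o program"
```

The function only prints a path, so the resulting command line can be reviewed before it is run.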


How do I use the Intel Fortran and C compilers?

without mpich

To compile software with the Intel compilers without the MPI libraries, simply run them as you would on any non-clustered system.

The C and C++ compiler is in the /opt/intel_cc_80/bin/ directory, which should already be in your path. The name of the compiler is icc.

The F77 and F90 compiler is in the /opt/intel_fc_80/bin/ directory, which should already be in your path. The name of the compiler is ifort. The old name for the Intel Fortran compiler, ifc, will still work, but it will produce a warning message.

You will need to statically link any programs compiled with the Intel compilers. This means you need to add the -static flag to the flags passed to the compiler. For example:
icc -static program.c -o program

If you do not, you will be unable to run the resulting binary executable on any of the compute nodes.

with mpich

There are MPI wrappers for the Intel compilers that automatically provide most or all of the flags required to compile software with the MPI libraries.

To compile software with the MPI libraries using the Intel compilers, use these commands instead of plain icc or ifort:
/opt/mpich/intel/bin/mpicc
/opt/mpich/intel/bin/mpif77
/opt/mpich/intel/bin/mpif90

/opt/mpich/intel/bin/mpiCC does not appear to work correctly. Because the Intel C compiler builds both C and C++ code, use /opt/mpich/intel/bin/mpicc for C++ programs as well.

You will need to statically link any programs compiled with the Intel compilers. This means you need to add the -static flag to the flags passed to the compiler. For example:
mpicc -static program.c -o program

If you do not, you will be unable to run the resulting binary executable on any of the compute nodes.
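One way to confirm that a binary really was statically linked is to look at what ldd reports for it. The sketch below is a hypothetical helper, not something installed on the cluster; it assumes the cluster's ldd prints "not a dynamic executable" (or "statically linked") for a static binary, as glibc's ldd does.

```shell
# Hypothetical helper: decide from ldd's output whether a binary is static.
# Reads the text of `ldd <binary> 2>&1` on standard input.
is_static_ldd_output() {
    grep -q -e 'not a dynamic executable' -e 'statically linked'
}

# Example using the message glibc's ldd prints for a static binary.
if printf '\tnot a dynamic executable\n' | is_static_ldd_output; then
    echo "static: should run on the compute nodes"
else
    echo "dynamic: rebuild with -static"
fi
```

On the cluster, the same check would be fed real output, for example: ldd ./program 2>&1 | is_static_ldd_output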


Using the Cluster

How do I monitor the cluster?

Use the Ganglia web interface at http://ccc-cluster.wpi.edu/ganglia


How do I run a shell command on all the compute nodes?

cluster-fork [command]

For example, to check the load on all compute nodes, you would type this:
cluster-fork uptime


How do I start an MPI job on the cluster?

The mpirun command to start an MPI job on the cluster depends on which compiler was used to build the executable.

For the Gnu compilers:

/opt/mpich/gnu/bin/mpirun

For the Intel compilers:

/opt/mpich/intel/bin/mpirun

Apart from the path, the two mpirun commands behave identically. To start an MPI job on 4 CPUs using code built with the Intel compilers, you would type something like this:
/opt/mpich/intel/bin/mpirun -np 4 programname
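Since the only difference between the two launchers is the path, the choice can be wrapped in a small function. This is a minimal sketch; the name mpirun_cmd is hypothetical, and it only prints the command line (here for a 4-process Intel build) rather than running it.

```shell
# Hypothetical helper: print the mpirun command line for a compiler family.
# usage: mpirun_cmd <gnu|intel> <nprocs> <program> [args...]
mpirun_cmd() {
    family=$1
    nprocs=$2
    shift 2
    case "$family" in
        gnu|intel) echo "/opt/mpich/$family/bin/mpirun -np $nprocs $*" ;;
        *)         echo "unknown compiler family: $family" >&2; return 1 ;;
    esac
}

mpirun_cmd intel 4 programname
```

Printing the command first makes it easy to check that the launcher matches the compiler used to build the executable.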


How do I clean up zombie processes on the compute nodes?

The skill command can be used to kill all jobs owned by a given user, or all jobs with a given name. For example, if you have been running a program named "computejob" that has exited incorrectly, and it has left runaway processes on the compute nodes, you can type this to kill it on all compute nodes:
cluster-fork skill -9 computejob

You can also kill all of your own processes by passing your username to skill. For example, the user mtaylor would type this:
cluster-fork skill -9 mtaylor


Possible Error Messages

What does p4_error: alloc_p4_msg failed: 0 mean?

p0_6773: (7.828703) xx_shmalloc: returning NULL; requested 1048616 bytes
p0_6773: (7.828762) p4_shmalloc returning NULL; request = 1048616 bytes
You can increase the amount of memory by setting the environment variable
P4_GLOBMEMSIZE (in bytes); the current size is 3048616
p0_6773: p4_error: alloc_p4_msg failed: 0

The default P4_GLOBMEMSIZE has been set to the largest value that the memory in the compute nodes allows, but if you have reset it to something smaller, you may see errors like this.

P4_GLOBMEMSIZE must be set much larger than the amount of memory the program is requesting. The current default size is 32000000 bytes.

To set it back to the default, type:
export P4_GLOBMEMSIZE=32000000 (for bash users)
setenv P4_GLOBMEMSIZE 32000000 (for csh or tcsh users)
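To see how a failed request compares with the current setting, the requested byte count can be pulled out of the error text. The sketch below is illustrative only: the function name p4_requested_bytes is hypothetical, the sample line is copied from the error message above, and the comparison catches only the obvious case where the pool is no larger than a single request.

```shell
# Hypothetical helper: extract the requested byte count from a p4 error log
# read on standard input (first matching line only).
p4_requested_bytes() {
    sed -n 's/.*requested \([0-9][0-9]*\) bytes.*/\1/p' | head -n 1
}

# Sample line copied from the error message above.
log='p0_6773: (7.828703) xx_shmalloc: returning NULL; requested 1048616 bytes'
requested=$(printf '%s\n' "$log" | p4_requested_bytes)
current=${P4_GLOBMEMSIZE:-32000000}   # fall back to the documented default

echo "requested: $requested bytes, P4_GLOBMEMSIZE: $current bytes"
if [ "$current" -le "$requested" ]; then
    echo "P4_GLOBMEMSIZE is too small for this request"
fi
```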


What does libcprts.so.5: cannot open shared object file: No such file or directory mean?

/home/mtaylor/tests/test.exe: error while loading shared libraries:
libcprts.so.5: cannot open shared object file: No such file or directory
p0_792: p4_error: Child process exited while making connection to remote
process on compute-0-0.local: 0
/opt/mpich/intel/bin/mpirun: line 1: 792 Broken pipe /home/mtaylor/tests/test.exe -p4pg /home/mtaylor/tests/PI646 -p4wd /home/mtaylor/tests

This means the binary was not statically linked. Recompile your program with the -static flag, as described in the Intel compiler question under "Compiling On The Cluster".


Why do I get lots of errors trying to compile C++ programs using the Intel compilers?

If you get a large number of errors compiling C++ code with the Intel mpiCC, and the same code compiles properly with the GNU mpiCC, use the Intel mpicc instead. It will compile both C and C++ code, and appears to work properly.


What does p4_error: semget failed for setnum: 0 mean?

p4_error: semget failed for setnum: 0

This means that the master node has reached its limit on the number of semaphores, so the program you are trying to run cannot allocate a new semaphore for inter-process communication. This can happen when somebody has been testing software that does not exit cleanly, leaving semaphores and shared memory segments allocated.

If the leftover semaphores are owned by you, it can be fixed by running the following two commands:
/opt/mpich/gnu/sbin/cleanipcs
cluster-fork /opt/mpich/gnu/sbin/cleanipcs

(In this case, using the intel or gnu version doesn't matter. The scripts are identical.)
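To check whether the leftover semaphores are yours before running the cleanup, you can count entries by owner in the output of ipcs -s. This is a sketch under assumptions: the function name count_sems_for_user is hypothetical, the sample values are made up, and it assumes the owner appears in the third column, as in the util-linux ipcs layout.

```shell
# Hypothetical helper: count semaphore sets owned by a user, reading the
# output of `ipcs -s` on standard input (owner assumed to be column 3).
count_sems_for_user() {
    awk -v user="$1" '$3 == user { n++ } END { print n + 0 }'
}

# Sample text in the layout printed by util-linux ipcs; values are made up.
sample='------ Semaphore Arrays --------
key        semid      owner      perms      nsems
0x00000000 98304      mtaylor    600        1
0x00000000 131073     mtaylor    600        1
0x00000000 163842     someuser   600        1'

printf '%s\n' "$sample" | count_sems_for_user mtaylor
```

On the master node the real table would be piped in instead of the sample, for example: ipcs -s | count_sems_for_user "$USER"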

It is possible that other users may have filled up the semaphore table. In this case, either they or root will need to clean the tables. Please mail mtaylor@wpi.edu mentioning the error you are seeing and I will clean it up.


Maintained by itweb
Last modified: Jul 23, 2009, 15:24 EDT