CMP grid computing
General information
The CMP computation grid is a set of (currently 9) high-performance computers (2.8 GHz P4 with 1 GB of memory) called nodes, interconnected by a 1 Gbit Ethernet network. The nodes run OpenMosix, a special version of the Linux operating system, together with the Sun Grid Engine (SGE) software. The CMP grid is part of the CMP Unix network, but its nodes have several peculiarities.
The picture shows the topology and also shows that some network disks are not visible on the CMP grid (e.g. /data, /experiment, /scratch). The reason is the high network load produced by the grid. Users therefore have to transfer their data to the network drive /datagrid, which is visible in the whole CMP Unix network (and as \\ptak\datagrid in the MS Windows network). /datagrid is directly connected to the CMP grid. It is a RAID 5 drive, so data on /datagrid are protected against a disk failure, but they are not backed up or archived. To have a directory created on /datagrid for your project, mail vecerka@cmp.felk.cvut.cz; the directory /datagrid/temporary is intended for quick work. Don't use your home directory for storing data for grid computing; always use /datagrid instead.
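Copying input data onto /datagrid before a run could look like the following minimal sketch. The function name stage_to_datagrid and the GRID_DIR variable are illustrative assumptions and not part of the CMP setup; only the /datagrid/temporary directory comes from the text above.

```shell
#!/bin/sh
# Hypothetical helper: copy a local data directory to the grid-visible drive.
# GRID_DIR defaults to /datagrid/temporary (the quick-work area named above);
# point it at your project directory once that directory has been created.
GRID_DIR="${GRID_DIR:-/datagrid/temporary}"

stage_to_datagrid() {
    src="${1:-}"                              # local directory with input data
    if [ -z "$src" ] || [ ! -d "$src" ]; then
        echo "stage_to_datagrid: no such directory: $src" >&2
        return 1
    fi
    dest="$GRID_DIR/$(basename "$src")"       # same directory name below GRID_DIR
    mkdir -p "$dest" || return 1
    # Copy the directory contents, preserving timestamps and permissions.
    cp -Rp "$src"/. "$dest"/ || return 1
    echo "$dest"                              # full path to use in batch jobs
}
```

Calling stage_to_datagrid "$HOME/my_experiment" (a made-up directory name) would print the full path of the copy, which can then be used in job scripts.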
Only a small amount of software is installed on the CMP grid, such as matlab, gcc compilers, ImageMagick, geomview and povray. The recommended way of working with the CMP grid is:
- Prepare batch jobs for the grid outside the grid.
- Log in to the grid and run the jobs there.
- Transfer the results from the grid and log out.
OpenMosix and SGE provide efficient usage of the CMP grid in several ways:
- Automatic migration of processes from a node with a high load to a node with a low load; this is done by OpenMosix. Only processes that don't use shared memory can migrate (applications that can migrate, applications that don't migrate). A user doesn't need to do anything for this, it happens completely automatically, but sometimes it fails (the process doesn't migrate) or is not very efficient (matlab6 has a problem with efficiency).
- Running and scheduling processes on the nodes with minimum load; this is done by Sun Grid Engine. This requires some action from the user, who has to use a special command to submit a job, but it guarantees that the job runs on a node with minimum load. This way is also more efficient than process migration.
- Parallel computing is another way to use the CMP grid. It is the most complicated way from the user's point of view, but it is the only way in which the CMP grid looks like one computer to an application and the application can use the resources of different nodes at the same time. Several software packages for parallel computing are installed on the CMP grid: Parallel Virtual Machine (PVM), Message Passing Interface and Chameleon (MPICH) and Local Area Multicomputer (LAM). The user has to do some programming to use parallel computing.
Running and scheduling batch jobs
A good place to start is the SGE user documentation, which describes what SGE is and how to use it. In the CMP grid, the submit and master host is the computer called ptak. The execution hosts are cmpgrid-0x, where x is currently 1-8.
Quick start:
Running interactive jobs
- Log in to ptak.
- Run the command qrsh command, e.g.: qrsh matlab6 -nojvm.
The command isn't executed on ptak but on one of the cmpgrid* machines.
Applications that require X Window cannot be run this way, so qrsh matlab6 doesn't work properly (it runs, but without windows).
- qmon (running on ptak, requires X Window) shows in Job Control->Running jobs on which node the application is running.
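The interactive steps above can be wrapped in a small shell function, sketched here under a few assumptions: the name grid_run and the DRY_RUN variable are hypothetical, and the function only passes its arguments to qrsh exactly as in the example above.

```shell
#!/bin/sh
# Hypothetical wrapper around qrsh for interactive, non-graphical commands.
# With DRY_RUN=1 it only prints the command line instead of executing it.
grid_run() {
    if [ "$#" -eq 0 ]; then
        echo "usage: grid_run command [args...]" >&2
        return 2
    fi
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "qrsh $*"
        return 0
    fi
    # qrsh is only expected to be available on the submit host (ptak).
    if ! command -v qrsh >/dev/null 2>&1; then
        echo "grid_run: qrsh not found; log in to ptak first" >&2
        return 127
    fi
    qrsh "$@"
}
```

On ptak, grid_run matlab6 -nojvm would then start Matlab on one of the cmpgrid* machines, and grid_run hostname should print the name of the node that was chosen.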


Running and scheduling non-interactive batch jobs
- Prepare your batch job(s). Always specify files with their full paths.
- Log in to ptak.
- Run the command qsub batch, e.g.: qsub mogrify.sh.
The batch will be submitted to the queue and will be run on one of the cmpgrid* machines.
It is not possible to submit a binary application. It is always necessary to write a script that wraps the binary command and to submit this script.
- qmon (running on ptak, requires X Window) shows in Job Control->Running jobs on which node the application is running, and in Job Control->Pending jobs the jobs waiting in the queue.
An example of a submit script:
#!/bin/sh
/usr/local/bin/matlab6 -nosplash -nojvm < /full_path/my_mfile.m > /full_path/output.txt
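Because a binary has to be wrapped in a script before submission, writing the wrapper can itself be scripted. The following is only a sketch: the function name make_job_script and the example paths are hypothetical, and the command line is written into the wrapper verbatim, so it should already use full paths.

```shell
#!/bin/sh
# Hypothetical helper: write a small sh wrapper around a command line so
# that the wrapper can be handed to qsub.
make_job_script() {
    script="${1:-}"     # full path of the wrapper script to create
    cmdline="${2:-}"    # complete command line, copied into the wrapper as-is
    if [ -z "$script" ] || [ -z "$cmdline" ]; then
        echo "usage: make_job_script script 'command line'" >&2
        return 2
    fi
    # First line is the interpreter, second line is the wrapped command.
    printf '#!/bin/sh\n%s\n' "$cmdline" > "$script" || return 1
    chmod +x "$script"
}
```

For example, make_job_script /datagrid/temporary/job.sh '/usr/local/bin/matlab6 -nosplash -nojvm < /full_path/my_mfile.m > /full_path/output.txt' would recreate the submit script shown above, which could then be submitted on ptak with qsub /datagrid/temporary/job.sh.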
Automatic migration of the processes
A good starting point for OpenMosix is the openMosixWiki. Every OpenMosix user command has a man page (try man mosrun, man mosmon or man migrate).
No special commands are necessary for running jobs in the CMP grid. It is possible to log in to any cmpgrid-* machine or to ptak directly and run commands there. It is important to know, however, that a process started there can automatically migrate to another node of the grid, depending on the current load of the nodes.
Good to know:
- How to find out that a process has migrated: look at the picture:

The matlab process started on the current node uses 99.9% of a processor, but the current node has a load of only 0.00. This implies that the matlab process has migrated to another node.
- The command mosmon shows the current load of all nodes:

- The command nomig process runs a process with a node lock, so the process will not migrate to another node (same as runhome process).
- A more flexible command is mosrun, which runs a process with particular node-allocation preferences (see man mosrun).
- A nice GUI for OpenMosix is openmosixview (only on ptak). This picture shows the migration of one process graphically:

- Matlab 5 migrates fine, but Matlab 6 has some problems: it migrates only when it is started with the command matlab6_migrate. A better option is to use SGE for Matlab 6.
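Keeping a process on the node where it was started, as described for nomig above, could be scripted roughly as follows. The function name run_without_migration is hypothetical, and the fallback branch rests on the assumption that on a machine without OpenMosix nothing can migrate anyway.

```shell
#!/bin/sh
# Hypothetical wrapper: start a command so that it stays on the current node.
# Uses nomig (see the list above) when it is installed; otherwise the
# command is simply run directly.
run_without_migration() {
    if [ "$#" -eq 0 ]; then
        echo "usage: run_without_migration command [args...]" >&2
        return 2
    fi
    if command -v nomig >/dev/null 2>&1; then
        nomig "$@"      # node-locked: the process will not migrate
    else
        "$@"            # no OpenMosix tools found, run the command as-is
    fi
}
```

A call such as run_without_migration /full_path/my_program (a placeholder path) could be useful for programs that are known to migrate badly.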
comments and suggestions to
Daniel Vecerka