CMP grid computing


General information

The CMP computation grid is a set of high-performance computers (currently 9; 2.8 GHz P4 with 1 GB of memory), called nodes, interconnected by a 1 Gbit Ethernet network. The nodes run openMosix, a special version of the Linux operating system, together with the Sun Grid Engine (SGE) software. The CMP grid is part of the CMP Unix network, but its nodes differ from the other machines in several ways.

The picture shows the topology, and also shows that some network disks (e.g. /data, /experiment, /scratch) are not visible on the CMP grid. The reason is the high network load produced by the grid. Users therefore have to transfer their data to the network drive /datagrid, which is visible across the whole CMP Unix network (and as \\ptak\datagrid in the MS Windows network). /datagrid is directly connected to the CMP grid. It is a RAID 5 drive, so data on /datagrid are protected against a single disk failure, but they are not backed up or archived. To have a directory created on /datagrid for your project, mail vecerka@cmp.felk.cvut.cz; the directory /datagrid/temporary is intended for quick work. Don't use your home directory to store data for grid computing; always use /datagrid instead.
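
Copying input data to /datagrid before submitting jobs could be sketched as follows. This is only an illustration: the helper name stage_to_grid and the per-user subdirectory layout are assumptions, not a CMP convention. The second argument lets you point it at your project directory instead of /datagrid/temporary.

```shell
#!/bin/sh
# Sketch: stage input data onto the grid-visible drive.
# stage_to_grid is a hypothetical helper; the <root>/<user>/<dirname>
# layout is an assumption made for this example.

stage_to_grid() {
    src=$1                                  # local directory holding the input data
    grid_root=${2:-/datagrid/temporary}     # default: the quick-work directory
    dest="$grid_root/$(id -un)/$(basename "$src")"

    mkdir -p "$dest" || return 1
    # Copy the directory contents, preserving timestamps and permissions.
    cp -Rp "$src"/. "$dest"/ || return 1
    # Print the full path, ready to be used inside batch scripts.
    printf '%s\n' "$dest"
}
```

The function prints the destination path, so the result can be captured and pasted into a batch script as a full path.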

Only a small amount of software is installed on the CMP grid, such as matlab, gcc compilers, ImageMagick, geomview and povray. The style of work with the CMP grid should be:

openMosix and SGE provide efficient use of the CMP grid in several ways:

Running and scheduling batch jobs

A good place to start is the SGE user documentation, which describes what SGE is and how to use it. In the CMP grid, the submit and master host is the computer called ptak. The execution hosts are cmpgrid-0x, where x is currently 1-8.

Quick start:

Running interactive jobs

  1. Login to ptak.
  2. Run your command through qrsh, e.g.: qrsh matlab6 -nojvm.
    The command is executed not on ptak but on one of the cmpgrid* machines.
    Applications that require X-window cannot be run this way, so plain qrsh matlab6 does not work as expected (it runs, but without windows).
  3. qmon (running on ptak, requires X-window) shows under Job Control->Running jobs which node the application is running on.
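
Since qmon needs X-window, a plain terminal alternative may be handy. The sketch below uses qstat, the standard SGE status command; it is not mentioned elsewhere on this page, so its availability on ptak is an assumption, and my_jobs is a hypothetical helper name.

```shell
#!/bin/sh
# Sketch: list your own SGE jobs from a terminal, without X-window.
# my_jobs is a hypothetical helper; it assumes qstat is on the PATH on ptak.

my_jobs() {
    if command -v qstat >/dev/null 2>&1; then
        # -u restricts the listing to jobs owned by the given user.
        qstat -u "$(id -un)"
    else
        echo "qstat not found; log in to ptak first" >&2
        return 1
    fi
}
```

In the qstat listing, running jobs normally carry the state r together with the queue on the node they run on, and pending jobs carry the state qw.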

Running and scheduling non-interactive batch jobs

  1. Prepare your batch job(s). Always specify files by their full path.
  2. Login to ptak.
  3. Submit the batch script with qsub, e.g.: qsub mogrify.sh.
    The batch is submitted to the queue and runs on one of the cmpgrid* machines.
    It is not possible to submit a binary application directly. You always have to write a script that wraps the binary command and submit that script.
  4. qmon (running on ptak, requires X-window) shows under Job Control->Running jobs which node the application is running on, and under Job Control->Pending jobs the jobs waiting in the queue.
An example of a submit script:

    #!/bin/sh
    /usr/local/bin/matlab6 -nosplash -nojvm < /full_path/my_mfile.m > /full_path/output.txt
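
Because a binary cannot be submitted directly, writing the wrapper script can itself be scripted. The following is a minimal sketch, not a site-provided tool: make_wrapper is a hypothetical helper, and the paths in the usage comment are placeholders.

```shell
#!/bin/sh
# Sketch: wrap a binary command in a small script that qsub can accept.
# make_wrapper is a hypothetical helper; pass absolute paths only.
# Limitation: arguments containing spaces are not re-quoted in the wrapper.

make_wrapper() {
    wrapper=$1      # where to write the wrapper script
    output=$2       # file that receives the command's stdout and stderr
    shift 2         # remaining arguments: the binary and its options
    {
        printf '#!/bin/sh\n'
        printf '%s > %s 2>&1\n' "$*" "$output"
    } > "$wrapper"
    chmod +x "$wrapper"
}

# Usage on ptak (placeholder paths):
#   make_wrapper /datagrid/temporary/me/job.sh /datagrid/temporary/me/job.out \
#       /usr/local/bin/matlab6 -nosplash -nojvm
#   qsub /datagrid/temporary/me/job.sh
```

Keeping the wrapper and its output file on /datagrid means that both are visible from whichever cmpgrid* node the job lands on.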

Automatic migration of the processes

A good starting point for openMosix is the openMosixWiki. Every openMosix user command has a man page (try man mosrun, man mosmon or man migrate).
No special commands are necessary to run jobs in the CMP grid. You can log in directly to any cmpgrid-* machine or to ptak and run commands there. It is important to know, however, that a process started this way can automatically migrate to another node of the grid, depending on the current load of the nodes.

Good to know:


Comments and suggestions to Daniel Vecerka