Computational Cluster Tiger

Wimpy Utilities

mpirun [-N [NODE_ID]] [-n NICE] [LONG_OPTIONS] program arg1 arg2 ...

Start the program (with arguments arg1, arg2, ...) in the cluster environment. The mpirun command exits as soon as it has passed all the information needed to start the job to the wimps server; it does not wait for the job to finish.

OPTIONS:

-N [NODE_ID]: force Wimpy to start and keep the process on the specified node. If NODE_ID is not supplied, the new job is started on the predefined node with the most memory. See the list of available nodes for a brief description of their specific hardware parameters. Do not send jobs to nodes for which the list does not explicitly specify special features.
-n NICE: start the program with the given nice value. If not specified, the default value NICE=12 is used.

LONG OPTIONS:

--stdin=FILE redirect standard input to the regular file named FILE. Don't use pipes or sockets! Only regular files in the /home directory are accessible across the whole cluster and can be used safely. If this option is not used, standard input is redirected to /dev/null.

--stdout[=FILE] redirect standard output to a regular file. If FILE is not specified, the name 'PID.stdout' in the current working directory is used. If FILE ends with a slash, the name 'FILE/PID.stdout' is used. If the option is omitted, standard output is redirected to /dev/null.

--stderr[=FILE] redirect standard error output to a regular file. If FILE is not specified, the name 'PID.stderr' is used. If FILE ends with a slash, the name 'FILE/PID.stderr' is used. If the option is omitted, standard error output is redirected to /dev/null.
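The naming rules for --stdout and --stderr can be summarised in a small shell function. This is only a sketch of the rules as described above and is not part of Wimpy: the function name wimpy_output_path and its MODE argument are hypothetical, and it assumes that a FILE ending with a slash is joined to the default name without a second slash.

```shell
# wimpy_output_path PID SUFFIX MODE [FILE]   (hypothetical helper, not a Wimpy command)
#   MODE=omitted : option not given           -> /dev/null
#   MODE=bare    : option given without FILE  -> PID.SUFFIX (relative to the working directory)
#   MODE=file    : option given with FILE     -> FILE, or FILE + PID.SUFFIX if FILE ends with "/"
wimpy_output_path() {
    pid=$1
    suffix=$2
    mode=$3
    file=${4-}
    case $mode in
        omitted) printf '%s\n' /dev/null ;;
        bare)    printf '%s\n' "$pid.$suffix" ;;
        file)
            case $file in
                */) printf '%s\n' "$file$pid.$suffix" ;;  # directory form
                *)  printf '%s\n' "$file" ;;              # explicit file name
            esac ;;
        *) echo "unknown mode: $mode" >&2; return 1 ;;
    esac
}
```

For example, for a job with PID 4242 started with --stdout=/home/alice/logs/ (a made-up path), wimpy_output_path 4242 stdout file /home/alice/logs/ prints /home/alice/logs/4242.stdout. Since both streams go to /dev/null when the options are omitted, a job whose output matters needs at least --stdout.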

mpikill [-s SIGNAL] PID

Send a signal to the running job with the given PID. Wimpy daemons forward the signal across the cluster. Programs that rely on fine-grained signal handling should keep in mind that the process is signalled by a root process (the Wimpy daemon).

SIGNALs (only numeric values are accepted):

1   SIGHUP

9   SIGKILL

10   SIGUSR1

12   SIGUSR2

15   SIGTERM - default value

18   SIGCONT - resume suspended process. Signal is handled by Wimpy daemons and is not delivered to the process

19   SIGSTOP - suspend running process. Signal is handled by Wimpy daemons and is not delivered to the process
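Because mpikill accepts only numeric signal values, a small lookup can save a trip to this table. The following is a sketch, not part of Wimpy: the function name wimpy_signum is hypothetical, and it deliberately knows only the signals listed above.

```shell
# wimpy_signum NAME: print the number for a signal name from the table above.
# Accepts names with or without the "SIG" prefix (hypothetical helper).
wimpy_signum() {
    case ${1#SIG} in
        HUP)  echo 1 ;;
        KILL) echo 9 ;;
        USR1) echo 10 ;;
        USR2) echo 12 ;;
        TERM) echo 15 ;;
        CONT) echo 18 ;;
        STOP) echo 19 ;;
        *) echo "signal not in the mpikill table: $1" >&2; return 1 ;;
    esac
}
```

On the cluster it could be used as mpikill -s "$(wimpy_signum USR1)" PID.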

mpistop PID

Alias for mpikill -s 19 PID. Suspend the process (checkpoint and kill) to an image file. It can be used, e.g., to make room for other processes when the cluster is heavily overloaded.

mpiresume PID

Alias for mpikill -s 18 PID. Resume a suspended process from its image file.

mpirenice -n NICE PID

Change the nice value of the process identified by PID. The new nice value must be larger than the old one. Users can only change the nice value of their own processes.
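The "larger than the old one" rule can be checked before calling mpirenice. This is a minimal sketch under two assumptions: the helper names is_uint and can_renice are hypothetical, and nice values are taken to be non-negative integers (the current value can be read from the NICE column of mpilist).

```shell
# is_uint VALUE: succeed if VALUE is a non-empty string of digits.
is_uint() {
    case $1 in
        ''|*[!0-9]*) return 1 ;;
    esac
}

# can_renice OLD NEW: succeed only if NEW is strictly larger than OLD.
# Returns 2 for malformed input, 1 if the change would be rejected.
can_renice() {
    is_uint "$1" && is_uint "$2" || {
        echo "nice values must be non-negative integers" >&2
        return 2
    }
    [ "$2" -gt "$1" ]
}
```

A guarded call would then look like can_renice 12 15 && mpirenice -n 15 PID.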

mpilist [-acdmqsSt]

Print brief cluster statistics and a list of running jobs. By default, each job is listed with the following fields: the PID of the process, the effective USER id, the NICE value, the "number" of the NODE where the process is currently running, the STATus, the used MEMory, the consumed (normalized) CPU TIME, and the name of the executable (COMMAND). Possible states are:

R   running job

S   sleeping job (i.e. not consuming CPU time)

CH   job suspended by the user to an image file

CH+   job suspended by the system (probably due to the PID conflict)

EXIT   exited job (it may stay in the table for some time, but definitely less than one minute)

LOST   lost job (e.g. due to lost communication with the node). This status should never occur under normal circumstances; the process can usually be recovered from the last checkpoint
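The state column makes it easy to pick out jobs that need attention. The sketch below is not part of Wimpy, and the exact output layout of mpilist is not specified here: it assumes whitespace-separated columns in the order given above (PID, USER, NICE, NODE, STAT, ...) and that the input contains job lines only. The function name jobs_in_state is hypothetical.

```shell
# jobs_in_state STATE: read job lines on standard input and print the PID
# of every job whose fifth column (assumed to be STAT) equals STATE.
jobs_in_state() {
    awk -v want="$1" '$5 == want { print $1 }'
}
```

Assuming the header and the cluster statistics are suppressed with -q and -s, the lost jobs could be listed with mpilist -q -s | jobs_in_state LOST.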

OPTIONS:

-a show full IP address of the nodes running the jobs (by default, only last byte of the address is shown)

-c show full command lines

-d show the difference between the consumed CPU time and the value targeted by the scheduler

-m show number of process migrations

-q don't print header

-s don't print cluster statistics

-S show per-user usage rather than the full list of nodes

-t show the consumed CPU time in seconds (the default format is minutes:seconds)
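When a listing was taken without -t, the default minutes:seconds value can be converted by hand. A minimal sketch, assuming the value has exactly the form MINUTES:SECONDS; the function name cputime_to_seconds is hypothetical.

```shell
# cputime_to_seconds MIN:SEC: print the equivalent number of seconds.
# awk is used for the arithmetic so that a value such as "08" is read
# as decimal 8 rather than as an invalid octal number.
cputime_to_seconds() {
    printf '%s\n' "$1" | awk -F: '{ print $1 * 60 + $2 }'
}
```

For example, cputime_to_seconds 12:34 prints 754, since 12 * 60 + 34 = 754.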

Last updated: 20.12.2014 (L.)