wcbe_sys

NAME

wcbe-sys — WIDAR correlator back end subsystem

SYNOPSIS

ssh wcbe@cbe-master pipelinefs /tmp/wcbe-dev/nodes
/tmp/wcbe-dev/setup 01
bdf_publisher &
config_listener &
executor_listener &
configure and activate: echo [config file path] > /tmp/wcbe-dev/config

DESCRIPTION

The WIDAR correlator back end subsystem comprises a variety of programs running on the nodes of the CBE cluster. This man page is meant to give an overview of the programs in the subsystem, how they are related, and how to use these programs.

Please note that the CBE implements a distributed, multi-process application, and is similar to a server application in several ways. Many of the complexities in controlling the CBE are consequences of this architecture. As experience is gained using the system, higher-level interfaces for using the CBE may develop; any such interfaces will be built on top of the current architecture and the interfaces that exist now.

For the PTC, the CBE cluster consists of two nodes: cbe-master, the head (or “master”) node; and cbe-node-01, the single compute (or “slave”) node that processes WIDAR lag frames.

The processes in a complete CBE subsystem are the following:

pipelinefs
wcbe
lagset_pipeline
bdf_publisher
executor_listener
config_listener

The following sections provide high level descriptions of the processes in the preceding list, and how they are related.

pipelinefs

The monitor and control interface on the head node for the pipeline processes that run on the compute nodes of the CBE cluster. This interface appears as a file system on the cluster's head node; grep for pipelinefs in /proc/mounts to verify that this filesystem has been mounted, and/or to find its mount point. The file system process must be run by the wcbe user on the head node. To spawn the pipelinefs process on the head node “master” and mount its file system at “path”:

ssh wcbe@master pipelinefs path

For the PTC, the above command is normally executed as

ssh wcbe@cbe-master pipelinefs /tmp/wcbe-dev/nodes

Note that using ssh to spawn a process as the wcbe user requires that your SSH public key be listed in the authorized_keys file of the wcbe user. Once pipelinefs has been started, however, anyone in the cbemgrs group may read and write in the pipelinefs file system.
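
The /proc/mounts check mentioned at the start of this section can be wrapped in a small helper. The following is a minimal sketch and is not part of the CBE distribution: the function name pipelinefs_mount_point is hypothetical, and it assumes that the string “pipelinefs” appears in the device or file system type field of the mounts table, as the grep suggestion implies.

```shell
# Hypothetical helper (not shipped with the CBE): print the mount point of
# pipelinefs, or return non-zero if no such file system is mounted.
# Assumption: "pipelinefs" appears in the device or type field of the table.
pipelinefs_mount_point() {
    mounts_file=${1:-/proc/mounts}
    awk '$1 ~ /pipelinefs/ || $3 ~ /pipelinefs/ { print $2; found = 1; exit }
         END { exit !found }' "$mounts_file"
}
```

On the head node, this might be used as: pipelinefs_mount_point || echo "pipelinefs is not mounted" >&2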

Detailed “README” files can be found in the file system providing descriptions of the file system entries in each directory. These “README” files are to be considered as reference documentation, not tutorials.

All monitor and control of the pipeline processes is done through pipelinefs; however, for ease of use, several scripts have been written on top of this layer. These scripts are useful in many instances, but they do not encompass all possible usages of the interface.

wcbe

The processes on the compute nodes that implement the input stage of the data processing pipeline for WIDAR lag frames. One of these processes should be spawned on each compute node of the CBE cluster via the pipelinefs interface.

For the PTC, there is a script in the /tmp/wcbe-dev directory for spawning wcbe on cbe-node-01 (or connecting to an existing instance), and initializing a default configuration for that process. The script is named “setup”, and should be used as follows:

/tmp/wcbe-dev/setup 01

The “setup” script should complete within seconds. Things to check if this isn't the case:

pipelinefs is mounted, and a process by that name is running on cbe-master under the wcbe user.
When wcbe has been uncleanly stopped on the compute node, the socket used for communications requires some time to be cleaned up by the OS, and cannot be reused by another process until that time. If you don't want to wait, change the port number appearing in line 11 of the “setup” script, and try again.
A broken wcbe process is already running on the compute node. In this case log in to the compute node and kill any wcbe (and lagset_pipeline) processes. Follow the advice in the previous bullet point about choosing a new socket port number to avoid socket re-use problems.

The default configuration is named c0, and should appear in the /tmp/wcbe-dev/nodes/01/configurations directory. The c0 configuration is sufficient to have the CBE produce BLF files, but not much more.

lagset_pipeline

The processes on the compute nodes that implement the “lag set processing stage” CBE pipelines. These are normally spawned and killed by the wcbe process on the compute nodes as required. No user intervention is typically required.

bdf_publisher

Process that runs on the head node that monitors the writing of BDF files produced by the CBE, finishes them when required, moves them to a “published BDF” directory, and sends a BdfInfo message to MCAF. The script that implements this process uses the WCBE_BDFDIR environment variable to determine where the BDF file processing occurs, and its value must be in accord with the contents of the bdfdir file in the top-level directory of pipelinefs.

For the PTC, WCBE_BDFDIR normally does not need to be set, and a default value of “/home/cbe-master/wcbe/data” is used by bdf_publisher. (This default accords with the value “/home/cbe-master/wcbe/data/.private” written to bdfdir in pipelinefs by the “setup” script described above.) This process may be run by any user in the cbemgrs group on the PTC. There is no harm in allowing this process to run indefinitely.
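
The relationship between WCBE_BDFDIR and the bdfdir entry can be checked before starting bdf_publisher. The sketch below is illustrative only: the function name bdfdir_in_accord is hypothetical, and it assumes that bdfdir holds a single path and that “in accord” means the “.private” subdirectory of the publication directory, as the PTC defaults suggest.

```shell
# Hypothetical helper: succeed only if the bdfdir entry in pipelinefs names
# the ".private" subdirectory of the BDF publication directory.
# $1: pipelinefs mount point (the PTC value is used as the default).
# Assumption: bdfdir contains a single path on one line.
bdfdir_in_accord() {
    mount_point=${1:-/tmp/wcbe-dev/nodes}
    publish_dir=${WCBE_BDFDIR:-/home/cbe-master/wcbe/data}
    publish_dir=${publish_dir%/}          # tolerate a trailing slash
    expected=$publish_dir/.private
    actual=$(cat "$mount_point/bdfdir") || return 2
    if [ "$actual" = "$expected" ]; then
        return 0
    fi
    echo "bdfdir is '$actual', expected '$expected'" >&2
    return 1
}
```

A return status of 1 indicates a mismatch, and 2 indicates that bdfdir could not be read (for example, because pipelinefs is not mounted).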

executor_listener

Process that runs on the head node that listens for “Observation” documents sent to a multicast address by the executor, and initiates the recording of BDF files by the backend (one for each scan). For this process to work properly most of the aforementioned components of the CBE subsystem must be up and running. No changes to the CBE processing pipeline configuration are performed by the executor_listener process, so the processing pipeline must have been configured (and the desired configuration must have been activated) prior to running executor_listener.

The executor_listener only reacts to “Observation” documents that have a value of “widar” in the “correlator” element. There is likely no harm in allowing this process to run indefinitely, but, since BDF files may be produced in response to executor messages, it is probably prudent to start and stop this process as necessary.

config_listener

Process that provides a simplified interface to configure and activate a pipeline configuration. At present, this process has been specialized to work on the PTC CBE only. By passing to this process the complete path name of a configuration file on cbe-master, a configuration will be created on cbe-node-01, and that configuration will be activated immediately thereafter. The process reads the file name from a named pipe that can be written to by any other process on cbe-master. For the PTC, this named pipe is /tmp/wcbe-dev/config. There is no harm in allowing this process to run indefinitely.
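
Because config_listener expects a complete path name, a thin wrapper can catch common mistakes before anything is written to the pipe. This is a sketch under stated assumptions, not a tool that exists on the PTC: the name send_config and its checks are hypothetical.

```shell
# Hypothetical wrapper around the config_listener named pipe.
# $1: configuration file on cbe-master (complete path name)
# $2: named pipe (the PTC value is used as the default)
send_config() {
    config_file=$1
    pipe=${2:-/tmp/wcbe-dev/config}
    case $config_file in
        /*) ;;
        *) echo "send_config: need a complete path name: $config_file" >&2
           return 1 ;;
    esac
    [ -r "$config_file" ] || { echo "send_config: cannot read $config_file" >&2; return 1; }
    [ -p "$pipe" ] || { echo "send_config: $pipe is not a named pipe" >&2; return 1; }
    # Opening a named pipe for writing blocks until a reader (here,
    # config_listener) has it open, so this may hang if no reader is running.
    echo "$config_file" > "$pipe"
}
```

For example: send_config /path/to/config/file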

EXAMPLES

Typical usage

The following sequence describes a typical procedure to check on the status of the CBE and configure it.

Check that pipelinefs is running by verifying that “pipelinefs” appears in /proc/mounts:
```
grep pipelinefs /proc/mounts
```
Check that a wcbe instance is running on cbe-node-01 by verifying the existence of the 01 directory under the pipelinefs mount point:
```
ls /tmp/wcbe-dev/nodes/01
```
Check that bdf_publisher, config_listener, and executor_listener are running (they may be run by any member of the cbemgrs group):
```
# The bracketed first letters keep grep from matching its own command line.
ps aux | grep '[b]df_publisher\|[c]onfig_listener\|[e]xecutor_listener'
```

Send a configuration to the compute node:

echo /path/to/config/file > /tmp/wcbe-dev/config

Verify that the configuration was loaded correctly:
```
ls /tmp/wcbe-dev/nodes/01/configurations
```
If successful, there should be a directory named for the approximate current time, in the form [hours]_[minutes]_[seconds].

Verify that the configuration was activated:

readlink /tmp/wcbe-dev/nodes/01/active


The arrival of lag frames drives the CBE processing pipeline, and the CBE does not maintain any notion of external time except through the timestamps that are carried by the lag frames. Consequently, a configuration cannot be activated except when lag frames are being received by the backend node. Therefore, configuration activation is deferred by the compute node until lag frames are received, meaning that the “active” link referred to above may not be present until that time, or it may be linked to a different configuration than the newly created one.
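
Since activation is deferred until lag frames arrive, a single readlink may report a stale result. One way to handle this is to poll the link for a limited time. The following is a minimal sketch: the function name wait_for_active is hypothetical, and comparing only the last component of the link target is an assumption about how pipelinefs presents the “active” link.

```shell
# Hypothetical helper: poll the "active" link until it points at the named
# configuration, or give up after a timeout.
# $1: configuration name; $2: timeout in seconds (default 60);
# $3: node directory in pipelinefs (the PTC value is used as the default).
wait_for_active() {
    config_name=$1
    timeout=${2:-60}
    node_dir=${3:-/tmp/wcbe-dev/nodes/01}
    elapsed=0
    while :; do
        # The link may be absent until lag frames are received.
        target=$(readlink "$node_dir/active" 2>/dev/null) || target=
        target=${target%/}
        # Assumption: the last path component of the target is the name.
        if [ "${target##*/}" = "$config_name" ]; then
            return 0
        fi
        [ "$elapsed" -ge "$timeout" ] && return 1
        sleep 1
        elapsed=$((elapsed + 1))
    done
}
```

A non-zero return after the timeout does not necessarily mean that the configuration is bad; it may only mean that no lag frames have arrived yet.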

Check that lag frames are arriving at the compute node at approximately the rate you expect:
```
cat /tmp/wcbe-dev/nodes/01/active/pipelines/INPUT_STAGE/src/properties/actual-rate
```

“Shutting down” the CBE

The following is a typical procedure for putting the CBE into a “quiescent” state after you have finished some tests. Note that this procedure is not necessary; it is simply a way to minimize resource usage by the CBE (yet leave it in a state that supports writing BLF — but not BLS nor BDF — files).

Activate the default c0 configuration:

echo 1 > /tmp/wcbe-dev/nodes/01/activations/now-c0

Look for leftover configurations, and delete them. If you have been using config_listener to configure the backend, there should be no leftover configurations after the preceding step (plus a few seconds). In case there are, simply remove the directories for those configurations:
```
rmdir /tmp/wcbe-dev/nodes/01/configurations/LEFTOVER_CONFIG
```
Optionally, shut down executor_listener. This step is unnecessary if you have activated the c0 configuration, which has no defined lag sets, and is thus unable to write BDF files.
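
To find leftover configurations in the procedure above, the configurations directory can be compared against the “active” link. The sketch below only lists candidates and deletes nothing. The function name list_leftover_configs is hypothetical, and it assumes that each configuration appears as a subdirectory of configurations and that the last component of the “active” link target is the configuration name.

```shell
# Hypothetical helper: print the names of configurations other than the
# default c0 and the currently active one. Nothing is removed.
# $1: node directory in pipelinefs (the PTC value is used as the default).
list_leftover_configs() {
    node_dir=${1:-/tmp/wcbe-dev/nodes/01}
    active=$(readlink "$node_dir/active" 2>/dev/null) || active=
    active=${active%/}
    active=${active##*/}
    # The trailing slash restricts the glob to directories.
    for dir in "$node_dir"/configurations/*/; do
        [ -d "$dir" ] || continue
        name=${dir%/}
        name=${name##*/}
        case $name in
            c0|"$active") ;;      # keep the default and active configurations
            *) echo "$name" ;;
        esac
    done
}
```

After reviewing the output, each listed name can be removed with rmdir as shown in the procedure.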

NOTES

Switching configurations

Pipeline configurations created via config_listener (that is, the named pipe config) are initialized so that they are automatically destroyed after another configuration is activated. Thus, switching from one configuration to another using config_listener should take care of cleaning up the previous configuration.

Possible config switching bug

There may be a bug in switching configurations in the manner described above. If you find that a configuration loaded using config_listener doesn't seem to work, try switching to the default c0 configuration as an intermediate step. To switch to c0, do the following:

echo 1 > /tmp/wcbe-dev/nodes/01/activations/now-c0

To delete a configuration manually, remove its directory from pipelinefs using rmdir. For example, on the PTC, to remove a configuration named foo:

rmdir /tmp/wcbe-dev/nodes/01/configurations/foo

Restarting wcbe

To restart a wcbe process, remove its directory from pipelinefs using rmdir, and then run the “setup” script again. Killing the compute node processes may take a few seconds. Monitor the existence of the processes on the compute node to ensure that they have shut down before re-running “setup”.

It's also possible to simply kill the wcbe process on a compute node manually, in which case any lagset_pipeline processes on that node should be killed as well. Killing the processes with the TERM signal will allow them to shut down cleanly; using the KILL signal may leave behind a directory named according to the pattern /tmp/cbe[0-9]+, which may be deleted manually (although there is no harm to subsequent wcbe processes if the old directory is not deleted).

Restarting pipelinefs

Restarting pipelinefs is normally not required (it is only a communications mechanism to the processes on the compute nodes). Nevertheless, to stop pipelinefs, run the following command:

ssh wcbe@cbe-master fusermount -u /tmp/wcbe-dev/nodes

ENVIRONMENT

WCBE_BDFDIR: BDF publication directory. Used by bdf_publisher and executor_listener.

FILES

Files and directories in pipelinefs.
BDF creation and publication directories. If the BDF publication directory were /path/to/bdfs, BDF files should be created under /path/to/bdfs/.private.
setup, a script to simplify starting wcbe on PTC cbe-node-01.
config, a named pipe that is the interface to config_listener. Since this pipe is in /tmp on cbe-master, it may not exist after a reboot; in this case, create a new one using mkfifo(1). The named pipe should have permissions set to 660, and cbemgrs group ownership.
/var/log/wcbe.log: CBE log file. On PTC all CBE log messages generated on cbe-node-01 are also written in the log file on cbe-master. Note that log file rotation occurs in the early morning hours on cbe-master, so you may need to read a new file if you're monitoring a long running process.
/tmp/cbe*: Directories on compute nodes created by wcbe processes. When a wcbe process has been forcibly killed, these directories may be left over; in this case, they should be manually deleted.
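
Recreating the config named pipe after a reboot, as described in the list above, might look like the following. This is a sketch only: the function name make_config_pipe is hypothetical, and the defaults are the PTC values given in this page.

```shell
# Hypothetical helper: (re)create the config_listener named pipe with the
# permissions and group ownership described above.
# $1: pipe path; $2: owning group (the PTC values are used as defaults).
make_config_pipe() {
    pipe=${1:-/tmp/wcbe-dev/config}
    group=${2:-cbemgrs}
    if [ -p "$pipe" ]; then
        echo "make_config_pipe: $pipe already exists" >&2
    elif [ -e "$pipe" ]; then
        echo "make_config_pipe: $pipe exists but is not a named pipe" >&2
        return 1
    else
        mkfifo "$pipe" || return 1
    fi
    # Set the group ownership, then the 660 permissions.
    chgrp "$group" "$pipe" && chmod 660 "$pipe"
}
```

If the pipe already exists, only its group and permissions are adjusted.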

AUTHOR

Written by Martin Pokorny, mpokorny at nrao dot edu