wcbe-sys — WIDAR correlator back end subsystem
ssh wcbe@cbe-master pipelinefs /tmp/wcbe-dev/nodes
/tmp/wcbe-dev/setup 01
bdf_publisher &
config_listener &
executor_listener &
echo [config file path] > /tmp/wcbe-dev/config
The WIDAR correlator back end subsystem comprises a variety of programs running on the nodes of the CBE cluster. This man page is meant to give an overview of the programs in the subsystem, how they are related, and how to use these programs.
Please note that the CBE implements a distributed, multi-process application, and is similar to a server application in several ways. Many of the complexities in controlling the CBE are consequences of this architecture. As experience is gained using the system, higher level interfaces for using the CBE may develop; any such interfaces will be built on top of the current architecture and the interfaces that exist now.
For the PTC, the CBE cluster consists of two nodes: cbe-master, the head (or “master”) node; and cbe-node-01, the single compute (or “slave”) node that processes WIDAR lag frames.
The processes in a complete CBE subsystem are the following: pipelinefs, wcbe, lagset_pipeline, bdf_publisher, executor_listener, and config_listener.
The following sections provide high level descriptions of the processes in the preceding list, and how they are related.
The monitor and control interface on the head node for the pipeline processes that run on the compute nodes of the CBE cluster. This interface appears as a file system on the cluster's head node; grep for pipelinefs in /proc/mounts to verify that this file system has been mounted, and/or to find its mount point. The file system process must be run by the wcbe user on the head node. To spawn the pipelinefs process on the head node “master” and mount its file system at “path”:
ssh wcbe@master pipelinefs path
For the PTC, the above command is normally executed as
ssh wcbe@cbe-master pipelinefs /tmp/wcbe-dev/nodes
Note that using ssh to spawn a process as the wcbe user requires that your SSH public key be listed in the authorized_keys file of the wcbe user. Once pipelinefs has been started, however, anyone in the cbemgrs group may read and write in the pipelinefs file system.
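The mount check described above can be wrapped in a small helper. This is a minimal sketch, not part of the CBE software: the function name pipelinefs_mount_point is hypothetical, and it assumes that the mount entry carries the string “pipelinefs” in its device or file system type field.

```shell
# Print the mount point(s) of pipelinefs; return non-zero if it is not mounted.
# Assumption: "pipelinefs" appears in field 1 (device) or field 3 (type) of
# the mounts file. The optional argument exists only to allow testing
# against a file other than /proc/mounts.
pipelinefs_mount_point() {
    mounts_file=${1:-/proc/mounts}
    awk '$1 ~ /pipelinefs/ || $3 ~ /pipelinefs/ { print $2; found = 1 }
         END { exit !found }' "$mounts_file"
}
```

On the PTC head node, calling pipelinefs_mount_point with no arguments would be expected to print /tmp/wcbe-dev/nodes when the file system is mounted.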
Detailed “README” files in the file system describe the entries in each directory. These “README” files are reference documentation, not tutorials.
All monitor and control of the pipeline processes is done through pipelinefs; however, for ease of use, several scripts have been written on top of this layer. These scripts are useful in many instances, but they do not encompass all possible usages of the interface.
The processes on the compute nodes that implement the input stage of the data processing pipeline for WIDAR lag frames. One of these processes should be spawned on each compute node of the CBE cluster via the pipelinefs interface.
For the PTC, there is a script in the /tmp/wcbe-dev directory for spawning wcbe on cbe-node-01 (or connecting to an existing instance), and initializing a default configuration for that process. The script is named “setup”, and should be used as follows:
/tmp/wcbe-dev/setup 01
The “setup” script should complete within seconds. Things to check if this isn't the case:
pipelinefs is mounted, and a process by that name is running on cbe-master under the wcbe user.
If wcbe has been uncleanly stopped on the compute node, the socket used for communications requires some time to be cleaned up by the OS, and cannot be reused by another process until that time. If you don't want to wait, change the port number appearing in line 11 of the “setup” script, and try again.
A wcbe process is already running on the compute node. In this case log in to the compute node and kill any wcbe (and lagset_pipeline) processes. Follow the advice in the previous bullet point about choosing a new socket port number to avoid socket re-use problems.
The default configuration is named c0, and should appear in the /tmp/wcbe-dev/nodes/01/configurations directory. The c0 configuration is sufficient to have the CBE produce BLF files, but not much more.
The processes on the compute nodes that implement the “lag set processing stage” of the CBE pipelines. These are normally spawned and killed by the wcbe process on the compute nodes as required. No user intervention is typically required.
Process that runs on the head node that monitors the writing of BDF files produced by the CBE, finishes them when required, moves them to a “published BDF” directory, and sends a BdfInfo message to MCAF.
The script that implements this process uses the WCBE_BDFDIR environment variable to determine where the BDF file processing occurs, and its value must be in accord with the contents of the bdfdir file in the top-level directory of pipelinefs.
For the PTC, WCBE_BDFDIR normally does not need to be set, and a default value of “/home/cbe-master/wcbe/data” is used by bdf_publisher. (This default accords with the value “/home/cbe-master/wcbe/data/.private” written to bdfdir in pipelinefs by the “setup” script described above.) This process may be run by any user in the cbemgrs group on the PTC. There is no harm in allowing this process to run indefinitely.
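The agreement between WCBE_BDFDIR and bdfdir can be checked before starting bdf_publisher. The following is a sketch under stated assumptions: the rule that bdfdir holds the WCBE_BDFDIR value followed by “/.private” is inferred from the PTC defaults quoted above, and check_bdfdir is a hypothetical helper, not part of the CBE software.

```shell
# Check that WCBE_BDFDIR agrees with the bdfdir entry of pipelinefs.
# Assumptions: bdfdir holds "<WCBE_BDFDIR>/.private" (inferred from the PTC
# defaults), and reading bdfdir yields a single path.
check_bdfdir() {
    fs_root=${1:-/tmp/wcbe-dev/nodes}                    # pipelinefs mount point
    bdf_root=${WCBE_BDFDIR:-/home/cbe-master/wcbe/data}  # PTC default
    expected="$bdf_root/.private"
    actual=$(cat "$fs_root/bdfdir") || return 2
    if [ "$actual" = "$expected" ]; then
        echo "ok: $actual"
    else
        echo "mismatch: bdfdir=$actual, expected $expected" >&2
        return 1
    fi
}
```

A non-zero return status suggests either that bdfdir could not be read (status 2) or that the two values disagree (status 1).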
Process that runs on the head node that listens for “Observation” documents sent to a multicast address by the executor, and initiates the recording of BDF files by the backend (one for each scan). For this process to work properly most of the aforementioned components of the CBE subsystem must be up and running. No changes to the CBE processing pipeline configuration are performed by the executor_listener process, so the processing pipeline must have been configured (and the desired configuration must have been activated) prior to running executor_listener.
The executor_listener only reacts to “Observation” documents that have a value of “widar” in the “correlator” element. There is likely no harm in allowing this process to run indefinitely, but, since BDF files may be produced in response to executor messages, it is probably prudent to start and stop this process as necessary.
Process that provides a simplified interface to configure and activate a pipeline configuration. At present, this process has been specialized to work on the PTC CBE only. By passing to this process the complete path name of a configuration file on cbe-master, a configuration will be created on cbe-node-01, and that configuration will be activated immediately thereafter. The process reads the file name from a named pipe that can be written to by any other process on cbe-master. For the PTC, this named pipe is /tmp/wcbe-dev/config. There is no harm in allowing this process to run indefinitely.
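Because config_listener expects a complete path name, a few checks before writing to the pipe can catch simple mistakes. The following is a minimal sketch; send_config is a hypothetical helper, and only the default pipe path is taken from this page.

```shell
# Send a configuration file path to config_listener through its named pipe.
# send_config is an illustrative helper, not part of the CBE software.
send_config() {
    config_file=$1
    pipe=${2:-/tmp/wcbe-dev/config}
    case $config_file in
        /*) ;;   # complete (absolute) path, as config_listener expects
        *) echo "need a complete path name: $config_file" >&2; return 1 ;;
    esac
    [ -r "$config_file" ] || { echo "cannot read $config_file" >&2; return 1; }
    [ -p "$pipe" ] || { echo "$pipe is not a named pipe" >&2; return 1; }
    # This write blocks until a reader (config_listener) has the pipe open.
    echo "$config_file" > "$pipe"
}
```

If the call appears to hang, the likely cause is that no process is reading from the pipe, which would suggest that config_listener is not running.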
The following sequence describes a typical procedure to check on the status of the CBE and configure it.
Check that pipelinefs is running by verifying that “pipelinefs” appears in /proc/mounts:
grep pipelinefs /proc/mounts
Check that a wcbe instance is running on cbe-node-01 by verifying the existence of the 01 directory under the pipelinefs mount point:
ls /tmp/wcbe-dev/nodes/01
Check that bdf_publisher, config_listener, and executor_listener are running (they may be run by any member of the cbemgrs group):
ps aux |grep 'bdf_publisher\|config_listener\|executor_listener'
Send a configuration to the compute node:
echo /path/to/config/file > /tmp/wcbe-dev/config
Verify that the configuration was loaded correctly:
ls /tmp/wcbe-dev/nodes/01/configurations
If successful, there should be a directory with a name based on the approximate current time as [hours]_[minutes]_[seconds].
Verify that the configuration was activated:
readlink /tmp/wcbe-dev/nodes/01/active
The arrival of lag frames drives the CBE processing pipeline, and the CBE does not maintain any notion of external time except through the timestamps that are carried by the lag frames. Consequently, a configuration cannot be activated except when lag frames are being received by the backend node. Therefore, configuration activation is deferred by the compute node until lag frames are received, meaning that the “active” link referred to above may not be present until that time, or it may be linked to a different configuration than the newly created one.
Check that lag frames are arriving at the compute node at approximately the rate you expect:
cat /tmp/wcbe-dev/nodes/01/active/pipelines/INPUT_STAGE/src/properties/actual-rate
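Since activation is deferred until lag frames arrive, it can be convenient to poll the “active” link rather than check it once. The following is a sketch only: wait_for_active is a hypothetical helper, and it assumes that the link target ends in the name of the active configuration's directory and that a one-second polling interval is acceptable.

```shell
# Poll until the "active" link of a node points at the named configuration.
# Assumptions: the link target ends in the configuration's directory name;
# the timeout is given in seconds.
wait_for_active() {
    config_name=$1
    timeout=${2:-30}
    node_dir=${3:-/tmp/wcbe-dev/nodes/01}
    elapsed=0
    while [ "$elapsed" -lt "$timeout" ]; do
        # The link may be absent until lag frames are received.
        target=$(readlink "$node_dir/active" 2>/dev/null) || target=
        if [ -n "$target" ] && [ "$(basename "$target")" = "$config_name" ]; then
            echo "active: $target"
            return 0
        fi
        sleep 1
        elapsed=$((elapsed + 1))
    done
    echo "timed out waiting for $config_name to become active" >&2
    return 1
}
```

A timeout does not necessarily indicate a fault; it may simply mean that no lag frames have reached the compute node yet.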
The following is a typical procedure for putting the CBE into a “quiescent” state after you have finished some tests. Note that this procedure is not necessary; it is simply a way to minimize resource usage by the CBE (yet leave it in a state that supports writing BLF files, but not BLS or BDF files).
Activate the default c0 configuration:
echo 1 > /tmp/wcbe-dev/nodes/01/activations/now-c0
Look for leftover configurations, and delete them. If you have been using config_listener to configure the backend, there should be no leftover configurations after the preceding step (plus a few seconds). In case there are, simply remove the directories for those configurations:
rmdir /tmp/wcbe-dev/nodes/01/configurations/LEFTOVER_CONFIG
Stop executor_listener. This step is unnecessary if you have activated the c0 configuration, which has no defined lag sets, and is thus unable to write BDF files.
Pipeline configurations created via config_listener (that is, the named pipe config) are initialized so that they are automatically destroyed after another configuration is activated. Thus, switching from one configuration to another using config_listener should take care of cleaning up the previous configuration.
Possible config switching bug
There may be a bug in switching configurations in the manner described above. If you find that a configuration loaded using echo 1 > /tmp/wcbe-dev/nodes/01/activations/now-c0
To delete a configuration manually, remove its directory from pipelinefs using rmdir. For example, on the PTC, to remove a configuration named foo:
rmdir /tmp/wcbe-dev/nodes/01/configurations/foo
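Before removing anything, it may help to list which configurations are candidates for deletion. The following is a minimal sketch: list_leftover_configs is a hypothetical helper, and it assumes that the “active” link target ends in the name of the active configuration's directory. It only prints names and removes nothing.

```shell
# List configurations that are neither the default c0 nor the active one.
# Assumption: the "active" link target ends in the active configuration's name.
list_leftover_configs() {
    node_dir=${1:-/tmp/wcbe-dev/nodes/01}
    active=
    if [ -L "$node_dir/active" ]; then
        active=$(basename "$(readlink "$node_dir/active")")
    fi
    for dir in "$node_dir"/configurations/*/; do
        [ -d "$dir" ] || continue          # glob matched nothing
        name=$(basename "$dir")
        [ "$name" = c0 ] && continue       # keep the default configuration
        [ "$name" = "$active" ] && continue
        echo "$name"
    done
}
```

Each name printed can then be removed with rmdir as in the example above.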
To restart a wcbe process, remove its directory from pipelinefs using rmdir, and then run the “setup” script again. Killing the compute node processes may take a few seconds. Monitor the existence of the processes on the compute node to ensure that they have shut down before re-running “setup”.
It's also possible to simply kill the wcbe process on a compute node manually; in that case, also kill any lagset_pipeline processes on that node. Killing the processes with the TERM signal will allow them to shut down cleanly; using the KILL signal may leave behind a directory named according to the pattern /tmp/cbe[0-9]+, which may be deleted manually (although there is no harm to subsequent wcbe processes if the old directory is not deleted).
pipelinefs.
/path/to/bdfs, BDF files should be created under /path/to/bdfs/.private.
setup, a script to simplify starting wcbe on PTC cbe-node-01.
config, a named pipe that is the interface to config_listener. Since this pipe is in /tmp on cbe-master, it may not exist after a reboot; in this case, create a new one using mkfifo(1). The named pipe should have permissions set to 660, and cbemgrs group ownership.
cbe-node-01 are also written in the log file on cbe-master. Note that log file rotation occurs in the early morning hours on cbe-master, so you may need to read a new file if you're monitoring a long running process.
wcbe processes.
When a wcbe process has been forcibly killed, these directories may be left over; in this case, they should be manually deleted.
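Recreating the config named pipe after a reboot, as described in the entry for config above, could be scripted along the following lines. This is a sketch: the path, the 660 permissions and the cbemgrs group are taken from this page, while the function name ensure_config_pipe and its arguments are illustrative.

```shell
# Recreate the config_listener named pipe if it is missing.
# ensure_config_pipe is an illustrative helper, not part of the CBE software.
ensure_config_pipe() {
    pipe=${1:-/tmp/wcbe-dev/config}
    group=${2:-cbemgrs}
    if [ -p "$pipe" ]; then
        return 0                          # pipe already exists; leave it alone
    fi
    if [ -e "$pipe" ]; then
        echo "$pipe exists but is not a named pipe" >&2
        return 1
    fi
    mkfifo "$pipe" && chgrp "$group" "$pipe" && chmod 660 "$pipe"
}
```

The user running this must be a member of the target group for chgrp to succeed.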