
Correlator Power Control Computer (CPCC)

[Not Complete -- Still In Progress. KJRyan.2022.03.03]

The CPCCs control and monitor all aspects of the physical correlator room - cooling, power to the correlator boards, power plant monitoring, fire suppression system monitoring, ambient room weather, correlator board temperatures and rack fan speeds. They are responsible for safely, expediently and automatically powering off the correlator during emergencies.

Powering all the boards off at once may cause damage. The boards can draw over 2000 Amps of electrical current; stopping this flow of current suddenly could cause damaging voltage spikes. When the CPCCs power the system down (and also when they power it up), they do so in stages, 16 boards at a time.
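The staging described above can be sketched as follows. This is a minimal illustration, not the CPCC code: the class name, the board count of 40 and the pause between stages are all hypothetical; only the stage size of 16 comes from the text.

```java
import java.util.ArrayList;
import java.util.List;

/** Illustrative only: splits boards into stages of 16 for power sequencing. */
final class StagedPowerSketch {
    static final int BOARDS_PER_STAGE = 16; // stage size stated in the text

    /** Partitions board ids into consecutive stages of at most BOARDS_PER_STAGE. */
    static List<List<Integer>> stages(List<Integer> boardIds) {
        List<List<Integer>> result = new ArrayList<>();
        for (int start = 0; start < boardIds.size(); start += BOARDS_PER_STAGE) {
            int end = Math.min(start + BOARDS_PER_STAGE, boardIds.size());
            result.add(new ArrayList<>(boardIds.subList(start, end)));
        }
        return result;
    }

    public static void main(String[] args) throws InterruptedException {
        List<Integer> boards = new ArrayList<>();
        for (int i = 0; i < 40; i++) {
            boards.add(i); // hypothetical board count, for demonstration only
        }
        long settleMillis = 10; // hypothetical pause between stages
        for (List<Integer> stage : stages(boards)) {
            // A real system would switch the power control lines here.
            System.out.println("Switching " + stage.size() + " boards: " + stage);
            Thread.sleep(settleMillis);
        }
    }
}
```

Switching a bounded group and then pausing limits how much current changes in any one step, which is the concern the paragraph above raises.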

CPCCs are considered 'mission critical' and, as such, the software was designed to be simple and robust, with fail-safe behavior in mind. Two CPCCs run in tandem; neither is 'master' or 'primary' (though their names are 'cpcc1' and 'cpcc2', the '1' and '2' are simply a means to distinguish them).

cpcc1 is powered from the WIDAR power plant -48VDC battery through an inverter; cpcc2 is powered from the Control Building UPS. In the event of catastrophic conditions in the WIDAR room, the intent is that at least one of the CPCCs will still be running in order to safely shut down the correlator.

The CPCCs monitor themselves. At system startup time a Watchdog Thread is started that oversees all the critical monitoring Threads running in the system.
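One common way to build such a watchdog is with heartbeats, sketched below. The text does not say how the CPCC Watchdog Thread actually detects a stalled thread, so the heartbeat mechanism, the class name and the timeout are assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** Illustrative watchdog: monitored threads report heartbeats; stale ones are flagged. */
final class WatchdogSketch {
    private final Map<String, Long> lastBeatMillis = new ConcurrentHashMap<>();
    private final long timeoutMillis; // hypothetical staleness limit

    WatchdogSketch(long timeoutMillis) {
        this.timeoutMillis = timeoutMillis;
    }

    /** Called by a monitored thread on every pass through its loop. */
    void beat(String threadName, long nowMillis) {
        lastBeatMillis.put(threadName, nowMillis);
    }

    /** Returns, in sorted order, the threads whose last heartbeat is older than the timeout. */
    List<String> stalled(long nowMillis) {
        List<String> result = new ArrayList<>();
        for (Map.Entry<String, Long> entry : lastBeatMillis.entrySet()) {
            if (nowMillis - entry.getValue() > timeoutMillis) {
                result.add(entry.getKey());
            }
        }
        Collections.sort(result);
        return result;
    }
}
```

Passing the current time in as a parameter, rather than reading the clock inside the class, keeps the sketch easy to test.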

The CPCCs monitor each other. The two CPCCs constantly chat with each other over what is called a 'MirrorLink'. If a GUI sends a command to one CPCC, that CPCC will mirror the command to the other so that their perceived system states should always match. Periodically one CPCC sends its perceived system state to the other and the two are compared.
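The periodic comparison can be pictured as a diff of two state snapshots. The sketch below assumes the perceived system state can be represented as a map of named values; that representation, the key names and the class name are hypothetical and are not taken from the MirrorLink implementation.

```java
import java.util.Map;
import java.util.Objects;
import java.util.Set;
import java.util.TreeSet;

/** Illustrative only: compares two perceived-state snapshots and lists disagreements. */
final class MirrorCompareSketch {
    /** Returns the sorted keys whose values differ or that exist on one side only. */
    static Set<String> mismatches(Map<String, String> local, Map<String, String> peer) {
        Set<String> keys = new TreeSet<>(local.keySet());
        keys.addAll(peer.keySet());
        Set<String> diff = new TreeSet<>();
        for (String key : keys) {
            if (!Objects.equals(local.get(key), peer.get(key))) {
                diff.add(key);
            }
        }
        return diff;
    }
}
```

An empty result means the two CPCCs agree; any returned key identifies a piece of state where one machine's view has drifted from the other's.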

At least one CPCC must be running at all times.
The CPCCs are what keep the boards powered up, so at least one must be up and running. If both CPCCs are shut off, all Correlator Boards will be powered off at the same instant.

The CPCCs have two physical interfaces to the equipment in the WIDAR Correlator Room. An Ethernet connection to the NRAO network is used to gather monitor information from CMIBs mounted on the WIDAR Correlator Boards. A direct-wired connection to each of the 16 correlator racks, through the Rack Power Module Interface Board (RPMIB), is used to control power to the individual correlator boards. The RPMIB interface is described in greater detail in the RPMIB section below.

CPCC System Functional Description

The main jobs of the CPCC are 1) Fire Detection and Suppression System monitoring, 2) A/C Power monitoring, 3) Board temperature monitoring, 4) Rack fan speed control and 5) Safely staged powering up and down of the correlator boards, as well as staged switching of the correlator boards into and out of Low Power Mode.

Fire Detection and Suppression System Monitoring

For a much more detailed description of the WIDAR Fire Detection and Suppression System please see: Correlator Room Fire Detection and Suppression

The three fire alarm stages of the WIDAR room are wired into the RPMIB in Rack S002. The CPCCs monitor these three signals as follows:

A/C Power Monitoring

The CPCCs monitor the WIDAR room power plant via SNMP messages over the Ethernet network. When the power plant senses a loss of AC power (from both SEC and the generator) it sends an 'On Battery' message to the CPCCs. When this happens, the CPCCs behave in one of two ways, depending on the Low Power state that the system is in, as follows:

When AC Power Goes Out and the System is ...

What happens if - after the power goes out - the operator changes Low Power Mode state?
If the system is 'On Battery' and the operator EITHER puts the system into Low Power Mode OR takes the system out of Low Power Mode, the CPCCs will kill the timers and base shutdown on monitored battery voltage. Rather than try to revert back and forth between timers and monitored voltage, it was decided that, if the operator changes Low Power Mode state either way, they have situational awareness. The CPCCs then leave it to the human to decide whether to manually shut down the system before the critical low-voltage point is reached (at which time the CPCCs will shut the system down).
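The operator-override rule above amounts to a one-way switch from timers to voltage monitoring. The sketch below models only that rule. The class, enum and method names are hypothetical, and the initial basis is passed in as a parameter rather than decided by the sketch.

```java
/** Illustrative only: tracks what an on-battery shutdown decision is based on. */
final class OnBatteryPolicySketch {
    enum ShutdownBasis { TIMER, BATTERY_VOLTAGE }

    private boolean onBattery = false;
    private ShutdownBasis basis = null; // null while AC power is present

    /** The power plant reported 'On Battery'; the caller supplies the starting basis. */
    void onBatteryReported(ShutdownBasis initialBasis) {
        onBattery = true;
        basis = initialBasis;
    }

    /** The operator put the system into, or took it out of, Low Power Mode. */
    void lowPowerModeChangedByOperator() {
        if (onBattery) {
            // Timers are killed and are not restored by later mode changes.
            basis = ShutdownBasis.BATTERY_VOLTAGE;
        }
    }

    ShutdownBasis basis() {
        return basis;
    }
}
```

Because the transition only goes one way, repeated mode changes by the operator cannot flip the system back and forth between timers and voltage monitoring.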

Background Info Regarding the Timers

Brent Carlson (WIDAR Lead Designer), Bob Broilo (former VLA Electrician) and Kevin Ryan (CPCC Engineer) spent hours with stop watches and voltage monitors to figure out the safest, most reliable way to protect the Correlator Boards in the event of sustained loss of AC Power Mains. While the battery voltage monitoring method will keep the correlator running for the longest period of time while still protecting the power plant battery, it will fail if the network is not working because CPCCs will not be able to monitor power plant battery voltage. In the case of catastrophic failure within the room, the network cannot be relied on.

Timers within the CPCC do not rely on communications with the power plant.

Board Temperature Monitoring

Room Temperature Monitoring

Rack Fans Monitor and Control

HVAC Monitor and Control

Board Power Up/Down and Low Power Mode

CPCC Physical Description


SuperLogics SL-4U-SBC-H61-BA 4U Rack Mount Server.

The CPCCs are SuperLogics 4U Rack Mount Servers with 16 PCI slots that are populated with 16 National Instruments P

Rack Power Module Interface Board (RPMIB)

The RPMIB interfaces the CPCC computers to the 16 correlator racks. It allows the CPCCs to control and monitor power to each individual board, control and monitor the rack fans, control and monitor power to the four HVAC units in the room and monitor the room's smoke detection system. There is one RPMIB per rack, each with two 100-pin SCSI I/O connectors, one each for cpcc1 and cpcc2.

RPMIBs are located on the backside of the rack just under the circuit breaker panel.

RPMIB Components.

RPMIB Connections

The top row of labels in this illustration corresponds to the column labels in the connection table that follows.

Sixteen National Instruments PCI-6509 Data Acquisition (DAQ) cards reside in the 18-slot PCI backplane of each of the two CPCC machines. The IO pins of each DAQ card are connected via a 100-pin SCSI cable to the RPMIB in one of the sixteen correlator racks.

The Pin # columns show the pin number of the SCSI connector attached to the CPCCs. Each pin from the connector is routed via circuit board traces to the RPMIB screw terminal blocks, where physical wires are connected to the final destinations (board slots, HVACs and smoke alarm).

Table detailing the RPMIB connections.

The table depicts in detail the connections from the CPCCs (for control and monitor) to the rack board slots, rack fans, HVAC control and monitor lines and the smoke alarm monitor lines.

SCSI Connector

SCSI Connector.

This diagram shows the mapping of SCSI pin number to NI 6509's Port/Pin number.

Rack ID Jumpers

ID Jumper Block.

The WIDAR Correlator consists of sixteen racks; eight Station Board Racks and eight Baseline Board Racks. Each rack has a unique ID: s001 - s008 and b101 - b108 for the station and baseline racks respectively. These IDs are hardwired on the rack's RPMIB, via a jumper block, that shows up as a register that can be read by the CPCC computers.
The jumper pairs are labeled B0 - B5. An open pair is logic high and a jumpered pair is logic low. B5 indicates the rack type: high (open) = Baseline Rack, low (jumpered) = Station Rack. B0 - B4 indicate the rack number. In this photo the value from MSB to LSB is 100111: B5 = 1 selects a Baseline Rack and B4 - B0 = 00111 gives rack number 7, which indicates rack B107.
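The decoding of the jumper register can be sketched as below. The class and method names are hypothetical, and the sketch assumes the register value arrives with B0 in the least significant bit and that the rack number maps directly onto the last two digits of the rack name, which is consistent with the 100111 = B107 example.

```java
/** Illustrative only: decodes the 6-bit rack ID jumper register. */
final class RackIdSketch {
    /** Decodes bits B5..B0 into a rack name such as "B107" or "S002". */
    static String decode(int register) {
        int bits = register & 0x3F;            // keep B5..B0 only
        boolean baseline = (bits & 0x20) != 0; // B5 high (open) = Baseline Rack
        int number = bits & 0x1F;              // B4..B0 = rack number
        if (baseline) {
            return String.format("B1%02d", number); // baseline racks are B101..B108
        }
        return String.format("S0%02d", number);     // station racks are S001..S008
    }

    public static void main(String[] args) {
        // The value from the photo described above.
        System.out.println(decode(0b100111));
    }
}
```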

RPMIB Connection to Fire Detection and Suppression System

As shown in the following diagram, the RPMIB in rack S002 is used to interface the room's fire detection and suppression system to the CPCCs. Each of the three smoke alarm stages is wired to a relay in the red control box next to the North door of the correlator room. The relays are connected to spare monitor inputs on the RPMIB. These inputs are held TTL HIGH by 220-ohm pullup resistors. When a smoke alarm stage activates, its relay is energized and pulls the monitor bit TTL LOW.

The pullup voltage comes from a control output on this same RPMIB, namely the 6U-9 bit. During software start-up it is imperative that this bit be initialized to a TTL Level HIGH BEFORE the smoke detector monitoring software is started; otherwise, the system will think it is in Stage 3 alarm and commence shutting down the correlator boards.

Three relays (in the alarm box outside the north door), one for each level of smoke detection, are wired to the RPMIB in Rack S002. Though it is hard to tell from this picture, the single red line is connected to pin 6U-9. This is the TTL pullup line for the three monitor points at SPR0, SPR1 and SPR2 for smoke alarm stages 1, 2 and 3 respectively.
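The active-low reading of the three monitor points, and the reason the pullup must be driven first, can be sketched as follows. The class and method names are hypothetical, and refusing to evaluate the inputs until the pullup is set is one possible guard, not necessarily what the CPCC software does.

```java
/** Illustrative only: interprets the active-low smoke alarm monitor inputs. */
final class SmokeStageSketch {
    /**
     * Returns the highest active alarm stage: 0 for none, otherwise 1 to 3.
     * Each input is true when it reads TTL HIGH and false when it reads TTL LOW.
     * A LOW reading means that stage's relay is energized.
     */
    static int highestStage(boolean pullupDrivenHigh, boolean spr0, boolean spr1, boolean spr2) {
        if (!pullupDrivenHigh) {
            // Without the pullup every input reads LOW, which would look like Stage 3.
            throw new IllegalStateException("6U-9 pullup must be set HIGH before reading alarms");
        }
        if (!spr2) {
            return 3;
        }
        if (!spr1) {
            return 2;
        }
        if (!spr0) {
            return 1;
        }
        return 0;
    }
}
```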

RPMIB Connection to the HVAC System

As shown in the following diagram, the RPMIB in rack S001 is used to interface the four HVAC units in the correlator room. Four control lines provide ON/OFF control of each HVAC and four monitor lines provide alarm status. As with the smoke alarm signals, the monitor bits are pulled up to a TTL HIGH by 220-ohm pullup resistors that are held high by voltage from the 6U-9 bit.

The four red wires (w/resistors) on the right screw terminals are the alarm monitor input lines, the single wire to the left is the 6U-9 TTL Pullup line and the four wires at the far left provide ON/OFF control to each of the four HVACs.

Monitor Lines:

  • SPR0 (In-26) - DX1
  • SPR1 (In-27) - DX2
  • SPR2 (In-28) - CW1
  • SPR3 (In-29) - CW2

Control Lines:

  • SPR0 (Out-28) - DX1
  • SPR1 (Out-29) - DX2
  • SPR2 (Out-30) - CW1
  • SPR3 (Out-31) - CW2

RPMIB Simplified Schematic

RPMIB Failure Modes and Effects Analysis

The following describes how the RPMIB can fail, the resulting symptoms and how to troubleshoot it. This information was taken from the design document EVLA Correlator Power Monitor and Control System.


National Instruments PCI-6509 DAQ Boards

The National Instruments PCI-6509 DAQ Boards interface the CPCC software to the RPMIBs. There are 16 boards in each of the two CPCC computers. Two NI boards (one from cpcc1 and one from cpcc2) connect to each of the RPMIBs in the 16 correlator racks.

National Instruments PCI-6509 DAQ Board

100-pin SCSI connector.

Sixteen National Instruments PCI-6509 boards are shown in position in a CPCC. The boards occupy the leftmost eight slots and the rightmost eight slots. Looking from the front of the chassis, the leftmost slot is device 0 to the kernel driver and corresponds to rack S001. The numbers increment to the right, with the last one being driver device 15, corresponding to rack B108.
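The device-to-rack correspondence can be sketched as a simple lookup. The text fixes only the two endpoints (device 0 is S001 and device 15 is B108); the sketch assumes the devices in between follow rack order, with devices 0 to 7 on the station racks and devices 8 to 15 on the baseline racks. The class and method names are hypothetical.

```java
/** Illustrative only: maps a driver device index to its correlator rack name. */
final class DeviceRackMapSketch {
    /** Returns the rack name for device 0..15, assuming devices follow rack order. */
    static String rackForDevice(int device) {
        if (device < 0 || device > 15) {
            throw new IllegalArgumentException("device must be 0..15, got " + device);
        }
        if (device < 8) {
            return String.format("S%03d", device + 1);     // S001..S008
        }
        return String.format("B%03d", 101 + (device - 8)); // B101..B108
    }
}
```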

The 32 SCSI cables connected to the 32 NI Boards in the CPCCs (cpcc1 on top, cpcc2 on bottom).

SCSI Connector

SCSI Connector

Software

One of the foremost requirements of CPCC software is that it be kept simple so that it will be stable and robust. The software was designed in a modular fashion, where each major function is self-contained in its own class. These functions are started up in a particular sequence by the main method in CpccMain.

One of the most important safeguards towards a robust software system is proper exception handling. While it is beneficial from a debugging standpoint to have exception handling that is as specific as possible, it is much more important to have broad exception handling to catch unforeseen runtime exceptions. Failure to do this caused us to fail an annual smoke alarm test when the SmokeDetector failed to catch an unforeseen NullPointerException and the smoke detector monitor thread silently died.
The remedy for this is to follow all specific exception handling in critical Threads with a generic catch Exception or even catch Throwable handler to catch the unforeseen exceptions.
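The pattern described above, specific handlers first and a generic handler last, can be sketched as follows. This is not the SmokeDetector code: the class name, the injected polling task and the bounded pass count (used so the sketch terminates) are hypothetical.

```java
/** Illustrative only: a monitoring loop that survives unforeseen exceptions. */
final class GuardedMonitorSketch implements Runnable {
    private final Runnable pollOnce; // one monitoring pass (hypothetical)
    private final int maxPasses;     // bounded here so the sketch terminates
    private int failures = 0;

    GuardedMonitorSketch(Runnable pollOnce, int maxPasses) {
        this.pollOnce = pollOnce;
        this.maxPasses = maxPasses;
    }

    @Override
    public void run() {
        for (int pass = 0; pass < maxPasses; pass++) {
            try {
                pollOnce.run();
            } catch (IllegalStateException e) {
                // Specific handler: gives precise detail for a known failure.
                failures++;
                System.err.println("Known fault: " + e.getMessage());
            } catch (Throwable t) {
                // Generic handler last: an unforeseen error must not end the loop.
                failures++;
                System.err.println("Unforeseen fault: " + t);
            }
        }
    }

    int failures() {
        return failures;
    }
}
```

Because the try block sits inside the loop, a failure in one pass is logged and counted, and the next pass still runs. Catching Throwable also intercepts serious errors such as OutOfMemoryError, so a real implementation would likely log loudly and raise an alarm rather than only count the failure.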

NI6509 Device Driver

The Linux kernel driver for the National Instruments PCI-6509 Data Acquisition (DAQ) cards is a custom driver written by NRAO. It provides a REST-based interface

[description]