CPCC Physical Description

SuperLogics SL-4U-SBC-H61-BA 4U Rack Mount Server.
The CPCCs are SuperLogics 4U Rack Mount Servers with 16 PCI slots that are populated with 16 National Instruments P
[Not Complete -- Still In Progress. KJRyan.2022.03.03]
The CPCCs control and monitor all aspects of the physical correlator room - cooling, power to the correlator boards, power plant monitoring, fire suppression system monitoring, ambient room weather, correlator board temperatures and rack fan speeds. They are responsible for safely, expediently and automatically powering off the correlator during emergencies.
Powering all the boards off at once may cause damage. The boards can draw over 2000 Amps of electrical current; stopping this flow of current suddenly could cause damaging voltage spikes. When the CPCCs power the system down (and also when they power it up), they do so in stages, 16 boards at a time.
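The staging described above can be sketched as a simple partition of the board list. This is a minimal illustration only: the class name, the method name and the use of plain strings for board IDs are assumptions, and the sketch says nothing about the delay between stages, which the text does not specify.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper (not from the CPCC code base): groups boards into power stages. */
final class StagedPowerPlan {
    /** Stage size taken from the text: 16 boards are switched per stage. */
    static final int BOARDS_PER_STAGE = 16;

    /** Splits the board IDs, in order, into consecutive stages of at most 16 boards. */
    static List<List<String>> stages(List<String> boardIds) {
        List<List<String>> result = new ArrayList<>();
        for (int start = 0; start < boardIds.size(); start += BOARDS_PER_STAGE) {
            int end = Math.min(start + BOARDS_PER_STAGE, boardIds.size());
            // Copy the sub-list so each stage is independent of the input list.
            result.add(new ArrayList<>(boardIds.subList(start, end)));
        }
        return result;
    }
}
```

A caller would iterate over the returned stages and switch one stage at a time, so that the current never changes by more than one stage's worth of boards in a single step.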
CPCCs are considered 'mission critical' and, as such, the software was designed to be simple and robust, with fail-safe behavior in mind. Two CPCCs run in tandem; neither is 'master' or 'primary' (though their names are 'cpcc1' and 'cpcc2', the '1' and '2' are simply a means to distinguish them).
cpcc1 is powered from the WIDAR power plant -48VDC battery through an inverter; cpcc2 is powered from the Control Building UPS. In the event of catastrophic conditions in the WIDAR room, the intent is that at least one of the CPCCs will still be running in order to safely shut down the correlator.
The CPCCs monitor themselves. At system startup time a Watchdog Thread is started that oversees all the critical monitoring Threads running in the system.
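The text does not say how the Watchdog Thread decides that a monitoring Thread has failed. One common approach, shown here purely as an assumption, is for each monitored Thread to record a heartbeat and for the watchdog to flag any Thread whose last heartbeat is too old. All names below are hypothetical.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical heartbeat-based watchdog; the real CPCC mechanism may differ. */
final class WatchdogSketch {
    private final Map<String, Long> lastBeatMillis = new ConcurrentHashMap<>();
    private final long timeoutMillis;

    WatchdogSketch(long timeoutMillis) {
        this.timeoutMillis = timeoutMillis;
    }

    /** Called by a monitored thread each time it completes a cycle. */
    void heartbeat(String threadName, long nowMillis) {
        lastBeatMillis.put(threadName, nowMillis);
    }

    /** Returns, sorted by name, the threads whose last heartbeat is older than the timeout. */
    List<String> stalled(long nowMillis) {
        List<String> out = new ArrayList<>();
        for (Map.Entry<String, Long> e : lastBeatMillis.entrySet()) {
            if (nowMillis - e.getValue() > timeoutMillis) {
                out.add(e.getKey());
            }
        }
        Collections.sort(out);
        return out;
    }
}
```

Passing the current time in as a parameter, rather than reading the clock inside the class, keeps the check easy to test.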
The CPCCs monitor each other. The two CPCCs constantly chat with each other over what is called a 'MirrorLink'. If a GUI sends a command to one CPCC, that CPCC mirrors the command to the other, so their perceived system states should always match. Periodically, one CPCC sends its perceived system state to the other and the two are compared.
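The periodic comparison of perceived system state can be illustrated with a small sketch. It assumes, hypothetically, that each CPCC's state can be flattened into a map of named items to string values; the actual MirrorLink message format is not described in this document.

```java
import java.util.Map;
import java.util.Objects;
import java.util.Set;
import java.util.TreeSet;

/** Hypothetical comparison of two perceived system states. */
final class MirrorStateCompare {
    /** Returns the sorted set of keys whose values differ or that exist on only one side. */
    static Set<String> mismatches(Map<String, String> local, Map<String, String> remote) {
        Set<String> keys = new TreeSet<>(local.keySet());
        keys.addAll(remote.keySet());
        Set<String> diff = new TreeSet<>();
        for (String key : keys) {
            // A key missing on one side yields null there and counts as a mismatch.
            if (!Objects.equals(local.get(key), remote.get(key))) {
                diff.add(key);
            }
        }
        return diff;
    }
}
```

An empty result means the two states agree; any returned key is something a watchdog or operator display could report as a mirroring problem.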
At least one CPCC must be running at all times.
The CPCCs are what keep the boards powered up, so at least one must be up and running at all times.
If both CPCCs are shut off, all Correlator Boards will be powered off at the same instant.
The CPCCs have two physical interfaces to the equipment in the WIDAR Correlator Room. An Ethernet connection to the NRAO network is used to gather monitor information from CMIBs mounted on the WIDAR Correlator Boards. A direct-wired connection to each of the 16 correlator racks is used to control power to the individual correlator boards. The RPMIB interface is described in greater detail here.
The main jobs of the CPCC are 1) Fire Detection and Suppression System monitoring, 2) AC Power monitoring, 3) Board temperature monitoring, 4) Rack fan speed control and 5) Safely staged powering up and down of the correlator boards, as well as staged switching of the correlator boards into and out of Low Power Mode.
For a much more detailed description of the WIDAR Fire Detection and Suppression System please see: Correlator Room Fire Detection and Suppression
The three stages of fire alarm in the WIDAR room are wired into the RPMIB in Rack S002. The CPCCs monitor these three signals as follows:
Stage 1: This alarm is triggered by events such as the system being placed into maintenance mode, or a loss of pressure in the FM-200 suppressant tanks or the deluge valve (sprinkler) system. It is also triggered when the AnaLASER II High Sensitivity Smoke Detector detects smoke. This could be from a belt slipping in one of the HVAC units. The CPCCs indicate this alarm on the Operator's GUI and take no further action.
It should be noted that the Operator's remote annunciator, which is a physical part of the fire alarm system, takes precedence over the alarms appearing on the Operator's GUI.
Stage 2: This alarm is triggered when one of the many smoke detectors detects smoke. The CPCCs indicate it on the Operator's GUI and take no further action.
Stage 3: This alarm is triggered when 2 or more detectors detect smoke. At this time the fire alarm system, among other things, starts a 60-second timer; when it expires, the room shunt trip is activated, which immediately removes all power from the room. The CPCCs use those 60 seconds to safely stage power off to the boards in the correlator. It is believed that the boards can survive the FM-200 gas, and even water from the sprinkler system, if they are powered off when the event occurs.
The CPCCs monitor the WIDAR room power plant via SNMP messages over the Ethernet network. When the power plant senses a loss of AC power (from both SEC and the generator) it sends an 'On Battery' message to the CPCCs. When this happens, the CPCCs behave in one of two ways, depending on the Low Power state that the system is in:
When AC Power Goes Out and the System is ...
CPCCs start two timers: one of 5 minutes and one of 15 minutes.
CPCCs put the system into Low Power mode to prolong battery life. This drops the power from a normal operating load of 1600 - 2000+ Amps down to 1060 Amps.
CPCCs begin staged power down of the correlator boards.
The CPCCs will allow the system to stay up for as long as it is safe for the battery. They do this by monitoring power plant battery voltage. When the voltage drops to -43V or below, the CPCCs will begin staged power down of the correlator boards. Based on voltage curves from previous outages, it is believed that the system can remain powered up in Low Power Mode, from a fully charged battery, for as long as 90 minutes.
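The voltage check can be sketched as follows. Because the plant is a -48VDC system, this sketch assumes that "drops to -43V or below" refers to the magnitude of the voltage (the battery discharging from around -48V toward 0V); that reading, and every name in the code, is an assumption rather than something taken from the CPCC source.

```java
/** Hypothetical battery-voltage shutdown check for a -48VDC plant. */
final class BatteryShutdownCheck {
    /** Threshold from the text: staged power down begins at -43V. */
    static final double CRITICAL_MAGNITUDE_VOLTS = 43.0;

    /**
     * @param plantVolts measured plant voltage, normally negative (for example -48.0)
     * @return true when the magnitude has fallen to 43V or less
     */
    static boolean shouldBeginStagedPowerDown(double plantVolts) {
        // Compare magnitudes so the sign convention of the reading does not matter.
        return Math.abs(plantVolts) <= CRITICAL_MAGNITUDE_VOLTS;
    }
}
```

With this reading, a measurement of -48.0 keeps the system up, while -43.0 or -42.5 starts the staged power down.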
Battery voltage based shutdown was added when the site had to go through a series of planned power outages lasting longer than the timers. It was desired that the system could ride through these outages without being powered down.
What happens if - after the power goes out - the operator changes Low Power Mode state?
If the system is 'On Battery' and the operator EITHER puts the system into Low Power Mode OR
takes the system out of Low Power Mode, the CPCCs will kill the timers and base shutdown on monitored
battery voltage. Rather than try to switch back and forth between timers and monitored voltage,
it was decided that, if the operator changes Low Power Mode state either way, they have
situational awareness, and the CPCCs will leave it to the human to decide whether or not to manually
shut down the system before the critical low-voltage point is reached (at which time the CPCCs will
shut the system down).
Brent Carlson (WIDAR Lead Designer), Bob Broilo (former VLA Electrician) and Kevin Ryan (CPCC Engineer) spent hours with stop watches and voltage monitors to figure out the safest, most reliable way to protect the Correlator Boards in the event of sustained loss of AC Power Mains. While the battery voltage monitoring method will keep the correlator running for the longest period of time while still protecting the power plant battery, it will fail if the network is not working because CPCCs will not be able to monitor power plant battery voltage. In the case of catastrophic failure within the room, the network cannot be relied on.
Timers within the CPCC do not rely on communications with the power plant.
The RPMIB interfaces the CPCC computers to the 16 correlator racks. It allows the CPCCs to control and monitor power to each individual board, control and monitor the rack fans, control and monitor power to the four HVAC units in the room and monitor the room's smoke detection system. There is one RPMIB per rack, each with two 100-pin SCSI I/O connectors, one each for cpcc1 and cpcc2.
RPMIBs are located on the backside of the rack just under the circuit breaker panel.
RPMIB Components.
The top row of labels in this illustration corresponds to the column labels in the connection table that follows.
Sixteen National Instruments PCI-6509 Data Acquisition (DAQ) cards reside in the 18-slot PCI backplane of each of the two CPCC machines. The IO pins of the DAQ are connected via 100-pin SCSI cable to the RPMIB in each of the sixteen correlator racks.
The Pin # column shows the pin number of the SCSI connector attached to the CPCCs. Each pin from the connector is routed via circuit board traces to the RPMIB screw terminal blocks where physical wires are connected to the final destinations (board-slots, HVACs and smoke alarm).
Table detailing the RPMIB connections.
The table depicts in detail the connections from the CPCCs (for control and monitor) to the rack board slots, rack fans, HVAC control and monitor lines and the smoke alarm monitor lines.
SCSI Connector.
This diagram shows the mapping of SCSI pin number to NI 6509's Port/Pin number.
ID Jumper Block.
The WIDAR Correlator consists of sixteen racks; eight Station Board Racks and eight Baseline Board Racks.
Each rack has a unique ID: s001 - s008 and b101 - b108 for the station and baseline racks respectively.
These IDs are hardwired on the rack's RPMIB, via a jumper block, that shows up as a register that can be
read by the CPCC computers.
The jumper pairs are labeled B0 - B5. An open pair is logic high and a jumpered pair is logic low. B5
indicates the rack type: high (open) = Baseline Rack, low (jumpered) = Station Rack. B0 - B4 indicate rack
number. In this photo the value from MSB to LSB is 100111 which indicates rack B107.
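The jumper encoding above can be checked with a short decoding sketch. It assumes the register is delivered as an integer with B0 in the least significant bit, and that baseline rack names are formed as 100 plus the rack number (which matches the 100111 = B107 example); the class and method names are hypothetical.

```java
import java.util.Locale;

/** Hypothetical decoder for the 6-bit RPMIB rack ID register (B5..B0). */
final class RackIdDecoder {
    /** Decodes a register value into a rack ID such as "s002" or "b107". */
    static String decode(int register) {
        int bits = register & 0x3F;            // keep only B5..B0
        boolean baseline = (bits & 0x20) != 0; // B5 high (open) = Baseline Rack
        int number = bits & 0x1F;              // B4..B0 = rack number
        return baseline
                ? String.format(Locale.ROOT, "b%03d", 100 + number)
                : String.format(Locale.ROOT, "s%03d", number);
    }
}
```

For the value in the photo, decode(0b100111) yields "b107": B5 is set, and the low five bits 00111 are 7.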
As shown in the following diagram, the RPMIB in rack S002 is used to interface the room's fire detection and suppression system to the CPCCs. Each of the three smoke alarm stages is wired to a relay in the red control box next to the North door of the correlator room. The relays are connected to spare monitor inputs on the RPMIB. These inputs are held TTL HIGH by 220-ohm pullup resistors. When a smoke alarm stage activates, its relay is energized and pulls the monitor bit TTL LOW.
The pullup voltage comes from a control output on this same RPMIB, namely the 6U-9 bit. During software start-up it is imperative that this bit be initialized to TTL HIGH BEFORE the smoke detector monitoring software is started; otherwise, the system will think it is in a Stage 3 alarm and commence shutting down the correlator boards.
Three relays (in the alarm box outside the north door), one for each level of smoke detection, are wired to the RPMIB in Rack S002. Though the picture is misleading, the single red line is connected to pin 6U-9. This is the TTL pullup line for the three monitor points at SPR0, SPR1 and SPR2 for smoke alarm stages 1, 2 and 3 respectively.
As shown in the following diagram, the RPMIB in rack S001 is used to interface the four HVAC units in the correlator room. Four control lines provide ON/OFF control of each HVAC and four monitor lines provide alarm status. As with the smoke alarm signal, the monitor bits are pulled up to TTL HIGH by 220-ohm pullup resistors that are held high by voltage from the 6U-9 bit.
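Because the three monitor inputs are active low, the start-up ordering requirement can be seen in a few lines of code. The sketch below is an illustration under assumptions: the class and method names are hypothetical, and the inputs are passed in as booleans rather than read from the DAQ card.

```java
/** Hypothetical decoder for the active-low smoke alarm monitor inputs. */
final class SmokeAlarmDecoder {
    /**
     * Each argument is true when the corresponding input reads TTL HIGH.
     * A stage is active when its input is pulled LOW by the relay.
     *
     * @return the highest active stage, or 0 when no stage is active
     */
    static int highestActiveStage(boolean spr0High, boolean spr1High, boolean spr2High) {
        if (!spr2High) return 3; // SPR2 low: Stage 3
        if (!spr1High) return 2; // SPR1 low: Stage 2
        if (!spr0High) return 1; // SPR0 low: Stage 1
        return 0;
    }
}
```

If the 6U-9 pullup bit has not yet been driven HIGH, all three inputs read LOW and highestActiveStage(false, false, false) returns 3, which is exactly the false Stage 3 alarm the text warns about.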
Monitor Lines:
Control Lines:
The following describes how the RPMIB can fail, the resulting symptoms and how to troubleshoot it. This information was taken from the design document EVLA Correlator Power Monitor and Control System.
The National Instruments PCI-6509 DAQ Boards interface the CPCC software to the RPMIBs. There are 16 boards in each of the two CPCC computers. Two NI boards (one from cpcc1 and one from cpcc2) connect to each of the RPMIBs in the 16 correlator racks.
National Instruments PCI-6509 DAQ Board
100-pin SCSI connector.
Sixteen National Instruments PCI-6509 boards are shown in position in a CPCC. The boards occupy the left-most eight slots and the right-most eight slots. Looking from the front of the chassis, the left-most slot is device 0 to the kernel driver and corresponds to rack S001. The numbers increment to the right, with the last one being driver device 15, corresponding to rack B108.
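The device-to-rack correspondence can be written as a small lookup. Only the two endpoints (device 0 = S001, device 15 = B108) are stated in the text; the sketch assumes the devices in between follow in order, with devices 0 to 7 on the station racks and 8 to 15 on the baseline racks. The names are hypothetical.

```java
/** Hypothetical mapping from kernel driver device number to rack ID. */
final class DeviceRackMap {
    /** Maps device 0..15 to "s001".."s008" and "b101".."b108" (assumed ordering). */
    static String rackForDevice(int device) {
        if (device < 0 || device > 15) {
            throw new IllegalArgumentException("device must be 0..15, got " + device);
        }
        // Devices 0..7 are assumed to be station racks, 8..15 baseline racks.
        return device < 8 ? "s00" + (device + 1) : "b10" + (device - 7);
    }
}
```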
The 32 SCSI cables connected to the 32 NI Boards in the CPCCs (cpcc1 on top, cpcc2 on bottom).
SCSI Connector
The CPCC system
The EVLA Correlator fire detection and suppression system consists of three separate and independently functional systems:
In addition, the two CPCC computers are wired into the Fike FM-200 panel to monitor the status of the smoke detectors and, in the event of fire, power down the correlator in a timely manner to minimize damage from fire suppressants.
There are three stages of fire/smoke detection:
One of the foremost requirements of CPCC software is that it be kept simple so that it will be stable
and robust. The software was designed in a modular fashion where each major function is self-contained in
its own class. These functions are started up in a particular sequence by the main method in CpccMain.
One of the most important safeguards towards a robust software system is proper exception handling.
While it is beneficial from a debugging standpoint to have exception handling that is as specific as
possible, it is much more important to have broad exception handling to catch unforeseen runtime
exceptions. Failure to do this caused us to fail an annual smoke alarm test when the SmokeDetector failed
to catch an unforeseen NullPointerException and the smoke detector monitor thread went into never-never land.
The remedy for this is to follow all specific exception handling in critical Threads with a generic
catch Exception or even catch Throwable handler to catch the unforeseen exceptions.
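The pattern described above might look like the following sketch. It is not the CPCC SmokeDetector code: the class name, the counters and the bounded loop (used here so the example terminates) are all assumptions made for illustration.

```java
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical monitor loop that survives unforeseen exceptions. */
final class GuardedMonitorLoop implements Runnable {
    private final Runnable pollOnce;
    private final int maxIterations;
    final AtomicInteger failures = new AtomicInteger();
    final AtomicInteger completed = new AtomicInteger();

    GuardedMonitorLoop(Runnable pollOnce, int maxIterations) {
        this.pollOnce = pollOnce;
        this.maxIterations = maxIterations;
    }

    @Override
    public void run() {
        // A real monitor thread would loop until shut down; this sketch is bounded.
        for (int i = 0; i < maxIterations; i++) {
            try {
                pollOnce.run();
            } catch (IllegalStateException e) {
                // Specific handling for an anticipated failure.
                failures.incrementAndGet();
            } catch (Throwable t) {
                // Generic handler: anything unforeseen (e.g. NullPointerException)
                // is recorded, and the loop carries on instead of dying.
                failures.incrementAndGet();
            }
            completed.incrementAndGet();
        }
    }
}
```

The key point is that the try block sits inside the loop, so a single bad cycle is counted as a failure and the next cycle still runs.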
Where the main() method resides and where the program is started from.
[describe CpccMain]
Provides the REST interface to the CPCC system.
[description]
Monitors the power plant for loss of AC power mains and prepares to power down system after a set amount of time without power or when the battery voltage goes below a set threshold.
[description]
Provides the REST interface to the CPCC system.
[description]
Provides the REST interface to the CPCC system.
[description]
Provides the REST interface to the CPCC system.
[description]
Provides the REST interface to the CPCC system.
[description]
Monitors CPCC Software
[description]
Finally the watchdog Thread is started:
cpcc1 -- Watchdog Errors Noted:
MirrorLink Problem: CPCCs are chatting but have issues with
their mirroring:
cpcc1:
The Linux kernel driver for the National Instruments PCI-6509 Data Acquisition (DAQ) cards is a custom driver written by NRAO. It provides a REST based interface
[description]