CONSOLIDATED REPORT
MONITOR AND CONTROL SOFTWARE PDR
14 - 15 May 2002

Gustaaf van Moorsel
7/31/2002


CONTENTS
========

OVERALL DESIGN ISSUES
MIB SELECTION/RFI CONSIDERATIONS
M&C SOFTWARE / MIBs
OPERATIONAL INTERFACE
DEVICE BROWSER
CORRELATOR MONITOR AND CONTROL
CORRELATOR BACKEND/INTERFACE WITH ARCHIVE
RECRUITMENT


OVERALL DESIGN ISSUES
=====================

Question
--------
There is a distinct lack of an architectural design. Producing such a design should be one of the highest priorities in the months ahead. Without a design that shows how everything fits together a schedule cannot be created. The missing design means a high element of risk. The schedule is very tight; we may be forced into using existing software, such as the GBT M&C system.

Reply
-----
The EVLA M&C software group strongly agrees with this point. We are working now to develop an overall architectural design. Real-time software does not always have the luxury of following the classic software development scenarios. Real-time software is part of a combined software and hardware development effort, and must respond both to the overall needs of the project, and to hardware development schedules. The current lack of an architectural design is a direct result of the decision to respond to the need to configure the systems software and the development environment for the antenna MIBs in support of the schedule for the test antenna. Those MIB issues have now been settled, and we have now turned our attention to architectural issues.

Question
--------
The schedule is being driven by the hardware needs, not the software design. The software schedules don't mean much until a software design is in place.

Reply
-----
We are of the opinion that real-time software must sometimes depart from the classic development scenarios. The EVLA does not become a reality until we have an outfitted test antenna and can determine the RFI environment.
It is the test antenna schedule that is driving the development schedule, and rightly so. We agree that software schedules have very little meaning until a software design has been developed. However, it is our opinion that responding to the test antenna schedule as the 1st priority was the correct course to follow for the case of the EVLA M&C software effort.

Question
--------
An object diagram is needed of the design as understood, or being thought about, so far. Its most important use is before the design is finished. It will guide one's thoughts into better object-oriented designs. It should exist whether a review is scheduled or not.

Reply
-----
Agreed. A 1st cut of a high level sketch of the objects in the core of the system will appear on or about 7/8/2002. It is not particularly detailed, but goes in the direction of capturing overall structure and will serve as the basis for further elaboration. That same document will contain the beginnings of our thinking on a standard device interface. This document is very informal, and will appear on evla-sw-discuss.

Question
--------
For the overall architecture, we need a software equivalent of the block-diagram that exists for hardware.

Reply
-----
Some people insist on diagram-rich architecture and design documents because "nobody reads". We agree with this sentiment. We will try to ensure that our current efforts to develop an architecture include informative diagrams.


MIB SELECTION/RFI CONSIDERATIONS
================================

Question
--------
What drove us to the particular choice of chip? What drove us to the choice of RTOS?

Reply
-----
The chip choice was driven almost entirely by RFI considerations: on-chip RAM is an important guideline for the reduction of RFI. The choice of RTOS was made primarily because of its small footprint, fitting comfortably in the available RAM.

Question
--------
Are we backing ourselves too far into a corner with certain requirements on the chip?
Is an RFI-free design using off-chip RAM out of the question?

Reply
-----
The only way we will know if the MIB chip requirements were unnecessarily stringent is to build the MIB prototype board, and to then compare its RFI levels to both the detrimental levels of RFI as developed from theory and tests, and to the RFI levels of COTS boards that might have been used for the MIB. Both steps will be taken.

We are attempting to mitigate the severity of the current constraints by relying on standards for MIB communications with the external world. As long as standard methods for MIB communication are used, such as Ethernet as the wire protocol and UDP & TCP/IP as the underpinning for information exchange, we should be able to preserve our options, and have a path for changes/upgrades to the antenna MIB that will not impact the rest of the system.

An RFI-free design using off-chip RAM seems unlikely. The ability to use off-chip RAM would have greatly expanded our options for the MIB, and would have reduced costs, but in the absence of actual test data (soon to be developed), we believe the MIB choice was the right one.

Question
--------
The selection of the antenna MIB is driven by RFI considerations and the choice of Ethernet. While RFI concerns must be addressed successfully (or we will be swamped with noise) the question is whether the complexities of Ethernet are really worth it. Having only one chip on the market that meets all requirements should raise red flags. What if that chip is withdrawn from the market - actually it really is not on the market yet?

Reply
-----
We are not overly concerned with the fact that only one suitable chip was located at the time of the search. It seems clear that the System-on-Chip (SOC) market is in a state of dynamic expansion, and that the sort of SOC we are using for the MIB will become much more common in the near future.
At least two other chips, well along in development, were located, but they were a few months further out from production than the TC11IB. More chips will follow. At least for the next several years, the sort of SOC chip we need for the antenna MIB will become increasingly commonplace.

As to Ethernet, we will risk the prediction that 5 to 7 years from now, the extensive use of COTS networking will be seen as one of the most powerful features of the EVLA. Putting the necessary network infrastructure in place is expensive, time-consuming, and requires attention to a myriad of details. However, once in place, it allows the use of readily available, volume-priced hardware for maintenance and upgrades, makes possible the use of the very wide range of commercial and open source software technologies/packages that are Ethernet/IP based, and will give the EVLA an unprecedented degree of flexibility, which is absolutely essential to satisfying the longer-term requirements of inter-operation with VLBA antennas, operation of the NM Array antennas, and satisfaction of the requirements for remote observing and observing modes that are more interactive.

Question
--------
What if it turns out that the chip has to be changed?

Reply
-----
As long as communications between the chip and the external world are based on standard wire & software protocols we have alternatives. If need be the MIB _systems software_ could be ported to a new chip and the applications software would not then require modification. Another alternative is to use different systems software, which includes the same basic functionality. Then, a port, but not a rewrite, of the MIB applications software would be required.

Question
--------
Should we look into other MIB/software combinations in case the risks for this MIB look bad?

Reply
-----
To develop alternative chip/systems software scenarios at this time would probably not be a good idea, with emphasis on "at this time".
Investigation of alternative scenarios takes time, money, and manpower. Since we are terribly short on time and manpower, we would rather stall the development of alternative scenarios until the case for the possible need of them is much clearer and stronger. The time and manpower not currently spent on the MIB would best be spent on the development of applications software for the current MIB, and on the overall EVLA M&C software architecture.


M&C SOFTWARE / MIBs
===================

Question
--------
Whenever programs have the same or similar interfaces, every attempt should be made to use object-oriented methods to capture those interfaces and, whenever possible, make them identical. This has been the single, biggest win in the GBT software. The EVLA project seems to have an excellent start on such a strategy, but a careful review would bring out others; also, beware of the trap of something being "too simple" or "not needing" a standard interface, which greatly diminishes the utility of those interfaces which are standard. A good example is that George Peck's engineering "high-level screens" will interface to the device interface described by Kevin Ryan, which is the same interface used by the M&C system.

It is not clear how the software development responsibilities are being divided between software and hardware personnel. Whoever does it should work toward developing standard interfaces between the device software per se and the software labeled I/O Area, Device Functionality, and Other Low-Level Device Specific Code. Those things that the Correlator/Backend provides or needs from M&C should be handled like other devices, i.e., use the "device" interface.

Reply
-----
As of June 2002 we began to work on an overall design. One of the 1st issues raised was that of a standard device interface. We are investigating that issue now, with the goal of specifying a standard interface that will function at all levels of the software.
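
As an illustration only of what "a standard interface that will function at all levels" could mean in practice, the sketch below defines one abstract device interface and a generic client that works unchanged for an antenna module and for a correlator. It is not the EVLA design: every class, method, device, and point name here is hypothetical, and the dictionary-backed device merely stands in for real hardware access.

```python
from abc import ABC, abstractmethod


class Device(ABC):
    """Hypothetical standard device interface (all names are illustrative)."""

    def __init__(self, name):
        self.name = name

    @abstractmethod
    def monitor_points(self):
        """Return the names of the points this device exposes."""

    @abstractmethod
    def get(self, point):
        """Return the current value of one point."""

    @abstractmethod
    def set(self, point, value):
        """Apply a control value to one point."""


class DictBackedDevice(Device):
    """Toy device whose state lives in a dict; stands in for real hardware."""

    def __init__(self, name, points):
        super().__init__(name)
        self._points = dict(points)

    def monitor_points(self):
        return sorted(self._points)

    def get(self, point):
        return self._points[point]

    def set(self, point, value):
        if point not in self._points:
            raise KeyError(f"{self.name} has no point {point!r}")
        self._points[point] = value


def snapshot(device):
    """Generic client: needs nothing beyond the Device interface."""
    return {p: device.get(p) for p in device.monitor_points()}


# Two unrelated "devices" served by the same client code.
lo_module = DictBackedDevice("antenna1.lo", {"lock": True, "power_dbm": -3.0})
correlator = DictBackedDevice("correlator.virtual", {"mode": "continuum"})

correlator.set("mode", "spectral_line")
print(snapshot(lo_module))    # {'lock': True, 'power_dbm': -3.0}
print(snapshot(correlator))   # {'mode': 'spectral_line'}
```

The point of the sketch is that an engineering screen, the operational M&C system, and a test script could all be written against `Device` alone, which is the benefit the reviewer attributes to the GBT software.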

Question
--------
The GBT project lost a significant amount of development time due to lack of resolution on a number of issues involving requirements. They seemed minor, but when attempting to make design decisions, vague or missing requirements slow design down significantly. I wish we had had a clear mechanism for resolving such issues. It was not clear to me that such a mechanism exists for the EVLA, e.g., as to whether you need an integrated or global reset for the MIB.

Reply
-----
I believe we do have reasonably clear lines of responsibility drawn - Rick Perley as Project Scientist, Jim Jackson as Systems Engineer for Hardware, Barry Clark as Systems Engineer for Software, Peggy Perley as Head of Operations. For the example given, poll these parties for their opinions. If opinions differ, put them all in a room together and lock the door until a consensus is reached.

Question
--------
We need clearly defined, standardized MIB screens for standardized hardware to reduce development time. This requires input from the hardware engineers. Has this been given consideration?

Reply
-----
Hardware engineers have been and will continue to be solicited for their input. Sometimes it is necessary to wrestle them to the ground to get anything more than the obvious "I need access to the hardware". We are practicing our holds and throws. We are also thinking about and experimenting with ideas that will allow us to speed up the development of the initial, lowest level screens that will be needed at the start of bench testing.

Question
--------
Has re-using some of the VLA Software been considered?

Reply
-----
Elements of the VLA design have already found their way into our thinking. Reuse of the VLA code is, for the most part, not practical. The VLA system is not object-oriented, does not include the notion of intelligence at the antennas, does not have to deal with different antenna types, and is written almost entirely in assembler and Fortran.
Modcomp assembler code is entirely non-portable. Fortran IV and Fortran 77 are not candidate languages for the EVLA software.

Question
--------
Has contractor work been considered?

Reply
-----
We have already contracted for the port of the systems software for the antenna MIB, and we are constantly on the lookout for other tasks suitable for contract work. However, for the core applications, we strongly prefer to use in-house personnel in order to keep the expertise and knowledge of the software within NRAO.

Question
--------
It was stated that test and operational software will be written by the hardware designer. That seems realistic, but the implementation needs definition: who writes what; interfacing requirements with system; standards, languages, other details.

Reply
-----
The Computer Division will supply a "skeleton" for the MIB software that will include methods for getting data out of the MIB and commands into it. We expect that Wayne Koski and George Peck will want to handle the MIB device interfaces - SPI, parallel I/O lines, etc., and that the designers of specific hardware will want to write the code for that hardware. However, we are flexible and will remain so. As development proceeds, the task list of what needs to be done will grow more detailed. Allocation of the work from that task list will proceed as the task list grows.

Question
--------
Do we have any idea about the reliability of the antenna MIBs?

Reply
-----
We don't have the means for accelerated life testing at NRAO, but we have plans to purchase thermal analysis software.

Question
--------
Whichever Communication Protocol is selected, it should be "Discovery Based" to an extent that monitor point data can be logged/archived based solely on the information in the system, i.e., logging programs are completely generic.

Reply
-----
Agreed, and that is where our thinking has gone.
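
To make the "completely generic" logging idea concrete, here is a minimal sketch, assuming each device announces its own monitor points (name and units) in a self-describing message. The message layout, the JSON encoding, and all device and point names are assumptions made for illustration; no protocol had been selected at the time of the review.

```python
import json


def describe(device_name, points):
    """Build a hypothetical self-description message for one device."""
    return json.dumps({
        "device": device_name,
        "points": [{"name": n, "units": u} for n, u in points],
    })


class GenericLogger:
    """Logs any device using only what the device announced about itself."""

    def __init__(self):
        self.units = {}      # (device, point) -> units, learned by discovery
        self.records = []    # formatted log lines

    def on_description(self, message):
        """Learn a device's monitor points from its announcement."""
        info = json.loads(message)
        for point in info["points"]:
            self.units[(info["device"], point["name"])] = point["units"]

    def on_sample(self, device, point, value, timestamp):
        """Log one sample; the logger holds no device-specific code."""
        key = (device, point)
        if key not in self.units:
            raise KeyError(f"undiscovered monitor point {device}.{point}")
        self.records.append(
            f"{timestamp} {device}.{point} = {value} {self.units[key]}"
        )


logger = GenericLogger()
logger.on_description(describe("antenna1.cryo", [("stage1_temp", "K")]))
logger.on_sample("antenna1.cryo", "stage1_temp", 15.2, "2002-07-31T12:00:00")
print(logger.records[0])
# 2002-07-31T12:00:00 antenna1.cryo.stage1_temp = 15.2 K
```

Adding a new kind of device then requires no change to the logger, only a new announcement from the device.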

Question
--------
The differences and requirements for detecting, reporting, and signaling bad values (data or monitoring) were not clear. Where are messages, indicators, and/or flags used? And how will alarm/message cascading (information overload) be handled?

Reply
-----
These points were not clear because they have not yet been defined. I don't expect that we will get to this point until Sept - Oct 2002.

Question
--------
How will power failures on the arms be addressed? Power to the arms fails often during summer months, albeit for short periods. Currently, someone has to go out to a failed antenna when this happens. The new system should provide remote power reset.

Reply
-----
Discussions about providing a global antenna reset have taken place, and Wayne Koski has been urged to give serious consideration to this feature. We also feel that a discovery-based device interface will be of considerable usefulness w.r.t. power outages. Assuming that crucial portions of the system are on battery backup, a discovery-based device interface will discover when portions of the array have disappeared and will adapt to that fact, and will also discover the reappearance of the hardware when power has been restored, and will adapt to those new circumstances.

Question
--------
Is it possible to interrogate an antenna, i.e. "are you out there?"

Reply
-----
Control will flow downward in the system. Monitor data will flow upward (and perhaps laterally). There will be a constant flow of monitor data that will serve to keep us informed as to who is out there.

Question
--------
The MIB is needed by fall 2002 for RFI testing and module development. Is this realistic?

Reply
-----
Yes. While we will not meet the date of 7/15/2002 for a MIB prototype board, we should have one in time for RFI testing in the fall of 2002.

Question
--------
Who is doing the MIB software? It is advised to use resources in both Electronics and the Computer Division.

Reply
-----
The MIB software will be done by Wayne Koski, George Peck, the actual device designers, Kevin Ryan, and effort will also be contributed by the person hired to replace Bruce Rowen in his previous capacity as one of the people who helps to maintain the VLBA.

Question
--------
It is highly recommended that a more detailed schedule/scenario is put together. For instance, for the bench integration, what do we need at the various phases in terms of M&C support?

Reply
-----
We agree that a more detailed schedule/scenario is needed. It is a high priority item.

Question
--------
A security plan is required. Will this be given attention?

Reply
-----
It indeed is necessary to develop such a plan. As of May 2002, the four highest priority items for the EVLA M&C software effort are:

  1. Development of an overall software architecture and design.
  2. Development of a detailed, timelined scenario/schedule for the test antenna.
  3. Security requirements and a design that satisfies the requirements for security.
  4. Development of a more detailed, timelined scenario/schedule for the hybrid array (the transition plan).

plus, of course, the actual hardware and applications software development for the antenna MIB. We will deploy our manpower with these priorities in mind.


OPERATIONAL INTERFACE
=====================

Question
--------
Is the operational interface only supported on one main platform?

Reply
-----
No, all displays will work on all platforms. We feel that there is no reason to unnecessarily limit the software, especially the operational software, to a single platform. We need to build a system that is highly adaptable and flexible and free of such limitations. And with programming languages such as Java there is absolutely no reason to build to a single platform.

Question
--------
How etched in stone is the level of access for the various groups?

Reply
-----
It is not cast in stone.
The diagram in the presentation is simply a first cut at a diagram that shows a top-level view of the security requirements. It is meant to show the primary user categories and what types of permissions those users will have from different locations. It is my understanding that an EVLA security document will be generated in the near future. The diagram will be modified as those requirements become better known.

Question
--------
What about the overhead for XML, SOAP, etc?

Reply
-----
There is, without question, overhead associated with the use of XML or SOAP as a communication protocol, namely that the data is sent over the wire as ASCII text, which typically means that packet/stream sizes will be larger. There is also the cost of serializing and deserializing the data, as the received packets/streams must be parsed using an XML parser. We do feel, however, that due to the strong industry backing and acceptance of these technologies, we should not disregard them without giving them a serious look. These technologies might not be the end-all solution to all parts of the system, but we do believe they will play a role in selected parts of the system.

Question
--------
I take it that the EVLA -- like the VLA -- does not worry about data monitoring during a scan, whereas it was agreed that basic viewing of the astronomical data during a scan is imperative for the GBT.

Reply
-----
Actually, the VLA does support data monitoring during a scan. The function of the F/D10 display, which is in constant use at the VLA during an observation, is to provide the VLA Operators with a measure of the quality of the science data during the scan.
Additionally, there is the checker screen, which displays alarms and warning messages generated from monitor data, and a third screen, with an associated software process, that warns of conditions such as a potential array stall due to missing files, failure of antennas to converge to a solution during a reference pointing scan, and other conditions. The EVLA will include similar capabilities, but of an enhanced nature. We do worry about data monitoring during a scan, and the capacity to monitor data quality is very much a part of the plans for the EVLA.

Question
--------
If one is serious about following the design outlined by Kevin Ryan in his talk, i.e., using distributive processing - "Putting the Intelligence where the Action is" (which I strongly recommend since that was the guiding principle for the GBT), then the computer(s) on the antenna, MIB or otherwise, should have enough power and memory to accept only high-level commands and on the whole act autonomously.

Reply
-----
The TC11IB has a TriCore core which is clocked at 96 MHz. In addition it has a peripheral control processor which appears to be clocked at 48 MHz, and 1.5 MB of on-chip RAM of which approximately 1 MB will be available for application code. If there were only one of these processors available for each antenna, there would be legitimate cause for concern that there was insufficient computing power and resources to place a sufficiently high level of intelligence in the antenna to implement the desired software architecture. However, there will be 40 to 50 MIBs, each with its own TC11IB chip, in each antenna. Taken together, we do not see this amount of distributed processing power as insufficient to implement an approach that consists of sending high level commands to the MIB, with the actual implementation of those commands performed by the MIB.


DEVICE BROWSER
==============

Question
--------
For the VLBA, Bob Greschke and the Electronics Division have written important test software.
Moeser's Device Browser nicely does some of what VLCj does for the VLBA. It is not clear who will write the test macros to tie together multiple screens and equipment for things like PCAL, BBC, RFI tests. Does the requirement document cover features provided by VLCj? Does it cover features provided by test sets written by Mack Stephenson for the VLBA?

Reply
-----
It is our understanding that the test software will be written by those individuals that know how to test the hardware, namely the hardware engineers. The test software should be written to a standardized software interface that allows the test software to be plugged into the system. We are not familiar enough with VLCj to comment on whether or not the features in VLCj are in the operational requirements document. If anyone is familiar with VLCj and has read the EVLA operational requirements document and finds that requirements are missing, they should bring it to the attention of Bill Sahr or Rich Moeser.

Question
--------
Does such a browser mean overhead on the MIB side?

Reply
-----
No, these data are already in the MIB.

Question
--------
What about the number of packets/second needed?

Reply
-----
This is indeed a big issue; we will have to take efficiency seriously.


CORRELATOR MONITOR & CONTROL
============================

Question
--------
Is the virtual correlator interface a device?

Reply
-----
Yes, and it should follow the standard interface rules.

Question
--------
What about PCMCIA (vs. PC104+)?

Reply
-----
PCMCIA is newer, smaller, and it is where the industry is going.

Question
--------
Why separate the various control computers?

Reply
-----
This greatly enhances reliability. For instance, when the Correlator Power Control Computer (CPCC) goes down, the correlator does not go down with it.

Question
--------
Is there a need for the CPCC to talk to the antenna devices?

Reply
-----
No, all this goes through the Main Correlator Control Computer (MCCC).


CORRELATOR BACKEND/INTERFACE WITH ARCHIVE
=========================================

Question
--------
Does the e2e only expect frequency domain data?

Reply
-----
In general the Backend (BE) can output data in any form that the e2e can accept. A major design point of the BE was to perform FFTs of lags to spectra. There are currently no requirements to produce other than spectra.

Question
--------
Why is Ethernet needed between the correlator and the correlator backend?

Reply
-----
To minimize interprocessor communications on the BE, all work for a given Baseline will be done on a single BE node. Depending on the Correlator mode, Baseline data can come from a number of Correlator output points and can vary from mode to mode, thus there is no fixed mapping of correlator output points to Baselines. As a result we will need maximum flexibility in the connection scheme between the Correlator and the Backend. Currently, a switched, Gigabit Ethernet provides the optimal combination of flexibility, speed and cost.

Question
--------
Can the reversibility requirement be loosened?

Reply
-----
Irreversible processes will not be hidden from the user. All irreversible processes will be under user control. All reversible processes (e.g., FFT) will produce sufficient metadata to allow the process to be undone at some future time.

Question
--------
Constant phase rotation needs to be added to the data processing requirements.

Reply
-----
This will be taken into consideration for addition as an optional process.

Question
--------
Is a heterogeneous cluster a possibility, allowing us to replace components one at a time?

Reply
-----
The advantage of having a homogeneous cluster is that we simply assign the same number of baselines to each node to distribute the workload evenly. For a heterogeneous cluster we will have to take into account the relative speeds of the nodes when determining how many baselines each will do.

Question
--------
What about alarms?
Are we going to use the same screen for monitor and for control data?

Reply
-----
The Backend will not have a GUI of its own and hence will not need to directly produce user screens. The BE functional design will be coordinated with M&C to provide the needed alarms and user data in an agreed upon format via an agreed upon delivery mechanism.

Question
--------
Please explain the plans re flags. Is there a large flag with each visibility, or a combined flag? How will the e2e system deal with those flags?

Reply
-----
The specifics of data flagging of output to the e2e have not yet been worked out. This will depend in part on requirements imposed by the e2e.

Question
--------
Could you comment on networking?

Reply
-----
The correlator and the antennas will constitute two networks, each of which is larger than that at the AOC. It is important to use identical switches all around, and make use of network management software.

Question
--------
Why use segments at all?

Reply
-----
Lag Data will be distributed to the various BE processes running on a given node via shared memory. (No off-node distribution is anticipated until final delivery to the e2e.) This shared memory cache space will be logically segmented or blocked into a few to possibly ten large chunks. These will be filled one at a time by the Input process as data comes in from the Correlator. They will be accessed by the Input Manager to do a logical sort of the lag frames, and emptied by the Data Processing process which applies math functions. Empty blocks are released back to Input for reuse.

The size and number of blocks will be set (possibly dynamically) to optimize throughput. A key factor will be minimization of the number of Input interrupts at the end of filling a block (pushing towards having a minimal number of blocks). A competing factor is the need to avoid having other processes waiting for Input to finish filling a block (pushing toward more blocks).
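
The block life cycle and the trade-off just described can be illustrated with a toy model. This is a single-process sketch only: plain Python lists stand in for shared memory, a counter stands in for end-of-block Input interrupts, and the class name, sizes, and the "sort then sum" placeholder are all assumptions rather than the Backend design.

```python
from collections import deque


class BlockPool:
    """Toy model of a cache split into equal blocks that cycle between
    an Input stage (fills) and a Data Processing stage (empties)."""

    def __init__(self, total_frames, n_blocks):
        self.block_size = total_frames // n_blocks
        self.free = deque([] for _ in range(n_blocks))  # empty blocks
        self.filled = deque()               # blocks awaiting processing
        self.current = self.free.popleft()  # block Input is filling now
        self.block_full_events = 0          # stands in for Input interrupts

    def put(self, frame):
        """Input stage: store one lag frame, hand the block over when full."""
        if self.current is None:
            if not self.free:
                raise BufferError("no free block: Input must wait")
            self.current = self.free.popleft()
        self.current.append(frame)
        if len(self.current) == self.block_size:
            self.filled.append(self.current)
            self.block_full_events += 1
            self.current = None

    def process_one(self):
        """Sort and reduce one filled block, then release it for reuse."""
        block = self.filled.popleft()
        result = sum(sorted(block))   # placeholder for the sort + math steps
        block.clear()
        self.free.append(block)       # released back to Input
        return result


def run(n_blocks, frames=range(24), total_frames=12):
    """Push frames through the pool, processing blocks as they fill."""
    pool = BlockPool(total_frames, n_blocks)
    total = 0
    for frame in frames:
        pool.put(frame)
        while pool.filled:
            total += pool.process_one()
    return total, pool.block_full_events


print(run(2))   # (276, 4)
print(run(6))   # (276, 12)
```

For the same cache size and the same data, two large blocks produce 4 end-of-block events while six small blocks produce 12, which is the pressure toward fewer blocks. The opposing pressure shows up when processing falls behind: with few blocks, `put` runs out of free blocks sooner and Input has to wait.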

Question
--------
Will the prototype of the backend have any functionality?

Reply
-----
The BE prototype is intended to have full internal functionality. It will be able to accept Correlator lag frames (which are synthetically generated and stored on disk since there will be no actual correlator available). It will be capable of doing FFTs and integration but will most likely not have any optional functions deployed. It will probably not produce output formatted for the e2e in its first incarnation. The underlying message passing layer used may not be the same one ultimately used in the production code.

Question
--------
How do the correlator data and the M&C data relate in the archive?

Reply
-----
The intent is to have the Backend combine all data not directly received from the Correlator with the lag frame data to produce a single output stream to the e2e. All Correlator non-lag frame data and all non-Correlator data will come to the BE from M&C and be combined before being sent to the archive. Thus, there should be no need to further relate (assemble) data in the archive itself.

Question
--------
How do monitor data attach to a measurement set that is still open?

Reply
-----
This question is still mostly an open issue. It is likely that monitor data will be attached to a measurement set as an extension or extensions to the AIPS++ measurement set format. Having said that much, no further details can be supplied at this time.


RECRUITMENT
===========

Question
--------
Establishing a close relationship with Tech, UNM, and NMSU has helped the Electronics program. There have been student hires that show great potential, and visibility with alumni has helped with two hires. Hiring all seasoned veterans is a great goal, but in view of the location and salary it is sometimes necessary to hire someone with promise.

Reply
-----
For all advertised positions we need staff with at least several years of work experience and demonstrated abilities in the fields required.
We think student hires are too risky and would need too much time to come up to speed, even if eventually successful. We have, though, considered candidates with only a few years' experience who would require additional training and seasoned veterans alike.

Question
--------
There has been talk of an Albuquerque office. And what about Charlottesville as a location? What about employing help from other parts of the observatory, especially GB?

Reply
-----
Creation of an Albuquerque office has been considered, but we don't feel availability of such an office would have made a difference in hires that fell through. As for hiring people at other NRAO sites, for obvious reasons we would prefer not to have some of our staff thousands of miles away, but if that's what it takes to hire suitable staff, we may consider it. As we proceed with the initial architectural design we hope to open robust, strong communication channels with other parts of the observatory, especially Green Bank. The possibility of inviting various Green Bank personnel to Socorro for periods of work on the EVLA software architecture and design has been discussed, and will be pursued.

Question
--------
We need to continue to push on recruiting software engineers. We need not to be afraid to 'experiment' with one of the open positions.

Reply
-----
We are pushing. At the time of writing (July 2002) we have filled three of our four vacancies, and have a number of suitable candidates for the fourth. We are steadily becoming more flexible in our view of candidates, and in the terms of our offers. We are also aided by the cooling off of the job market.