Calibrater Performance Improvements Thus Far
--------------------------------------------
George Moellenbrock
2003 Aug 27

Many improvements have been made in calibrater performance since 2003 July, and more are coming. Throughout this process, performance has been monitored using a 2-hour VLA simulation (27 antennas, 1 spectral window with 1 channel, 10-second integrations, full polarization; 351 baselines x 720 integrations = 252720 (RR,RL,LR,LL) visibilities). The simulated observation is of a point source with only ordinary gain errors (which change abruptly every 30 minutes, a la a source change, but are constant within each interval) and noise (snr = 8 per visibility). Recently, channelized datasets (8, 64, and 128 channels) with the *same* net sensitivity have been added to the work. This has revealed some new and interesting components of the performance picture.

The calibration trials involve obtaining a series of solutions (1, 6, 11, 23, 45, 90, 360, or 720) by sub-dividing the dataset with appropriate solution-interval settings. This dataset is unrealistic in some ways, but it is a solid basis upon which to explore performance improvements and comparisons with other packages. More-realistic systematic errors (including other effects) and lower snr will be attempted in the near future.

The improvements are (see Plot 1):

1. Minimized slot counting (bookkeeping of solutions)
2. Improved convergence criterion
3. Prediction from the previous solution
4. Removed log messaging and an extra chi-squared calculation
5. More conservative convergence criterion (handles real data better; this was a step backwards in performance)
6. Patched a memory leak
7. Avoided unnecessary gain matrix inversion
8. Initial calibration store optimization

A fair amount of the work (items 1, 7, and 8) has involved recognizing different operational contexts in the calibrater tool, e.g., solving vs. applying calibration, so that processing methods these contexts would otherwise share are specialized appropriately to each. A few bugs have been fixed (items 4 and 6), and the convergence criterion and solution prediction have been tuned (items 2, 3, and 5).

Plot 2 shows the comparison of aips++ and aips for the 1- and 8-channel datasets. At the low slot-count end (left), aips++ is ~3X slower for 1 channel (3 sec vs. 1 sec) and ~2X slower for 8 channels (6 sec vs. 2 sec). As Plot 3 shows, the slope of the aips++ curve is dominated by the calibration table write step. This is due to row-wise (rather than column-wise) I/O, and we believe we know how to fix this (a generic illustration of the contrast appears below).

Plot 4 shows that with increasing channel number, aips++ becomes more competitive with aips. This is most likely due to the tiled disk I/O used in aips++. However, the slope also increases, even though the solve process itself (and the write step) is not a function of the number of channels (the channel data are averaged before the solve). Work on performance improvements is continuing.

Plot 5 shows that introducing the trivial model assumption into the I/O step reduces the execution time dramatically. This is the first optimization (other than the log-message excision) which substantially reduces the y-intercept of the performance curves. Previously, aips++ had been reading the model from the MODEL_DATA column (on disk), which is unnecessary when the model is trivial (a point source). This issue is complicated somewhat when a priori calibrations are considered.
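To illustrate the idea behind the trivial model assumption, the sketch below (in Python for clarity; it is not the actual aips++ C++ code, and the chunk shapes and the read_model_column helper are hypothetical) shows the short-circuit involved: when the model is an unpolarized unit point source at the phase center, every model visibility is known analytically, so the read of the MODEL_DATA column can be skipped entirely.

    import numpy as np

    def model_visibilities(nrow, ncorr, trivial_model, read_model_column=None):
        """Return model visibilities for a chunk of nrow rows.

        For a trivial model (unpolarized unit point source at the phase
        center) the model is known analytically: 1+0j on the parallel
        hands (RR, LL) and 0 on the cross hands (RL, LR), so no disk read
        of the MODEL_DATA column is needed.
        """
        if trivial_model:
            model = np.zeros((nrow, ncorr), dtype=complex)
            model[:, 0] = 1.0          # RR
            model[:, ncorr - 1] = 1.0  # LL
            return model
        # General case: fall back to reading the stored model from disk.
        return read_model_column()

    # Example: one integration of the 27-antenna simulation (351 baselines).
    model = model_visibilities(nrow=351, ncorr=4, trivial_model=True)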
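Returning to the calibration table write step discussed with Plot 3: the point is simply that issuing one small disk write per solution row is far more expensive than accumulating the solutions and writing whole columns in a single operation. The sketch below illustrates that contrast generically in Python; it is not the aips++ table-system code, and the file layout is invented for the example.

    import numpy as np

    def write_row_wise(path, gains, times):
        """One small write per solution row (many tiny I/O operations)."""
        with open(path, "wb") as f:
            for g, t in zip(gains, times):
                f.write(t.tobytes())
                f.write(g.tobytes())

    def write_column_wise(path, gains, times):
        """Accumulate in memory and write each column in a single call."""
        with open(path, "wb") as f:
            f.write(times.tobytes())
            f.write(gains.tobytes())

    # Example: 720 solution slots for 27 antennas, 2 polarizations.
    gains = np.ones((720, 27, 2), dtype=complex)
    times = np.arange(720, dtype=float)
    write_row_wise("caltable_rowwise.bin", gains, times)
    write_column_wise("caltable_colwise.bin", gains, times)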
Some thought is currently going into the logic required to ensure that a priori calibration, data normalization, and frequency and time averaging are done in the optimal order while maintaining the proper algebraic order of the calibration terms (some do not commute conveniently with others, or with the normalization and averaging steps). The trivial model assumption simplifies this logic in most cases.

Plot 6 shows that, for the 128-channel data, the solve is dominated by I/O (data only in this plot, not model), by pre-solve time and frequency averaging (which should drop by a factor of ~2 when the trivial model assumption is introduced there), and by unnecessary in-memory data copying. This last item in the solve step is responsible for the increasing slope with channel number of the aips++ curves. The fundamental solve components themselves are very small. (A minimal sketch of the pre-solve channel averaging appears after the plot list below.)

Summary

1. For larger numbers of solutions, the current stable calibrater is dramatically faster than that of ~2 months ago.
2. Performance-related enhancements include a significant measure of operational context- and mode-dependent specialization, as well as fixes for some genuine errors in the earlier code. So far, the fundamental generality of the calibrater's solver has not been compromised, so these performance improvements are realized in other solve contexts as well (e.g., B, D).
3. Several outstanding performance issues are currently being worked, including optimizing the cal table write, a full implementation of the trivial model assumption, and cleaning out unnecessary in-memory data copies in the solve. These should be done in the next few weeks.
4. I will be discussing many of these issues with Eric, including why the aips curves are so flat and how aips does the normalization/averaging prior to the solution.

Plots

Fig 1 - sim_27ant_2h_10s_1ch_allcals
http://www.aoc.nrao.edu/~smyers/aips++/plots/gm.20030827.fig1.pdf
Fig 2 - comp1and8
http://www.aoc.nrao.edu/~smyers/aips++/plots/gm.20030827.fig2.pdf
Fig 3 - calsolvetimecomp
http://www.aoc.nrao.edu/~smyers/aips++/plots/gm.20030827.fig3.pdf
Fig 4 - comp8and64and128
http://www.aoc.nrao.edu/~smyers/aips++/plots/gm.20030827.fig4.pdf
Fig 5 - comp8and64and128_triv
http://www.aoc.nrao.edu/~smyers/aips++/plots/gm.20030827.fig5.pdf
Fig 6 - solvecomp2
http://www.aoc.nrao.edu/~smyers/aips++/plots/gm.20030827.fig6.pdf
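Regarding the pre-solve averaging mentioned for Plot 6: since the gain solution sought in these tests is channel-independent, the channel data are averaged to a single channel before the solve, so the solve cost itself is fixed while the averaging (and the data I/O) scales with the number of channels. The following is a minimal weighted channel-averaging sketch in Python, not the aips++ implementation; the array shapes and function name are assumptions.

    import numpy as np

    def average_channels(vis, flags, weights):
        """Collapse the channel axis of a visibility chunk before the solve.

        vis     : complex array, shape (nrow, nchan, ncorr)
        flags   : bool array, same shape (True = flagged)
        weights : float array, same shape

        Returns the weighted channel average and the summed weights, so the
        gain solve that follows sees a single effective channel.
        """
        w = np.where(flags, 0.0, weights)
        wsum = w.sum(axis=1)                                   # (nrow, ncorr)
        avg = (w * vis).sum(axis=1) / np.where(wsum > 0.0, wsum, 1.0)
        return avg, wsum

    # Example: a 128-channel chunk with one integration's worth of baselines.
    nrow, nchan, ncorr = 351, 128, 4
    vis = np.ones((nrow, nchan, ncorr), dtype=complex)
    flags = np.zeros((nrow, nchan, ncorr), dtype=bool)
    weights = np.ones((nrow, nchan, ncorr))
    avg_vis, avg_wt = average_channels(vis, flags, weights)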