Calibrater Performance Improvements Thus Far
--------------------------------------------
George Moellenbrock
2003 Aug 27

Many improvements have been made in calibrater performance since 2003 July, and more are coming. Throughout this process, performance has been monitored using a 2-hour VLA simulation (27 antennas, 1 spectral window with 1 channel, 10-second integrations, full polarization; 351 baselines x 720 integrations = 252720 (RR,RL,LR,LL) visibilities). The simulated observation is of a point source with only ordinary gain errors (which change abruptly every 30 minutes, a la a source change, but are constant within each interval) and noise (snr = 8 per visibility). Recently, channelized datasets (8, 64, and 128 channels) with the *same* net sensitivity have been added to the work. This has revealed some new and interesting components of the performance picture.

The calibration trials involve obtaining a series of solutions (1, 6, 11, 23, 45, 90, 360, or 720) by sub-dividing the dataset with appropriate solution-interval settings. This dataset is unrealistic in some ways, but it is a solid basis upon which to explore performance improvements and comparisons with other packages. More-realistic systematic errors (including other effects) and lower snr will be attempted in the near future.

The improvements are (see Plot 1):

1. Minimized slot counting (bookkeeping of solutions)
2. Improved convergence criterion
3. Prediction from the previous solution
4. Removed log messaging and an extra chi-squared calculation
5. More conservative convergence criterion (handles real data better; this was a step backwards in performance)
6. Patched a memory leak
7. Avoided unnecessary gain matrix inversion
8. Initial calibration store optimization

A fair amount of the work (items 1, 7, and 8) has involved recognizing different operational contexts in the calibrater tool, e.g., solving vs. applying calibration, so that processing methods these contexts would otherwise share are specialized appropriately to each. A few bugs have been fixed (items 4 and 6), and the convergence criterion and solution prediction have been tuned (items 2, 3, and 5).

Plot 2 shows the comparison of aips++ and aips for the 1- and 8-channel datasets. At the low slot-count end (left), aips++ is ~3X slower for 1 channel (3 sec vs. 1 sec) and ~2X slower for 8 channels (6 sec vs. 2 sec). As Plot 3 shows, the slope of the aips++ curve is dominated by the calibration table write step. This is due to row-wise (rather than column-wise) I/O, and we believe we know how to fix this (a generic illustration of the contrast appears below).

Plot 4 shows that with increasing channel number, aips++ becomes more competitive with aips. This is most likely due to the tiled disk I/O used in aips++. However, the slope also increases, even though the solve process itself (and the write step) is not a function of the number of channels (the channel data are averaged before the solve). Work on performance improvements is continuing.

Plot 5 shows that introducing the trivial model assumption into the I/O step reduces the execution time dramatically. This is the first optimization (other than the log-message excision) which substantially reduces the y-intercept of the performance curves. Previously, aips++ had been reading the model from the MODEL_DATA column (on disk), which is unnecessary when the model is trivial (a point source). This issue is complicated somewhat when a priori calibrations are considered.
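To illustrate the idea behind the trivial model assumption, the sketch below (in Python for clarity; it is not the actual aips++ C++ code, and the chunk shapes and the read_model_column helper are hypothetical) shows the short-circuit involved: when the model is an unpolarized unit point source at the phase center, every model visibility is known analytically, so the read of the MODEL_DATA column can be skipped entirely.

    import numpy as np

    def model_visibilities(nrow, ncorr, trivial_model, read_model_column=None):
        """Return model visibilities for a chunk of nrow rows.

        For a trivial model (unpolarized unit point source at the phase
        center) the model is known analytically: 1+0j on the parallel
        hands (RR, LL) and 0 on the cross hands (RL, LR), so no disk read
        of the MODEL_DATA column is needed.
        """
        if trivial_model:
            model = np.zeros((nrow, ncorr), dtype=complex)
            model[:, 0] = 1.0          # RR
            model[:, ncorr - 1] = 1.0  # LL
            return model
        # General case: fall back to reading the stored model from disk.
        return read_model_column()

    # Example: one integration of the 27-antenna simulation (351 baselines).
    model = model_visibilities(nrow=351, ncorr=4, trivial_model=True)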
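Returning to the calibration table write step discussed with Plot 3: the point is simply that issuing one small disk write per solution row is far more expensive than accumulating the solutions and writing whole columns in a single operation. The sketch below illustrates that contrast generically in Python; it is not the aips++ table-system code, and the file layout is invented for the example.

    import numpy as np

    def write_row_wise(path, gains, times):
        """One small write per solution row (many tiny I/O operations)."""
        with open(path, "wb") as f:
            for g, t in zip(gains, times):
                f.write(t.tobytes())
                f.write(g.tobytes())

    def write_column_wise(path, gains, times):
        """Accumulate in memory and write each column in a single call."""
        with open(path, "wb") as f:
            f.write(times.tobytes())
            f.write(gains.tobytes())

    # Example: 720 solution slots for 27 antennas, 2 polarizations.
    gains = np.ones((720, 27, 2), dtype=complex)
    times = np.arange(720, dtype=float)
    write_row_wise("caltable_rowwise.bin", gains, times)
    write_column_wise("caltable_colwise.bin", gains, times)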
Some thought is currently going into the logic required to ensure that a priori calibration, data normalization, and frequency and time averaging are done in the optimal order while maintaining the proper algebraic order of the calibration terms (some do not commute conveniently with others, or with the normalization and averaging steps). The trivial model assumption simplifies this logic in most cases.

Plot 6 shows that, for the 128-channel data, the solve is dominated by I/O (data only in this plot, not model), by pre-solve time and frequency averaging (which should drop by a factor of ~2 when the trivial model assumption is introduced there), and by unnecessary in-memory data copying. This last item in the solve step is responsible for the increasing slope with channel number of the aips++ curves. The fundamental solve components themselves are very small. (A minimal sketch of the pre-solve channel averaging appears after the plot list below.)

Summary

1. For larger numbers of solutions, the current stable calibrater is dramatically faster than that of ~2 months ago.
2. Performance-related enhancements include a significant measure of operational context- and mode-dependent specialization, as well as fixes for some genuine errors in the earlier code. So far, the fundamental generality of the calibrater's solver has not been compromised, so these performance improvements are realized in other solve contexts as well (e.g., B, D).
3. Several outstanding performance issues are currently being worked, including optimizing the cal table write, a full implementation of the trivial model assumption, and cleaning out unnecessary in-memory data copies in the solve. These should be done in the next few weeks.
4. I will be discussing many of these issues with Eric, including why the aips curves are so flat and how aips does the normalization/averaging prior to the solution.

Plots

Fig 1 - sim_27ant_2h_10s_1ch_allcals
http://www.aoc.nrao.edu/~smyers/aips++/plots/gm.20030827.fig1.pdf
Fig 2 - comp1and8
http://www.aoc.nrao.edu/~smyers/aips++/plots/gm.20030827.fig2.pdf
Fig 3 - calsolvetimecomp
http://www.aoc.nrao.edu/~smyers/aips++/plots/gm.20030827.fig3.pdf
Fig 4 - comp8and64and128
http://www.aoc.nrao.edu/~smyers/aips++/plots/gm.20030827.fig4.pdf
Fig 5 - comp8and64and128_triv
http://www.aoc.nrao.edu/~smyers/aips++/plots/gm.20030827.fig5.pdf
Fig 6 - solvecomp2
http://www.aoc.nrao.edu/~smyers/aips++/plots/gm.20030827.fig6.pdf
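Regarding the pre-solve averaging mentioned for Plot 6: since the gain solution sought in these tests is channel-independent, the channel data are averaged to a single channel before the solve, so the solve cost itself is fixed while the averaging (and the data I/O) scales with the number of channels. The following is a minimal weighted channel-averaging sketch in Python, not the aips++ implementation; the array shapes and function name are assumptions.

    import numpy as np

    def average_channels(vis, flags, weights):
        """Collapse the channel axis of a visibility chunk before the solve.

        vis     : complex array, shape (nrow, nchan, ncorr)
        flags   : bool array, same shape (True = flagged)
        weights : float array, same shape

        Returns the weighted channel average and the summed weights, so the
        gain solve that follows sees a single effective channel.
        """
        w = np.where(flags, 0.0, weights)
        wsum = w.sum(axis=1)                                   # (nrow, ncorr)
        avg = (w * vis).sum(axis=1) / np.where(wsum > 0.0, wsum, 1.0)
        return avg, wsum

    # Example: a 128-channel chunk with one integration's worth of baselines.
    nrow, nchan, ncorr = 351, 128, 4
    vis = np.ones((nrow, nchan, ncorr), dtype=complex)
    flags = np.zeros((nrow, nchan, ncorr), dtype=bool)
    weights = np.ones((nrow, nchan, ncorr))
    avg_vis, avg_wt = average_channels(vis, flags, weights)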