NRAO AIPS++ Users Group Meeting - DRAFT MINUTES

Date: 2004-4-14 (Wednesday)
Time: 1300 MST
Video Hub: SOC-conf (CV should call in to Socorro)
Rooms: SOC317/CV311

1. NAUG News

   o SS5 was deployed late, but we are now in a freeze for the
     mid-cycle release SS5.5 which is intended to showcase the
     new performance improvements for the Visiting Committee.

   o Joe received a number of comments on data selection (most are
     also archived under the NAUG Notes):

     http://www.aoc.nrao.edu/~smyers/aips++/notes/

     *Action - Joe will collate the comments and responses will be
      written.

   o The items that need NAUG review/testing can be found at:

     http://projectoffice.aips2.nrao.edu/testing.html 

     (also accessed from the Module Testing link on front page).

     *Upcoming testing items will include mosaicing (May) and 
      autoflagging (?).

   o Testing reports

     *No significant testing

2. AIPS++/ISD Status Report

   o Summer School - we will be targeting mm reduction during the tutorials,
     using a PdB dataset.  Joe will discuss the timeline for AIPS++ and
     NAUG preparations for the school.

     Joe's NAUG note (07Apr04):

   The preliminary number of students who will be doing the AIPS++ data 
   reduction tutorial is roughly 25-30. Given a student/tutor ratio of 3:1
   as in the past, we will need at least nine tutors. The proposed list is:
   
   1. Sanjay Bhatnagar		
   2. Walter Brisken
   3. Claire Chandler			confirmed
   4. Ed Fomalont
   5. John Hibbard			confirmed
   6. Kumar Golap			confirmed
   7. Joe McMullin			confirmed
   8. George Moellenbrock		
   9. Steve Myers			confirmed
   10. Debra Shepherd
   
   The proposed schedule for preparing for this is as follows:
   
   April 28	NAUG meeting - demonstration of data reduction to all of
   		the NAUG.
   		Stable for summer school available; Cookbook available.
   
   May x-x+7	NAUG spends one week doing the actual data reduction using
   		the cookbook; preparing for the tutorial. Any issues are
   		identified and resolved and/or documented. We are working
   		out the week in which this will occur. Please send comments
   		on availability.

     *Action - Debra is seeking permission for the carbon star data from
      Robert Lucas.

   o George has made substantial progress on improving the calibrater
     performance (a factor >30 in one part of the code!).  Way to
     go George!

     *George summarized the status (see below, "Main Event")

   o Dongshan has been working on incorporating the autoflagging (from
     autoflag) into our code.  Autoflag was written by folks in the
     Netherlands, and after the breakup of the former consortium we need
     to have this code in a form we can maintain (and understand!).
     The NAUG will be asked to comment on usability issues in the next 
     month, and we will schedule this topic for an upcoming main event.

   o There will be a session on software at the Visiting Committee meeting
     on Apr 26.  Besides Joe and me talking on AIPS++, there will be other
     presentations by Jim, Doug, Bill, Dale, and perhaps others.  Looks like
     it will be 3 slides per presenter again like last time.

     *The proposed speakers will be:

   "DM"
   (1) Intro to ISD organization--Glendenning, 10 minutes
   (2) AIPS++ progress, robustness, etc.  McMullin/Myers, 40 minutes
   (3) e2e progress, deliverables, near-term plans.  Frail, 25 minutes
   (4) e2e gang of 7, plus gang of 4 for long-term planning.  Cotton, 15
       minutes
   
   ALMA
   (1) ALMA Software (Glendenning)
    
   Plus Jim, Gustaaf, Nicole for EVLA, VLBA, and GBT software in those
   sessions.

3. ALMA

   o We are in the process of adding G192 (VLA dataset) to the benchmark
     suite.

   o There is a draft of the ALMA Proto-pipeline retrospective.  More info
     when it is out.

   o ALMA TST2 is on the horizon.  It has been pushed back to August in order
     to afford time for development, which has been hard to schedule due to
     the pressures of testing.  

     For TST2 info, see:

     http://aips2.nrao.edu/projectoffice/almatst2/ALMA_TST2.html

     One of the focuses of TST2 is mosaicing.  We have some possible test
     datasets in hand, and will begin evaluating these.

     *ACTION: Joe and Steve will take first crack at the mosaicing data.
      (Carried over from previous meetings)

4. EVLA

   o No known EVLA issues at this time.

     *Walter noted that he has given Tim Cornwell a 15ch wide-field dataset
      to try w-projection on, and the results were excellent.  Noise was said
      to be 10% below AIPS generated images.

5. Main Event - Presentations and Discussion Items

   o George summarized the status of his calibrater performance 
     improvements:

   Calibrater Performance Improvements as of 2004 Apr 14
   ====================================================
   George Moellenbrock
   
   Intro
   -----
   
   At last NAUG meeting, I discussed four areas in which calibrater
   performance improvements were likely: CalTable I/O, trivial model
   case, accumulated calibration, and core solver.  The first and third
   of these have been addressed, and improvements will be checked in to
   the system this week.  
   
   CalTable I/O  
   ------------
   
   The current CalTable I/O methods suffer from the costs of a row-wise
   treatment.  For each antenna and timestamp, the solution and all of
   its descriptive information is packaged into a record structure which
   is passed to the table system for write to the disk.  Considering the
   GAIN column alone (the most voluminous column), I have demonstrated
   with a test program that packaging the time-dependent solutions into an
   appropriately sized array (matching the shape it will have on disk)
   and sending this array en masse to the table system for write is
   nearly 70X (!) faster.  Implementing this for all of the relevant
   columns should make this fraction of the calibration solve execution
   all but disappear.  A similar improvement should be possible for the
   CalTable read.  (For very large solution sets, the column-wise I/O
   will occur for a minimum of suitably-sized ranges of table rows.)
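
   As an illustration only (this is not George's test program, and not
   the AIPS++ table API), the difference between the two write
   strategies can be sketched in Python.  ToyTable, put_row, and
   put_column_range are hypothetical names, and the 27 antennas and
   10000-row chunk size are assumed values.  The sketch counts calls
   into the table system; it does not measure time, so it says nothing
   about the 70X figure itself.

```python
# Toy stand-in for a table system.  ToyTable and its methods are
# hypothetical names for illustration, not the AIPS++ Table API.
class ToyTable:
    def __init__(self):
        self.gain = []        # stored GAIN values, one per table row
        self.write_calls = 0  # number of calls into the "table system"

    def put_row(self, value):
        """Row-wise write: one call per (timestamp, antenna) solution."""
        self.write_calls += 1
        self.gain.append(value)

    def put_column_range(self, values):
        """Column-wise write: one call for a whole range of rows."""
        self.write_calls += 1
        self.gain.extend(values)


def write_row_wise(table, solutions):
    for per_time in solutions:
        for gain in per_time:
            table.put_row(gain)


def write_column_wise(table, solutions, rows_per_chunk):
    # Package all solutions into one flat array in on-disk row order,
    # then hand it over in suitably sized ranges of rows.
    flat = [gain for per_time in solutions for gain in per_time]
    for start in range(0, len(flat), rows_per_chunk):
        table.put_column_range(flat[start:start + rows_per_chunk])


n_time = 1030  # timestamps, as in the dataset discussed in these minutes
n_ant = 27     # assumed antenna count, for illustration only
solutions = [[complex(1.0, 0.0)] * n_ant for _ in range(n_time)]

row_table, col_table = ToyTable(), ToyTable()
write_row_wise(row_table, solutions)
write_column_wise(col_table, solutions, rows_per_chunk=10000)

print("row-wise calls:   ", row_table.write_calls)
print("column-wise calls:", col_table.write_calls)
```

   Both toy tables end up with identical contents; the column-wise path
   makes 3 calls where the row-wise path makes 27810.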
   
   Accumulated Calibration
   -----------------------
   
   Initially, this item concerned accumulating the antenna-based factors
   for all calibration types (e.g., G, D, P) *before* forming the
   nAnt*(nAnt+1)/2 baseline factors (the '+' becomes '-' if ACs are
   ignored).  However, in the process of investigating this issue, it
   became apparent that even for application of a single calibration
   type, the order and manner of calculation of the baseline correction
   factors were not optimal.  The calibration solutions we solve for and
   store are the antenna-based factors (2x2 Jones matrices) which, after
   forming antenna-pair-wise outer products (yielding 4x4 Mueller
   matrices), corrupt a perfect data model via multiplication.
   Correction by this calibration therefore requires taking the inverse
   of these solutions (matrix "division").  In the current calibrater,
   the outer product of nAnt*nAnt pairs of 2x2 matrices (all antenna
   pairs in both directions) is formed, and then the inverses of all of
   these 4x4 matrices are taken.  It would be much quicker to take the
   inverse of the nAnt 2x2 matrices *before* forming (only) the
   nAnt*(nAnt+1)/2 required 4x4 matrices.  Additionally, the large number
   of 4x4 matrix inversions are performed with no consideration of
   whether the matrices are general, diagonal, or scalar.  Recognizing
   when the (4x4) matrix is diagonal (requiring the inverse of 4 complex
   numbers) or scalar (requiring the inverse of 1 complex number) can
   save a factor of 16 (n^2) or 64 (n^3), respectively (n is the size of
   the matrix).  The gains for 2x2 matrices are somewhat more modest (4
   or 8), but we will also gain by doing a factor ~nAnt fewer of them.
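
   The reordering relies on the fact that the inverse of an outer
   (Kronecker) product is the outer product of the inverses.  A minimal
   Python check follows; it is a sketch, not the calibrater code.  It
   assumes the baseline factor for antennas i and j is formed as J_i
   (x) conj(J_j); that convention, the example matrix values, and
   nAnt = 27 are all assumptions made for illustration.

```python
def inv2(m):
    """Inverse of a general 2x2 complex matrix ((a, b), (c, d))."""
    (a, b), (c, d) = m
    det = a * d - b * c
    return ((d / det, -b / det), (-c / det, a / det))


def outer(ji, jj):
    """4x4 baseline factor: Kronecker product of J_i with conj(J_j).

    The conjugation convention is an assumption of this sketch.
    """
    cj = [[x.conjugate() for x in row] for row in jj]
    return [[ji[r // 2][c // 2] * cj[r % 2][c % 2] for c in range(4)]
            for r in range(4)]


def matmul4(a, b):
    return [[sum(a[r][k] * b[k][c] for k in range(4)) for c in range(4)]
            for r in range(4)]


# Two made-up antenna-based solutions (values are illustrative only).
ji = ((1.2 + 0.3j, 0.05 - 0.02j), (-0.03 + 0.01j, 0.9 - 0.4j))
jj = ((0.8 - 0.1j, 0.02 + 0.04j), (0.01 - 0.06j, 1.1 + 0.2j))

corrupt = outer(ji, jj)              # what corrupts the data model
correct = outer(inv2(ji), inv2(jj))  # invert 2x2 first, then form 4x4

# If the ordering is valid, correct * corrupt is the 4x4 identity.
product = matmul4(correct, corrupt)
residual = max(abs(product[r][c] - (1.0 if r == c else 0.0))
               for r in range(4) for c in range(4))
print("max deviation from identity:", residual)

# Work counts for an assumed nAnt = 27.
n_ant = 27
n_old_inversions = n_ant * n_ant              # 4x4 inversions, old order
n_new_inversions = n_ant                      # 2x2 inversions, new order
n_baseline_factors = n_ant * (n_ant + 1) // 2  # 4x4 products still needed
print(n_old_inversions, n_new_inversions, n_baseline_factors)
```

   For the assumed 27 antennas this replaces 729 general 4x4 inversions
   with 27 2x2 inversions, followed by 378 outer products.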
   
   The performance improvements available by choosing the optimal strategy
   (the code for clever matrix inversion already exists) are substantial,
   as the following table indicates. The table lists the performance
   figures for application of various numbers of different types of
   solutions to a 1030-timestamp dataset (VLA continuum polarimetry).
   The P (parallactic angle) solution is applied twice, first from only
   one p.a. (unrealistically) to illustrate the magnitude of other costs
   (mainly data i/o and the actual apply), second using per-timestamp
   p.a. values.  The G and D solution applications are dominated by data
   i/o and the actual apply since there are so few solutions in these
   cases.  The T solution application includes 20 seconds of CalTable I/O
   which should largely vanish when the CalTable I/O improvements are
   included.  At that point the application of 1030 T solutions will be
   comparable to application of the G and D solutions (many fewer
   solutions, T is scalar), at ~6.8 sec.  The P solution application is
   comparatively more expensive because the parallactic angle must be
   calculated on-the-fly.  Avoiding repeated disk I/O of the antenna
   positions used in this calculation should improve this.  Essentially,
   the baseline factor formation step during apply will be reduced to a
   near-negligible fraction of the overall calibration apply cost.  Note
   that solves for which calibration is pre-applied (on-the-fly) will
   also benefit.  Finally, applying several calibration types in sequence
   appears to add a relatively small incremental cost, as indicated by
   the last 3 rows of the table which are clearly dominated by the cost
   of applying P (the data I/O costs are the same for *all* rows in this
   table at about 6 seconds).


   Improvements in Cal Apply Performance
   -------------------------------------

     Timings include: Data i/o, Soln i/o, 
                      Baseline factor formation, and apply


     Type  mtype   nSol   nData      Old(s)    New(s)
   ---------------------------------------------------------------------
      P     diag      1    1030       7.9      8.0  (incl on-the-fly calc)
      P     diag   1030    1030      23.9     12.6  (incl on-the-fly calc)
      G     diag     12    1030       6.8      6.3
      D      gen      1    1030       7.1      7.2
      T    scalar  1030    1030      36.8     26.8  (incl 20sec soln i/o)
     P,G                   1030      24.9     14.2
    P,G,D                  1030      25.0     14.0
   P,G,D,T                 1030      56.6     34.5  (incl 20sec T soln i/o)
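
   The ratios implied by the table can be worked out with a few lines
   of Python.  This is only arithmetic on the figures quoted above; the
   row labels and variable names are made up for the sketch.

```python
# (label, old seconds, new seconds), copied from the table above.
timings = [
    ("P (1 soln)",     7.9,  8.0),
    ("P (1030 solns)", 23.9, 12.6),
    ("G",              6.8,  6.3),
    ("D",              7.1,  7.2),
    ("T",              36.8, 26.8),
    ("P,G",            24.9, 14.2),
    ("P,G,D",          25.0, 14.0),
    ("P,G,D,T",        56.6, 34.5),
]

# CalTable I/O included in the T timings, as stated in the minutes.
T_SOLN_IO = 20.0

speedup = {name: old / new for name, old, new in timings}
new_by_name = {name: new for name, _, new in timings}

# T apply time once the solution I/O cost is removed.
t_apply_without_io = new_by_name["T"] - T_SOLN_IO

for name, old, new in timings:
    print(f"{name:15s} {old:6.1f} {new:6.1f} {speedup[name]:5.2f}X")
print(f"T without soln i/o: {t_apply_without_io:.1f} s")
```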


   The data from which the above table was derived is the
   aips++/AIPS/miriad benchmark dataset, the simulated gravitational lens
   dataset.  Executing this benchmark before and after the cal apply
   improvements yields the following results (cf official aips++
   benchmarks at <http://aips2.nrao.edu/projectoffice/> , click on "ALMA
   Benchmark Page", then "Test Case #1"):
   
   
   Step             Old(s)   New(s) (as of 2004Apr16)
   ==================================================================
   Fill             19.4     19.1
   Setjy             0.7      0.7
   Phase/Amp Cal    33.1      6.3*  (solve + PG apply to calibrators)
   D Cal + apply    37.3     25.6   (solve=10.8 + full PG apply=14.8)
   Image1           72.3     72.1 
   Selfcal + apply 131.1     44**   (solve=30 + full PGDT apply=14)
   Image2           69.1     66.5
   ------------------------------
   Total           363      234.3 = 1.5X improvement
   
   * The original benchmark script (used for Old) includes a spurious
   step at this point: application of PG to the target source.  AIPS is
   not doing this.  
   ** 44 seconds does not include the current solution I/O cost of
   approximately 50 seconds (write=30, read=20) for the 1030-solution T
   table.  The solution I/O cost for the other types (G, D) is already
   negligible because these solutions sets are so small.
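
   As a cross-check of the totals, the step timings can be summed in
   Python.  This is a sketch using only the numbers in the benchmark
   table above (with the footnoted caveats still applying); the
   variable names are made up.

```python
# (step, old seconds, new seconds), copied from the benchmark table.
steps = [
    ("Fill",            19.4,  19.1),
    ("Setjy",            0.7,   0.7),
    ("Phase/Amp Cal",   33.1,   6.3),
    ("D Cal + apply",   37.3,  25.6),
    ("Image1",          72.3,  72.1),
    ("Selfcal + apply", 131.1, 44.0),
    ("Image2",          69.1,  66.5),
]

total_old = sum(old for _, old, _ in steps)
total_new = sum(new for _, _, new in steps)
improvement = total_old / total_new

# Share of the new total spent in the two imaging steps.
imaging_new = sum(new for name, _, new in steps if name.startswith("Image"))
imaging_fraction = imaging_new / total_new

print(f"Total old: {total_old:.1f} s")
print(f"Total new: {total_new:.1f} s")
print(f"Improvement: {improvement:.2f}X")
print(f"Imaging share of new total: {imaging_fraction:.0%}")
```

   The sums reproduce the 363 and 234.3 second totals, and show that
   the two imaging steps now account for more than half of the run.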
   
   Summary
   -------
   
   So, with these two improvements, the aips++ performance for modest
   continuum polarimetry datasets should reduce to about 1.7X AIPS and
   3.5X miriad (from 2.6X and 5.2X).  In fact, the gains should be
   somewhat better than this since the fraction of time spent on the
   imaging steps is somewhat larger on my laptop than on the benchmark
   machine (due to differences in the h/w details).  Additionally, more
   improvements are possible, including optimization of the P
   calculation, implementation of the trivial model case (which will
   reduce the data i/o costs during solves by a factor of 2, e.g., the
   selfcal solve should decrease from 30 to ~24 seconds), and a number of
   low-level improvements in the core solver.

   -----

   o Possible future topics:

     - Auto-flagging (Dongshan)

     - Wide-field imaging, w-projection, mosaicing (Tim & Sanjay)

     - The ALMA proto-pipeline (Lindsey)

     - Framework technologies and plans (Doug?)

     - Data reduction demos?

6. AIPS++ Developments - see latest targets and info at Project Office:
     http://projectoffice.aips2.nrao.edu/

   o SS6 Targets -- see Project Office page for development status and
     testing info.

7. Upcoming meetings and deadlines:
   o 2004 Apr 16         AIPS++ SS5.5 (for VC profiling)
   o 2004 Apr 26-28      AUI Visiting Committee (SOC)
   o 2004 May 24-25      Users Committee (CV)
   o 2004 Jun 15-22      Synthesis Imaging Summer School (SOC)
   o 2004 Aug 1          ALMA CDR2 = Judgement Day! (docs Jun, rev in Jul)
   o 2004 Aug 1          ALMA TST2 begins??

   The agendas for past NAUG meetings are archived at:
   http://www.aoc.nrao.edu/~smyers/aips++/agenda/

   The minutes for past NAUG meetings are archived at:
   http://www.aoc.nrao.edu/~smyers/aips++/minutes/