EVLA CASA Test: 18-22dec06
==========================

I. Introduction

The second EVLA CASA test was held 18-22 December 2006. This test was...

* the first full test of CASA (as compared to aips++), the primary
  difference being the switch from glish to python;
* the first test of the revised user interface, with the introduction
  of tasks and in-line help;
* the first ``wide, not deep'' test, with the testers encouraged to
  try using CASA to reduce their own data, rather than working on a
  few pre-tested data sets or closely following pre-tested scripts.

The test itself was focused on the user interface (including
documentation) and basic calibration (including data weights).
Self-calibration was not tested; imaging and (rudimentary) flagging
were available, but were used primarily to check the calibration. Two
additional tests, specifically for the EVLA, were planned for this
week as well: one to check whether SDM data could be read into CASA
and written out to UVFITS files usable by AIPS, and another to test
pointing self-calibration. More details are given in the charge to
the testers, available at
http://projectoffice.aips2.nrao.edu/EVLA2006.12/EVLA2006.12.html .

A. The Process

There were eight primary testers, all NRAO employees. Of these, six
are based in Socorro (Walter Brisken, Mark Claussen, Gustaaf van
Moorsel, Frazer Owen, Michael Rupen, and David Whysong), and two are
based in Charlottesville (Ed Fomalont and Crystal Brogan), although
EF was visiting the AOC during this test. The primary CASA pundits on
call for the week, also all at the AOC, were Kumar Golap, Joe
McMullin, and George Moellenbrock.

The testers generally worked for several hours each day, sending bug
reports and enquiries throughout, and summarizing their experiences
in daily written reports. We met with the CASA group for 30-60
minutes each morning to discuss progress and outstanding issues.
There were also occasional smaller sessions on specific topics (e.g.,
smoothing vs. interpolation) that came up during testing. Finally,
the testers and the CASA group (as well as Brian Glendenning) met
together on Friday afternoon for a general discussion of the results.

II. Results

The detailed daily logs of the testers are available at the Web site
above. The same Web page also includes a link to JM's spreadsheet
summarizing the bugs reported and progress on their resolution. Here
we give a broader overview of the results.

A. Overview

Both testers and programmers found this test a very useful one. While
there were stumbling blocks and annoyances, all of the testers left
with the feeling that CASA is at last on the road to becoming a
usable data reduction package for ordinary astronomers. There is a
great deal to do, but the user interface in particular has been
improved tremendously. On the CASA side, the wide array of data sets
(and users!) thrown at the package proved very effective in
discovering defects, in providing useful suggestions, and in
beginning interesting dialogs between users and programmers. The
challenge now is to ensure that this sort of vibrant interaction and
progress continues in the months ahead.

B. Interface and Ease-of-Use

One of the main aims of this test was to provide a first check of the
CASA team's response to the suggestions that came out of the User
Interface Charette in June. In addition to a number of smaller
innovations, the main items tested here were (1) the implementation
of the first tasks, and (2) the new in-line help.

The testers uniformly found the user interface vastly improved, as is
evident in the number of relatively small suggestions that would
previously have been far "below the noise" ("Oooh, this is really
cool! Could you maybe do this too???"). The task interface and the
in-line help were both widely praised; even the tools were felt to be
easier to use, thanks both to the available in-line help and to the
on-going rationalization of the various methods and inputs.
More basically, there was overall improvement in both stability and
responsiveness compared to the glish (aips++) interface. While much
remains to be done, these first steps are clearly very positive ones.

One possibly surprising result in this area was that many testers
actually did read the cookbook, suggesting that effort spent on that
document would be richly rewarded. On the other hand, at least one
more experienced tester was able to avoid the cookbook altogether,
because he found the in-line help and inputs sufficient. This seems
an excellent situation, with neophytes able to learn usefully from
the cookbook, and pundits able to stick with in-line documentation
rather than having to refer to printed or Web-based documents.

C. Calibration

The other main test area was basic interferometric calibration,
including (for the first time) the calibration of data weights. Here
we have a more mixed review. On the positive side, the basic
calibration structure seemed reasonable and worked fairly well for a
wide variety of data sets. Detailed comparisons by EF also found an
excellent correspondence between the data and weights for one or two
data sets calibrated independently in AIPS and in CASA.

However, there were a few problems as well. Flagging was not part of
this test, but testing calibration on unflagged data was problematic
at best, so many testers either battled with the existing CASA
autoflagger (with much cursing), or flagged in AIPS before reading
the data into CASA. More seriously, plotting calibration results
(plotcal) was painful in various ways, making it difficult to
evaluate those results. We also found several problems with
polarization calibration, uv-range restrictions, and calibrator
models, many of which were resolved during the testing week. The
testers would like some method of tracking the calibration --- what
has been applied and how. The documentation was seen as too technical
and occasionally confusing (especially "accumulate").
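One way around the difficulty of evaluating solutions purely by eye is to cut on solution quality programmatically. The following is only a minimal sketch in pure numpy: the array layout and the function name are hypothetical stand-ins (real CASA solutions live in calibration tables, and a cutoff like this would more naturally be a solver parameter than a post-hoc filter).

```python
import numpy as np

def filter_solutions(gains, snr, snr_cut=3.0):
    """Discard calibration solutions whose SNR falls below snr_cut.

    gains : complex array of gain solutions (hypothetical layout:
            one entry per antenna/solution interval)
    snr   : real array of the same shape, the per-solution SNR
    Returns (good_gains, mask), where mask is True for kept solutions.
    """
    gains = np.asarray(gains)
    snr = np.asarray(snr)
    mask = snr >= snr_cut  # keep only solutions above the cutoff
    return gains[mask], mask

# Toy example: four solutions, the last one hopelessly noisy.
g = np.array([1.0 + 0.1j, 0.9 - 0.05j, 1.1 + 0.0j, 0.2 + 0.8j])
s = np.array([25.0, 18.0, 12.0, 1.5])
good, mask = filter_solutions(g, s, snr_cut=3.0)
print(len(good))  # 3 solutions survive the cut
```

The same mask could then drive flagging of the data calibrated by the rejected solutions, which also gives a crude record of what was (not) applied.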
We would also like to carry error bars or signal-to-noise ratios
along with the calibration solutions, at least to enable the user to
set an SNR cutoff for acceptable solutions. Finally, some of the CASA
code assumes that the observations are in dual polarization, and some
bugs associated with single-polarization data should be fixed. In
sum, this area clearly needs more work, as well as more thorough
testing, but the basic approach seems reasonably attractive.

D. Missing capabilities

A number of areas were not officially part of this test, but were
felt to be absolutely essential to "real world" use of the package,
and thus deserving of fairly immediate attention. These included:

* Support of remote sites: the one tester in Charlottesville (CB) had
  a very difficult time of it. Partly this was just chance, as her
  particular (albeit fairly simple) data set manifested a series of
  unfortunate interactions with several important tasks. But overall
  it was clear that we need some better way to support rapid bug
  fixes for remote sites. Obviously this will become more and more
  important as the use of CASA spreads beyond NRAO.
* Basic flagging.
* Plotting, both of the data themselves and of the calibration
  results. There are innumerable annoyances, and occasional actual
  bugs, in both areas.
* CLEAN boxes. Now that data weights are calibrated, this is seen as
  the major impediment to high-quality imaging in CASA.
* Tool-level in-line help and the URM. Both are spotty at best at the
  moment.

E. EVLA-specific test: converting an SDM/MS for AIPS

We had hoped to do a full round-trip test, reading a complicated SDM
into CASA and writing it out as UVFITS files readable by AIPS. In the
event, the only available "certified" ASDM data set was rather
boring, and the only test (done by EF) was a very detailed check of a
complicated PdB MS with a variety of spectral properties.
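The kind of detailed cross-checking done here can be sketched in pure numpy. The arrays below are hypothetical stand-ins for table columns (e.g., antenna positions) read from the CASA MS and from the AIPS-loaded UVFITS file; a real check would pull the columns from the actual tables.

```python
import numpy as np

def tables_agree(tab_a, tab_b, atol=1e-6):
    """Return True if two table columns agree element-by-element.

    tab_a, tab_b : arrays of, e.g., antenna positions (metres) from
                   two independent reductions (hypothetical inputs)
    atol         : absolute tolerance; 1e-6 m is far below 1 mm
    """
    tab_a = np.asarray(tab_a, dtype=float)
    tab_b = np.asarray(tab_b, dtype=float)
    return tab_a.shape == tab_b.shape and bool(np.allclose(tab_a, tab_b, atol=atol))

# Toy example: identical positions agree; a 1 cm shift does not.
pos = np.array([[1601.4, -2505.9, 14.2],
                [1720.0, -2388.7, 15.1]])
print(tables_agree(pos, pos))         # True
print(tables_agree(pos, pos + 0.01))  # False
```

Automating such comparisons would make it cheap to repeat the round-trip test as the SDM evolves.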
CASA correctly wrote multiple UVFITS data sets, each of which was
successfully read by AIPS (FITLD). Both the data and the important
tables (e.g., the antenna positions) seemed to agree perfectly
between CASA and AIPS. The main negative is that reading/writing data
seems to be a factor of 10 slower in CASA than in AIPS for uv-data
with either multiple sources or multiple spectral windows (or both).
In sum, the results were very heartening, but the speed issue must be
addressed, and more extensive tests should be done as the SDM
develops further.

F. EVLA-specific test: pointing self-calibration

The pointing self-calibration code proved impossible to port over to
CASA in time for this test.

III. Summary

Overall this was quite a useful and generally successful test. Much
work remains to be done, but the current release of CASA is an
enormous improvement over what we had even six months ago, a tribute
to the impressive efforts put forth by the CASA group this year. The
highest priority in the near term (six to 12 months) should be on
general use issues (ease-of-use, basic capabilities, documentation,
interface), to continue this excellent progress. Currently CASA is by
no means ready for general use, or for most scientific processing.
The challenge now is to meet a very aggressive 2007 schedule, with a
first public release in September, while continuing to address
algorithmic and technical development (e.g., speed issues) in the
context of a new emphasis on user support.

================================================================================