Comparison of AIPS++ and AIPS Imaging/Deconvolution Speeds

                          Ed Fomalont

                        January 10, 2002


                           SUMMARY

    I have compared the timing of several imaging and deconvolution
tasks in AIPS++ and AIPS using a calibrated and edited data set with
627,000 points from 6 hours of continuum observing with the VLA.  On
average, the cpu times for running imaging/deconvolution tasks with
AIPS++ are about a factor of 5 longer than those for AIPS.  Even for
the relatively simple task of making a dirty beam and a dirty image,
AIPS clearly outperforms AIPS++: 12 s versus 70 s (see details below),
using a dual Xeon 1.7 GHz cpu.

    The AIPS and AIPS++ images agree extremely well.  The field
contains over 100 sources, mostly unresolved, with a peak flux density
of 40 mJy and an rms noise of 40 uJy, giving a 1000:1 dynamic range.  The
quality of the AIPS++ image is marginally better than the AIPS image.

    The comparison times are summarized as follows:


                     COMPARATIVE EXECUTION SPEEDS

TASK                         CPU TIME (sec)     Comments
                              AIPS++  AIPS

Reading in U-V data            470      4       AIPS++: 200 Mb data volume on disk
Writing U-V data                90      3       AIPS: 40 Mb data volume on disk
                                                      73 Mb uncompressed

Making dirty map and beam       70     12       2kx2k image

APCLN (cleaning images)         47     19       2kx2k, 4000 iterations

IMAGR (Wide field clean)       466     70       2kx2k, 4 major cycles
                                                 2000 components

IMAGR (9 facet clean)         2767    835       2kx2k, 4 major cycles
                                                 2000 components

IMAGR (25 facet clean)       10000   1445       2kx2k, 4 major cycles,
                                                 4000 components

Selfcalibration of data set    346    168
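
The "factor of 5" quoted in the summary can be checked directly from
the table.  The following is only a small sketch in Python (not part of
the AIPS or AIPS++ scripts used for the tests); the task labels and the
choice of which rows count as imaging/deconvolution tasks are my own
assumptions, while the numbers are copied from the table above.

```python
# CPU times in seconds copied from the summary table: (AIPS++, AIPS).
timings = {
    "Reading in U-V data": (470, 4),
    "Writing U-V data": (90, 3),
    "Making dirty map and beam": (70, 12),
    "APCLN": (47, 19),
    "IMAGR (wide field clean)": (466, 70),
    "IMAGR (9 facet clean)": (2767, 835),
    "IMAGR (25 facet clean)": (10000, 1445),
    "Selfcalibration": (346, 168),
}

# Assumption: these rows are the "imaging/deconvolution tasks" of the summary.
imaging_tasks = [
    "Making dirty map and beam",
    "APCLN",
    "IMAGR (wide field clean)",
    "IMAGR (9 facet clean)",
    "IMAGR (25 facet clean)",
]


def speed_ratios(table):
    """Return the AIPS++/AIPS cpu-time ratio for every task."""
    return {task: aipspp / aips for task, (aipspp, aips) in table.items()}


ratios = speed_ratios(timings)
mean_imaging_ratio = sum(ratios[t] for t in imaging_tasks) / len(imaging_tasks)

for task, ratio in ratios.items():
    print(f"{task:28s} {ratio:7.1f}")
print(f"mean ratio over imaging tasks: {mean_imaging_ratio:.1f}")
```

The unweighted mean is used here only because it is the simplest
choice; weighting by total cpu time would emphasize the faceted cleans.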


1.  The Data Set:

    The input data set consists of 6 hours of VLA data at 1.4 GHz,
taken in the CnB configuration, in a standard UV-FITS file of size 75
Mb.  The data base contains 627,000 u-v points with 2 IFs (1.465 and
1.385 GHz), each with four Stokes parameters.  The data set was
calibrated, edited and passed through one iteration of phase self-cal
in AIPS in order to obtain a data set that was ready for deep imaging
and cleaning.  The data after these calibrations are stored in:

  efomalon@thuban.aoc.nrao.edu:/DATA/THUBAN_1/FITS/efomalon/AXAFL_BEST
  efomalon@bonobo.cv.nrao.edu:/DATA/BONOBO_1/aips++/AXAFL_BEST

The AIPS++ scripts and AIPS run files used in these comparisons are
available in this same directory.

2.  The Comparisons:

    The timing results presented here are the cpu times using bonobo
in Charlottesville (192.33.115.174) with a 1.7 GHz processor and 1 Gb
of memory.  The system.resources.memory was set to 800 Mb.  Other
computers with different resources have been used for similar tests
over the last month, and the ratio of AIPS++/AIPS execution times was
nearly identical across a range in computing power of about a factor
of 10.  None of the machines used had less than 256 Mb of memory.  The
computers were virtually empty except for these executions, although
the cpu time does not vary much with moderate machine loads.

    One curious time interval, 35 s, seems to be present in many
AIPS++ reduction steps for this data base.  My uninformed guess is
that this period may be the time needed to read through this data set
in order to calculate, for example, each of the many residual images
as the deconvolution and the faceting progress.  For example, in
making each facet during the widefield imaging, AIPS++ took 35 s
whereas AIPS took only about 4 s.
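
    If the 35 s interval really is one pass through the data set
(which is only a guess), the implied data rates follow from simple
arithmetic.  The Python sketch below is illustrative only; the
variable and function names are hypothetical, and the inputs are the
627,000 u-v points and the 35 s and 4 s per-facet times quoted above.

```python
N_POINTS = 627_000      # u-v points in the data set
T_PASS_AIPSPP = 35.0    # s per facet in AIPS++ (assumed to be one data pass)
T_PASS_AIPS = 4.0       # s per facet in AIPS


def points_per_second(n_points, seconds):
    """Implied throughput if one facet costs one pass through the data."""
    return n_points / seconds


rate_aipspp = points_per_second(N_POINTS, T_PASS_AIPSPP)
rate_aips = points_per_second(N_POINTS, T_PASS_AIPS)
rate_ratio = rate_aips / rate_aipspp

print(f"AIPS++: {rate_aipspp:9.0f} u-v points/s")
print(f"AIPS:   {rate_aips:9.0f} u-v points/s")
print(f"AIPS/AIPS++ throughput ratio: {rate_ratio:.2f}")
```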


3.  Detailed comparison of execution speeds:

     The following tables compare the execution times (cpu time) on the
computer bonobo in Charlottesville (192.33.115.174) with a 1.7 GHz
processor and 1 Gb of memory.   The visibility data, scripts and run
files are in the directory indicated above.


A.  Reading/Writing from/to a UVFITS file

SCRIPT             SYSTEM            EXECUTION TIME         DATA BASE SIZE                 

fitstoms.g         AIPS++               470 s                  200 Mb                    
F2MS.*             AIPS                   4 s                   40 Mb  (73 Mb uncomp)        

mstofits.g         AIPS++                90 s
MS2F.*             AIPS                   3 s

COMMENTS: All UVFITS data bases were in the working directory of AIPS
or AIPS++.  The execution time for AIPS++ is slower than I remember
from the past, and the difference in reading and writing time for the
FITS format is surprisingly large.
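
The data volumes can be put on a per-point basis, which makes the
difference between the formats easier to see.  The Python sketch below
is a rough estimate only.  It assumes that "Mb" means 10^6 bytes, that
each visibility is stored as a (real, imaginary, weight) triplet of
32-bit floats, and that each record carries 6 random parameters; none
of these details is stated in this memo, and all names in the code are
hypothetical.

```python
N_POINTS = 627_000
MB = 1_000_000          # assumption: "Mb" in the memo means 10**6 bytes

# Sizes in Mb as quoted in the memo.
sizes_mb = {
    "AIPS++ measurement set": 200,
    "AIPS (compressed)": 40,
    "AIPS (uncompressed)": 73,
    "UV-FITS file": 75,
}
bytes_per_point = {name: size * MB / N_POINTS for name, size in sizes_mb.items()}

# Assumed layout of one uncompressed record (not taken from the memo).
N_IF = 2
N_STOKES = 4
FLOATS_PER_VIS = 3      # real, imaginary, weight
N_RANDOM_PARAMS = 6     # assumption: u, v, w, baseline and time parameters
BYTES_PER_FLOAT = 4

record_bytes = (N_IF * N_STOKES * FLOATS_PER_VIS + N_RANDOM_PARAMS) * BYTES_PER_FLOAT
estimated_mb = record_bytes * N_POINTS / MB

for name, nbytes in bytes_per_point.items():
    print(f"{name:24s} {nbytes:6.0f} bytes per u-v point")
print(f"estimated uncompressed size: {estimated_mb:.1f} Mb")
```

Under these assumptions the estimate lands close to the quoted UV-FITS
and uncompressed AIPS sizes, while the AIPS++ measurement set uses
several times more bytes per point.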


B.  Generate a dirty image and beam.  2048x2048 image.

SCRIPT              SYSTEM           EXECUTION TIME and FUNCTION

dirtymapbeam.g      AIPS++               7 s      Set data
                                       (36 s)     Initialize three column
                                       (35 s)     Image weighting
                                       (38 s)     UVrange
                                       (18 s)     filter
                                        35 s      Make beam
                                        38 s      Make image
                                      -------
                                        70 s      TOTAL (no weighting changes)

DIMAGE.*            AIPS                12 s      Complex image (map and beam)
                                         1 s      Adding several weighting options
                                      -------
                                        13 s      TOTAL

COMMENTS: AIPS makes a map and beam (one complex image) in 12 s.  If
you specify uvranges, filters, or tapers, the execution time increases
by 1 s.  AIPS++ takes about 35 s to make a map, 35 s to make a beam,
and 7 s to specify the data.  If you further weight or taper the data,
then up to 100 s can be added to the execution time.

     This task would be a good place to begin the investigation of the
timing differences.  Making dirty images from the residual data is a
basic part of nearly all of the deconvolution methods.  Hence,
improvements in the efficiency of making the dirty images would
translate into better efficiency for most of the imaging and
deconvolution tasks.


C.  APCLN (Clark clean of dirty images)

SCRIPT              SYSTEM           EXECUTION TIME

APCLN.g             AIPS++              47 s      4000 iterations, 8 major cycles

APCLN.*             AIPS                19 s      4000 iterations, 8 major cycles


Comments: Using the AIPS++ script, I made an APCLN-like execution
which cleans an image from the dirty map and beam without going back
to the u-v data.  This type of cleaning is relatively accurate in the
inner quarter of the field and is much quicker for large data bases.
In this task the AIPS++/AIPS comparison is better than usual, with a
ratio of the deconvolution times of 2.5.  This type of deconvolution
does not use the u-v data; hence it may not be subject to the possible
inefficiencies associated with uv-data access in AIPS++.
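
To illustrate what "cleaning from the dirty map and beam without going
to the u-v data" means, here is a minimal image-plane clean in Python.
It is a one-dimensional Hogbom-style loop, not the Clark algorithm used
by APCLN and not the script used for the timings; the beam values, loop
gain, threshold and function names are all made up for the
illustration.

```python
def convolve_same(sky, beam):
    """Convolve a 1-D sky with a beam whose peak is at its centre sample."""
    half = len(beam) // 2
    out = [0.0] * len(sky)
    for i, flux in enumerate(sky):
        for k, b in enumerate(beam):
            j = i + k - half
            if 0 <= j < len(sky):
                out[j] += flux * b
    return out


def image_plane_clean(dirty, beam, gain=0.1, max_iter=500, threshold=1e-3):
    """Hogbom-style clean: only the dirty image and beam are touched."""
    residual = list(dirty)
    components = [0.0] * len(dirty)
    half = len(beam) // 2
    for _ in range(max_iter):
        # Locate the largest absolute residual.
        peak_idx = max(range(len(residual)), key=lambda i: abs(residual[i]))
        peak = residual[peak_idx]
        if abs(peak) < threshold:
            break
        # Record a fraction of the peak as a clean component ...
        flux = gain * peak
        components[peak_idx] += flux
        # ... and subtract the correspondingly scaled beam from the residual.
        for k, b in enumerate(beam):
            j = peak_idx + k - half
            if 0 <= j < len(residual):
                residual[j] -= flux * b
    return components, residual


# Toy example: two point sources seen through a beam with sidelobes.
beam = [-0.05, 0.1, 0.3, 1.0, 0.3, 0.1, -0.05]
sky = [0.0] * 32
sky[10], sky[20] = 1.0, 0.5
dirty = convolve_same(sky, beam)
components, residual = image_plane_clean(dirty, beam)
print(round(components[10], 2), round(components[20], 2))
```

Every step works on image pixels only, which is why this kind of clean
avoids any cost associated with reading the u-v data.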


D.  Wide Field Clark Clean: 2048x2048 for 2000 clean iterations.
Clean most of field


SCRIPT              SYSTEM           EXECUTION TIME

imagr.g             AIPS++             350 s      Cleaning 2000 iter., 2 major cycles
                                       486 s      Cleaning 2000 iter., 5 major cycles

DCLARK.*            AIPS                35 s      Cleaning 2000 iter., 2 major cycles
                                        70 s      Cleaning 2000 iter., 5 major cycles

COMMENTS: This is the basic high-fidelity imaging task which is most
commonly used in AIPS, IMAGR.  Most of the image area can be cleaned
using this method since the clean components are subtracted directly
from the data.  One has to be careful to make an equivalent AIPS++ and
AIPS test, since each program has somewhat different controls.  I
cleaned down to the same residual level and tweaked parameters so that
the same number of major cycles was used.  The images from AIPS++ and
AIPS were in good agreement, with the AIPS++ image marginally cleaner.

E.  Wide Field Clark Clean with 3x3 facets cleaned to same depth ~2000 iterations

SCRIPT              SYSTEM           EXECUTION TIME

clarkclean.g        AIPS++               7 s      Set data
                                      2760 s      Cleaning facets, 3 major cycles
                                      2767 s      TOTAL

FACETS9.*           AIPS               460 s      Cleaning facets, 3 major cycles
                                        55 s      Glue together
                                       835 s      TOTAL

Comments:  The quality of the two images is nearly the same.  The execution
time depends on the number of major cycles, which is difficult to control in
both systems.  Most of the execution time is taken in making the residual
image for each facet after each major cycle.  For AIPS++ each facet takes
about 35 s to make; for AIPS each facet takes about 4 s to make.  This is
where the difference in timing comes from.


F.  Wide Field Clark Clean with 5x5 facets cleaned to same depth

SCRIPT              SYSTEM           EXECUTION TIME

clarkclean.g        AIPS++               7 s      Set data
                                      4860 s      Cleaning facets,  2 major cycles
                                     19348 s      Cleaning facets, 10 major cycles
                                     10000 s      TOTAL (estimated for 4 major cycles)

FACETS9.*           AIPS              1390 s      Cleaning facets, 4 major cycles
                                        55 s      Glue together
                                      1445 s      TOTAL

Comments: The same comments apply here for the 25 facet clean as for
the 9 facet clean.  By the way, the recommended number of facets for
full field cleaning is 81, or 9x9.  The AIPS++/AIPS execution ratio is
worse for the 5x5 facet cleaning than for the 3x3 facet cleaning.
This may be because AIPS++ is relatively less efficient at calculating
a facet than at the other parts of the algorithm.

G.  Selfcalibration of Field

                    AIPS++            36 s  FFT to get model
                                     272 s  for solutions
                                      38 s  to correct
                                     346 s  TOTAL

                    AIPS             167 s  for solutions
                                       1 s  to correct
                                     168 s  TOTAL

Comments:  The speeds for the selfcalibration steps are not too
different.  Most of the time is spent obtaining the solutions, with
AIPS++ taking about 60% longer than AIPS (272 s versus 167 s).  The
difference in time to correct the data is not surprising: AIPS makes a
new CL table, whereas AIPS++ writes the calibrated data into the
measurement set.
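
The step-by-step breakdown can be summarized with a few lines of
Python.  This is only a bookkeeping sketch with hypothetical step
labels; the times are those listed in the table for this section.

```python
# CPU times in seconds for each selfcalibration step, from the table above.
selfcal = {
    "AIPS++": {"FFT to get model": 36, "solutions": 272, "correct": 38},
    "AIPS": {"solutions": 167, "correct": 1},
}

totals = {system: sum(steps.values()) for system, steps in selfcal.items()}
solve_ratio = selfcal["AIPS++"]["solutions"] / selfcal["AIPS"]["solutions"]
solve_fraction = {
    system: steps["solutions"] / totals[system]
    for system, steps in selfcal.items()
}

for system in selfcal:
    print(f"{system:7s} total {totals[system]:4d} s, "
          f"{100 * solve_fraction[system]:.0f}% spent on solutions")
print(f"AIPS++/AIPS ratio for the solution step: {solve_ratio:.2f}")
```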