Date: Tue, 7 Jan 2003 10:55:35 -0700
From: Frazer Owen
To: dshepher@pilabo.aoc.NRAO.EDU, jmcmulli@pilabo.aoc.NRAO.EDU
Subject: Old widefield requirement document

AIPS++ requirements for widefield/deep imaging/analysis

This is an attempt to describe the steps necessary to make a wide-field, high resolution (A or B array), deep VLA survey at 20cm. The motivation is to encourage making this process possible in AIPS++. The process is complex and also requires steps outside of what one would think of as radio synthesis imaging. However, to arrive at a useful scientific result one needs to consider the entire process.

I. Filling the data: database efficiency

One major problem with making a deep survey is that it requires a lot of data. Presently, the AIPS++ database is very nice for small problems, carrying along all the information needed in a nice accessible way. However, the size of the databases for deep surveys (20-100 hours, 5s integrations (probably should be shorter), 351 baselines, 2 pols X 2 IFs X 7 channels) makes a database too large, as a practical matter, to be swallowed by AIPS++. Also, while disk sizes will eventually absorb this database size, the new correlator will increase these numbers by about 100X, so AIPS++ needs a mode in which it deals with the databases as efficiently as possible. It seems to me this means a 16-bit compressed format and possibly fewer bits of information being carried along. AIPS++ at least needs to match AIPS efficiency here.

II. Flexible uv Editing

Throughout the process, starting just after filling, one needs to be able at any time to investigate the uv data and apply one of many flagging processes. In AIPS I use TVFLG, UVFLG, CLIP, and FLGIT at various stages depending on what I encounter. Flexible flagging and display is a must. One must also have some way to reverse the process, either by keeping multiple copies of the database or by easy reversibility of the flagging. The latter is harder than the former but is more space efficient.
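Reversible flagging can be had by keeping flags as versioned boolean masks alongside the visibilities rather than zeroing the data. A minimal numpy sketch of the idea (names, limits and data here are illustrative, not the AIPS++ or AIPS API):

```python
import numpy as np

# Illustrative sketch of reversible, CLIP-style flagging: flags are kept
# as separate boolean masks ("flag versions"), so any editing pass can be
# undone without holding multiple copies of the visibility database.

def clip_flags(vis, amp_limit):
    """Flag mask for visibilities whose amplitude exceeds amp_limit."""
    return np.abs(vis) > amp_limit

vis = np.array([1.0 + 0j, 0.8 + 0.2j, 25.0 + 3j, 1.1 - 0.1j])
pass1 = clip_flags(vis, amp_limit=5.0)   # one editing pass
flags = pass1.copy()                     # cumulative flags applied so far

good = vis[~flags]                       # data the imaging step would see
flags &= ~pass1                          # reverse just this pass: all undone
```

Keeping each pass as its own mask makes the edit history both inspectable and cheap: one bit per visibility per pass, rather than a full copy of the database per editing state.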
II. External Calibration

1. Bandpass calibration:
   a. Phase self-calibrate the bandpass calibrator before any calibration is applied.
   b. Calculate and display (check) the bandpass corrections.
   c. Apply the corrections to the original database, including the unselfcalibrated bandpass calibrator, since it may be used for external phase calibration.
2. Bootstrap the amplitude calibration from the flux calibrator to the phase calibrator.
3. Calculate and apply the amplitude and phase calibration.
4. Calibrate the weights using external system temperatures. Currently I fit a polynomial to the variation of Tsys with elevation and apply a global Tsys vs elevation correction to all antennas. This is because I don't believe the noise tube calibrations. There are several other ways this might be done. The relative weights are often in error by factors of 5 or more, so this is important and should be viewed as part of the necessary calibration.

III. Setting Up the Grid of Imaging Positions

For wide-field, 3D imaging, one needs an inner dense grid of facets which cover the primary beam out to almost the first null, calculated so that 3D effects are small enough to be ignored. In addition one needs a number of outlying fields to cover the bright confusing sources which lie in the first sidelobe or beyond. The first part of the problem is straightforward and I believe AIPS++ can do that. The outlying sources are another problem. One can find them by making a tapered image of a very large field and/or by using external catalogs like NVSS. To do the latter one would need a program with a fairly detailed model of the primary beam beyond the first null. The first way is easier and is what I do most of the time. AIPS does have a version of the second method which some people use. Once one has an image, one needs an easy way to pick out and record the confusing sources. I usually display a box on the TV which covers the area sampled by the dense inner grid, so I can see what parts of the image to ignore.
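The dense inner grid of facets can be laid out mechanically. A flat-sky sketch (the function name, the square-grid choice, and the 20cm-ish numbers are all assumptions for illustration, not the AIPS++ algorithm; real layouts may be hexagonal and must pick the facet size so 3D errors stay negligible per facet):

```python
import numpy as np

# Sketch: square grid of facet-centre offsets (degrees) covering the
# primary beam out to pb_radius, keeping only facets whose footprint
# can touch the primary-beam circle.

def inner_facet_grid(pb_radius, facet_size):
    n = int(np.ceil(2.0 * pb_radius / facet_size))
    offs = (np.arange(n) - (n - 1) / 2.0) * facet_size
    return [(dx, dy) for dx in offs for dy in offs
            if np.hypot(dx, dy) <= pb_radius + facet_size / np.sqrt(2.0)]

facets = inner_facet_grid(pb_radius=0.25, facet_size=0.06)  # made-up numbers
```

The outlying fields for confusing sources would then simply be appended to this list from the tapered image or the external catalog, exactly as the ASCII field list works in AIPS.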
Then I use TVMAX in AIPS to point at the outer confusing sources and measure their positions. AIPS uses a simple ASCII file as the input source for IMAGR to tell it where to put the outlying fields as well as the dense grid. Thus one can simply use the mouse and an editor to move the TVMAX-fitted positions into the list of coordinates for AIPS.

IV. Picking the Weighting for IMAGR

Besides the external weights, one also needs to pick the weighting schemes to use on the gridded uv data to get the best compromise between beam size and noise characteristics. This requires trying several different sets of parameters before one starts. The best solution depends on the detailed uv sampling, so it is not possible in general to decide this in advance. Right now for me this is usually a combination of ROBUST, SUPERUNIFORM and a TAPER. There could be a bigger parameter space than this. One needs to do this to be sure one is getting the best combination before going through the very long reduction process to come. To do this in AIPS++ right now, as I understand it, one must undo the imaging weights in the database each time and re-apply all the different forms of weighting one wants each time. One must also remember what one has done. AIPS, on the other hand, always recalculates all the imaging weights, so one need only change a single parameter in IMAGR and rerun the program to try another combination. This is computationally inefficient, but that is a rather minor matter in the total processing time for a deep survey. There needs to be a user-efficient way to do this, probably with a GLISH script and perhaps with a grid of trial beams and statistics produced. I also would prefer always to apply the weights on the fly to minimize bookkeeping, but that is a close call.

V. Optimal Averaging of the Input uv Dataset

Before imaging, to minimize imaging time, one should average the data in time and frequency to produce the smallest input dataset. Perhaps one should do this before step IV.
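The per-baseline averaging limit can be estimated from the worst-case fringe rate. A back-of-the-envelope sketch (the bound omega_E * B_lambda * r and the 1% default are my assumptions for illustration, not the actual averaging algorithm):

```python
import numpy as np

# Sketch: longest averaging time per baseline that keeps time-smearing
# amplitude loss below eps for sources out to radius_rad from the centre.
# A phasor rotating at rate nu, boxcar-averaged over tau, is attenuated
# by sin(nu*tau/2)/(nu*tau/2); small-angle: loss ~ (nu*tau/2)^2 / 6.

OMEGA_E = 7.292e-5   # Earth rotation rate, rad/s

def max_avg_time(baseline_wavelengths, radius_rad, eps=0.01):
    nu = 2.0 * np.pi * OMEGA_E * baseline_wavelengths * radius_rad  # rad/s
    return 2.0 * np.sqrt(6.0 * eps) / nu                            # seconds

# short baselines tolerate far longer averaging than long ones
t_short = max_avg_time(5e3, np.radians(0.25))
t_long = max_avg_time(2e5, np.radians(0.25))
```

This baseline dependence is exactly why the averaged dataset can shrink so much: most baselines are short and can be averaged heavily while the longest ones cannot.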
In AIPS one can do this with the program UBAVG, which runs through the data on each baseline and averages versus time in such a way as to produce less than an N percent error inside some radius from the phase/pointing center. This can make a significant difference in imaging time, since the uv-based clean process is dominated by the number of uv points from which it must subtract a model. In principle there should be a step in the imaging process which does this optimization of the uv dataset for both time and frequency. With the current correlator at 20cm, frequency sampling is not usually adequate, so only time averaging is important; but at longer wavelengths now, and in the future at 20cm and shorter wavelengths, frequency averaging will be an issue. However, one does need to be able to go back to the full time resolution in the selfcal step. Also, when one is combining uv data from multiple days one needs the full time resolution.

VI. Initial Imaging

Once one has the full grid set up, one can make the initial set of images. One needs to clean this image to a moderate level so one can see all the sources. AIPS++ should be able to do this now. For deep imaging one normally will have multiple days with the same or similar uv coverage. For this initial image, one uses just one day's data to minimize the processing time.

VII. Initial Boxing

Without BOXES to limit the areas CLEAN can consider, CLEAN will scatter power outside the regions with real sources to optimize its fit to the data. This will bias the noise low, bias the source flux densities low, and take longer to run than if one restricts the algorithm. It will also produce a non-optimum model for the selfcal process. There are other ways to solve this problem besides BOXES, such as rejecting isolated clean components by some method before restoring the image, but BOXES seems to me the most straight-forward, even if it is expensive in real time for the user. In theory, one could write a program to find the boxes.
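A first guess at such a program might simply threshold the image at a few times the rms and box the connected islands. A sketch using scipy (the threshold, names and fake image are illustrative; this naive thresholding is precisely the part that cannot tell sidelobe ripple from real sources):

```python
import numpy as np
from scipy import ndimage

# Sketch: first-guess CLEAN boxes as bounding boxes of connected islands
# of pixels above k * rms.  A human check of the result is still assumed.

def first_guess_boxes(image, k=5.0):
    rms = np.std(image)                       # crude noise estimate
    labels, _ = ndimage.label(image > k * rms)
    return [(sl[0].start, sl[0].stop, sl[1].start, sl[1].stop)
            for sl in ndimage.find_objects(labels)]

rng = np.random.default_rng(1)
img = rng.normal(0.0, 1.0, (128, 128))
img[40:44, 60:64] += 50.0                     # one bright fake "source"
boxes = first_guess_boxes(img)                # one box around the source
```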
However, in practice such a task has trouble distinguishing between residual sidelobes from strong sources and real sources, which the human can do fairly easily. For big surveys, some such automatic program is needed, but I suspect we may always need a user check and adjustment of the program's solution. Thus some sort of interactive environment is needed. In AIPS right now the process is entirely interactive. One displays the image, or part of it, on the TV and uses the graphics overlays to mark where the BOXES should go. The interactive program, FILEBOX, writes the marked positions in the same ASCII file in which it stores the field centers and sizes. This makes editing this file very flexible. This process is very time consuming, however. In the ideal world there would be a program which sets up the first guess at the BOXES (or more general REGIONS). Kumar already has produced a first attempt at this. Then the user would display and interactively correct the automatically created REGIONS. The most obvious thing missing from AIPS++ is the graphical overlay for the TV. This will come up again and again in the image analysis steps.

VII. Second Imaging

After the BOXES, one reruns IMAGR, starting from scratch with the clean components, and cleans down to the current noise level.

VIII. Selfcal loop

One then selfcals the first day's uv data using all the clean components and re-images the field. Usually, one does this the first time with a phase-only selfcal, re-images, and then does it again with a phase and amplitude selfcal and re-images again. After each re-imaging one checks the clean boxes, usually adding quite a few after the first selfcal. The amplitude and phase selfcal is usually done with a longer solution interval to increase the S/N, since one is solving for more variables the second time. Besides fixing poor calibration, the amplitude and phase step also forces the two IFs to be on approximately the same flux scale, which minimizes error for bright sources.
This is a practical matter right now. In an era with the new VLA correlator this probably would be done differently, to keep track of the differences in the primary beam with frequency and probably to solve for the spectral index as well as the average total intensity across the band. Usually two selfcal passes are adequate, so this is not really in the DIFFMAP limit.

IX. Further uv Clipping

Once one has a good model one can subtract out the sources and do a uv editing step on the uv residuals. Normally, I subtract out the model with UVSUB, use UVPLT to plot the data, and CLIP to remove remaining high points. Sometimes a pass through TVFLG is used here. Then one adds the model back into the remaining data with UVSUB again. Exactly how this step fits with VIII can vary.

X. Calibration of the Rest of the Data

Once one has a good image for the first night, one can use this model to calibrate the rest of the nights. Often just one amplitude and phase "selfcal" will be good enough. Also, one needs to go through step IX on the rest of the data at this point.

XI. Compressing the Full uv Dataset

Once again one wants to average the data in time and frequency before imaging. However, in addition, one wants to average the multiple days together to minimize the size of this full dataset. In AIPS right now this involves going through a series of steps not intended for this purpose. One converts the times into HAs, edits the headers to make the individual datasets look like they all were observed on the same day, DBCONs all the datasets together, sorts them into baseline-time order, averages them optimally versus time, and sorts them back into time-baseline order for IMAGR. I probably am leaving a few steps out. Clearly a TASK to do all this would be a better approach. However, this effort and bookkeeping is well worth it, because one can reduce a big database by a factor of 10 or more and thus speed up the imaging by almost the same factor.

XII. Imaging the Full Dataset

At this point one normally makes an image with the full dataset, adds and resets the boxes again, does a selfcal using the optimally averaged dataset, and then makes an "almost" final image.

XIII. RR/LL Pointing Errors

At this point the residual pointing errors usually limit the result. The beam squint produces pointing differences between the R and L feeds which are understood to first order. For sources near the field center this is not too big a problem; however, for bright sources further from the pointing center the effect can be large and can be the big problem, scattering sidelobes over large regions of the map and significantly increasing the average noise level. This can be helped by a good choice of weighting at step IV. The pointing error also varies with parallactic angle, as the position of a bright source relative to the pointing error vector changes. The only way in current AIPS to deal with this problem is to divide the data into small ranges of parallactic angle, by polarization and by IF. Then one images and selfcals each subset separately, forcing the same clean beam for each one. Then one measures the noise on each resulting image and stacks them together, weighting them optimally. Usually, if I have 6 hours for each uv-track (only six hours, to minimize the system temperature increases at low elevation), I divide the data into two time intervals, two IFs and two polarizations, producing 8 images that I must stack up in the end. Eventually, in AIPS++, one hopes this problem will be reduced by smarter selfcal/imaging programs which can 1) take into account the RR/LL pointing offset from the start, 2) solve for pointing errors with time and include them in the processing, and 3) solve for local pointing errors for bright confusing sources and treat them separately in the processing. However, this processing may complicate the data compression issues and require a rethinking of the sequence of events in the processing.

XIV. Making the Source Catalogs

After the images are made there is still a lot to do to produce usable scientific results. The first, most important, step is to make a source catalog. To do this one first runs an automatic source finding/fitting task. In AIPS this is SAD. After adjusting a number of parameters and running the program on the first facet of the image, one produces an ASCII catalog. However, the same problems exist with this task as with the automatic boxing procedure. When one is in a region affected by the residual sidelobes of a bright source, the program has trouble telling the difference between sidelobe ripple and real sources. Thus one is required to investigate each source to be sure it is real. I do this by using the ASCII list, the mouse and an almost trivial AIPS script called COTVLOD. COTVLOD allows me to specify a center position and an image size and then load a region of that size on the TV, centered on the position specified. While it would be fairly easy for each user to write such a script, it is very efficient for one to be available in the package. A similar script, COSTAR, will also plot a symbol on the TV using the graphics overlay. I often plot a circle centered on the position to make recognizing the source easy. At this point I usually re-fit the flux and position of the source with JMFIT and copy the answer with the mouse to my summary table. This last step is probably unnecessary and is partly driven by a less than ideal format for the SAD output. For large sources, I usually measure a total flux density with TVSTAT, make a contour/greyscale image with KNTR, and measure a total size from two extreme positions on the image. If there is a central component or other bright feature, I measure its position using JMFIT, TVSTAT, or in extreme cases with IMPOS, which just reads the position the cursor is pointing at. AIPS++ clearly needs the source finding/fitting/cataloging task.
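The measuring side of that task can be as simple as image moments over a box. A stand-in sketch for SAD/JMFIT-style fitting (real tasks fit elliptical Gaussians and deconvolve the restoring beam; names and the test image are illustrative):

```python
import numpy as np

# Sketch: flux, centroid and rms size of a source from image moments
# inside a user box (y0, y1, x0, x1), in pixel units.

def measure_source(image, box):
    y0, y1, x0, x1 = box
    cut = image[y0:y1, x0:x1].clip(min=0.0)   # ignore negative residuals
    flux = cut.sum()                          # zeroth moment
    yy, xx = np.mgrid[y0:y1, x0:x1]
    yc = (yy * cut).sum() / flux              # first moments: centroid
    xc = (xx * cut).sum() / flux
    sy = np.sqrt(((yy - yc) ** 2 * cut).sum() / flux)   # second moments
    sx = np.sqrt(((xx - xc) ** 2 * cut).sum() / flux)
    return flux, (yc, xc), (sy, sx)

img = np.zeros((64, 64))
img[30:33, 40:43] = 1.0                       # flat 3x3 test source
flux, (yc, xc), (sy, sx) = measure_source(img, (28, 36, 38, 46))
```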
It also needs the graphics overlay, and should have a script to load a small region around a given position as well as being able to mark particular positions on the TV.

XV. Comparing with other catalogs

At this point one normally wants to compare the radio image with other images: radio, optical, IR, X-ray etc. In general other images will have coordinates which are not well registered with the radio system. One needs a way to improve this. This is done in AIPS with a program called XTRAN, which can take a set of known positions on the optical (or other type of) image and calculate a transformation to a new coordinate system. One can then use the SAD process to make a catalog for this image. One can also use HGEOM to transform the image again to register it with the radio image. However, normally it is easier to mark all the radio source positions on the optical (or other) image and look for matches. This is done in AIPS by making an ST file. The ST file is, in turn, made by running a task, STARS, which converts an ASCII file into an internal AIPS extension file. One can then use TVSTAR to mark the positions of a large number of objects on another image. For optical identifications, one usually marks the smaller number of radio sources on the optical image. One can also plot the ST files on greyscale or contour output images. Another use is checking the final catalog one makes in step XIV. COTVLOD and COSTAR are extremely useful at this point. One often has an interesting object in a given field about which someone has found something interesting at another wavelength. Using these two scripts one can investigate the radio image at this position quickly and precisely. It also is very useful to be able to plot a contour map from one band on top of a greyscale from another. For catalogs of multiple types of objects, it is useful to make several ST files and display each one in a different color.
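The XTRAN-style registration step amounts to a least-squares plate solution from matched positions. A linear (affine) sketch (real plate solutions may include higher-order distortion terms; the warp matrix here is made up):

```python
import numpy as np

# Sketch: fit a 3x2 affine transform mapping optical pixel positions onto
# radio positions from a set of matched pairs, then apply it to a catalog.

def fit_affine(src, dst):
    """src, dst: (N, 2) arrays of matched positions, N >= 3 non-collinear."""
    A = np.hstack([src, np.ones((len(src), 1))])   # rows [x, y, 1]
    M, *_ = np.linalg.lstsq(A, dst, rcond=None)
    return M

def apply_affine(M, pts):
    return np.hstack([pts, np.ones((len(pts), 1))]) @ M

src = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
true_M = np.array([[1.01, 0.02], [-0.02, 0.99], [5.0, -3.0]])  # made-up warp
dst = apply_affine(true_M, src)
M = fit_affine(src, dst)                       # recovers the warp
```

Once the transform is fit, the same machinery can either regrid the optical image onto the radio frame or simply transform the radio catalog positions into optical pixels for marking.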
While most of the latter capabilities are not imaging, they are necessary for a complete processing package which can produce actual scientific results without going into another package. AIPS++ must be able to deal with data from other bands to be successful.

Summary:

AIPS++ seems to need several improvements to make it useful for deep, widefield imaging and several more to take over all the processing in this area. Many of these should be easy. However, the uv database and improvements to the viewer seem like the most fundamental. A few others, like dealing with the pointing/gain errors, are clearly difficult but may be conceptually straight-forward. Much of the rest may be doable with GLISH scripts but may take a lot of time to tune adequately to be useful. It also is possible I have left some things out and may need to revise this memo as I realize what I have forgotten. But this is the process I believe is necessary as of right now.

---Frazer