Date: Tue, 7 Jan 2003 10:55:35 -0700
From: Frazer Owen
To: dshepher@pilabo.aoc.NRAO.EDU, jmcmulli@pilabo.aoc.NRAO.EDU
Subject: Old widefield requirement document

AIPS++ requirements for widefield/deep imaging/analysis

This is an attempt to describe the steps necessary to make a wide-field, high resolution (A or B array), deep VLA survey at 20cm. The motivation is to encourage making this process possible in AIPS++. The process is complex and also requires steps outside of what one would think of as radio synthesis imaging. However, to arrive at a useful scientific result one needs to consider the entire process.

I. Filling the data: database efficiency

One major problem with making a deep survey is that it requires a lot of data. Presently, the AIPS++ database is very nice for small problems, carrying along all the information needed in a nice accessible way. However, the size of the databases for deep surveys (20-100 hours, 5s integrations (probably should be shorter), 351 baselines, 2 pols X 2 IFs X 7 channels) makes a database too large, as a practical matter, to be swallowed by AIPS++. Also, while disk sizes will eventually absorb this database size, the new correlator will increase these numbers by about 100X, so AIPS++ needs a mode in which it deals with the databases as efficiently as possible. It seems to me this means a 16-bit compressed format and possibly fewer bits of information being carried along. AIPS++ at least needs to match AIPS efficiency here.

II. Flexible uv Editing

Throughout the process, starting just after filling, one needs to be able at any time to investigate the uv data and apply one of many flagging processes. In AIPS I use TVFLG, UVFLG, CLIP, and FLGIT at various stages depending on what I encounter. Flexible flagging and display is a must. One must also have some way to reverse the process, either by keeping multiple copies of the database or by easy reversibility of the flagging. The latter is harder than the former but is more space efficient.
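Reversible flagging can be had by keeping flags as versioned boolean masks alongside the visibilities rather than zeroing the data. A minimal numpy sketch of the idea (names, limits and data here are illustrative, not the AIPS++ or AIPS API):

```python
import numpy as np

# Illustrative sketch of reversible, CLIP-style flagging: flags are kept
# as separate boolean masks ("flag versions"), so any editing pass can be
# undone without holding multiple copies of the visibility database.

def clip_flags(vis, amp_limit):
    """Flag mask for visibilities whose amplitude exceeds amp_limit."""
    return np.abs(vis) > amp_limit

vis = np.array([1.0 + 0j, 0.8 + 0.2j, 25.0 + 3j, 1.1 - 0.1j])
pass1 = clip_flags(vis, amp_limit=5.0)   # one editing pass
flags = pass1.copy()                     # cumulative flags applied so far

good = vis[~flags]                       # data the imaging step would see
flags &= ~pass1                          # reverse just this pass: all undone
```

Keeping each pass as its own mask makes the edit history both inspectable and cheap: one bit per visibility per pass, rather than a full copy of the database per editing state.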
II. External Calibration

1. Bandpass calibration:
   a. Phase self-calibrate the bandpass calibrator before any calibration is applied.
   b. Calculate and display (check) the bandpass corrections.
   c. Apply the corrections to the original database, including the unselfcalibrated bandpass calibrator, since it may be used for external phase calibration.
2. Bootstrap the amplitude calibration from the flux calibrator to the phase calibrator.
3. Calculate and apply the amplitude and phase calibration.
4. Calibrate the weights using external system temperatures. Currently I fit a polynomial to the variation of Tsys with elevation and apply a global Tsys vs elevation correction to all antennas. This is because I don't believe the noise tube calibrations. There are several other ways this might be done. The relative weights are often in error by factors of 5 or more, so this is important and should be viewed as part of the necessary calibration.

III. Setting Up the Grid of Imaging Positions

For wide-field, 3D imaging, one needs an inner dense grid of facets which cover the primary beam out to almost the first null, calculated so that 3D effects are small enough to be ignored. In addition one needs a number of outlying fields to cover the bright confusing sources which lie in the first sidelobe or beyond. The first part of the problem is straightforward and I believe AIPS++ can do that. The outlying sources are another problem. One can find them by making a tapered image of a very large field and/or by using external catalogs like NVSS. To do the latter one would need a program with a fairly detailed model of the primary beam beyond the first null. The first way is easier and is what I do most of the time. AIPS does have a version of the second method which some people use. Once one has an image, one needs an easy way to pick out and record the confusing sources. I usually display a box on the TV which covers the area sampled by the dense inner grid, so I can see what parts of the image to ignore.
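The dense inner grid of facets can be laid out mechanically. A flat-sky sketch (the function name, the square-grid choice, and the 20cm-ish numbers are all assumptions for illustration, not the AIPS++ algorithm; real layouts may be hexagonal and must pick the facet size so 3D errors stay negligible per facet):

```python
import numpy as np

# Sketch: square grid of facet-centre offsets (degrees) covering the
# primary beam out to pb_radius, keeping only facets whose footprint
# can touch the primary-beam circle.

def inner_facet_grid(pb_radius, facet_size):
    n = int(np.ceil(2.0 * pb_radius / facet_size))
    offs = (np.arange(n) - (n - 1) / 2.0) * facet_size
    return [(dx, dy) for dx in offs for dy in offs
            if np.hypot(dx, dy) <= pb_radius + facet_size / np.sqrt(2.0)]

facets = inner_facet_grid(pb_radius=0.25, facet_size=0.06)  # made-up numbers
```

The outlying fields for confusing sources would then simply be appended to this list from the tapered image or the external catalog, exactly as the ASCII field list works in AIPS.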
Then I use TVMAX in AIPS to point at the outer confusing sources and measure their positions. AIPS uses a simple ASCII file as the input source for IMAGR to tell it where to put the outlying fields as well as the dense grid. Thus one can simply use the mouse and an editor to move the TVMAX-fitted positions into the list of coordinates for AIPS.

IV. Picking the Weighting for IMAGR

Besides the external weights, one also needs to pick the weighting schemes to use on the gridded uv data to get the best compromise between beam size and noise characteristics. This requires trying several different sets of parameters before one starts. The best solution depends on the detailed uv sampling, so it is not possible in general to decide this in advance. Right now for me this is usually a combination of ROBUST, SUPERUNIFORM and a TAPER. There could be a bigger parameter space than this. One needs to do this to be sure one is getting the best combination before going through the very long reduction process to come. To do this in AIPS++ right now, as I understand it, one must undo the imaging weights in the database each time and re-apply all the different forms of weighting one wants each time. One must also remember what one has done. AIPS, on the other hand, always recalculates all the imaging weights, so one need only change a single parameter in IMAGR and rerun the program to try another combination. This is computationally inefficient, but that is a rather minor matter in the total processing time for a deep survey. There needs to be a user-efficient way to do this, probably with a GLISH script and perhaps with a grid of trial beams and statistics produced. I also would prefer always to apply the weights on the fly to minimize bookkeeping, but that is a close call.

V. Optimal Averaging of the Input uv Dataset

Before imaging, to minimize imaging time, one should average the data in time and frequency to produce the smallest input dataset. Perhaps one should do this before step IV.
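The per-baseline averaging limit can be estimated from the worst-case fringe rate. A back-of-the-envelope sketch (the bound omega_E * B_lambda * r and the 1% default are my assumptions for illustration, not the actual averaging algorithm):

```python
import numpy as np

# Sketch: longest averaging time per baseline that keeps time-smearing
# amplitude loss below eps for sources out to radius_rad from the centre.
# A phasor rotating at rate nu, boxcar-averaged over tau, is attenuated
# by sin(nu*tau/2)/(nu*tau/2); small-angle: loss ~ (nu*tau/2)^2 / 6.

OMEGA_E = 7.292e-5   # Earth rotation rate, rad/s

def max_avg_time(baseline_wavelengths, radius_rad, eps=0.01):
    nu = 2.0 * np.pi * OMEGA_E * baseline_wavelengths * radius_rad  # rad/s
    return 2.0 * np.sqrt(6.0 * eps) / nu                            # seconds

# short baselines tolerate far longer averaging than long ones
t_short = max_avg_time(5e3, np.radians(0.25))
t_long = max_avg_time(2e5, np.radians(0.25))
```

This baseline dependence is exactly why the averaged dataset can shrink so much: most baselines are short and can be averaged heavily while the longest ones cannot.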
In AIPS one can do this with the program UBAVG, which runs through the data on each baseline and averages versus time in such a way as to produce less than an N percent error inside some radius from the phase/pointing center. This can make a significant difference in imaging time, since the uv-based clean process is dominated by the number of uv points from which it must subtract a model. In principle there should be a step in the imaging process which does this optimization of the uv dataset for both time and frequency. With the current correlator at 20cm, frequency sampling is not usually adequate, so only time averaging is important; but at longer wavelengths now, and in the future at 20cm and shorter wavelengths, frequency averaging will be an issue. However, one does need to be able to go back to the full time resolution in the selfcal step. Also, when one is combining uv data from multiple days one needs the full time resolution.

VI. Initial Imaging

Once one has the full grid set up, one can make the initial set of images. One needs to clean this image to a moderate level so one can see all the sources. AIPS++ should be able to do this now. For deep imaging one normally will have multiple days with the same or similar uv coverage. For this initial image, one uses just one day's data to minimize the processing time.

VII. Initial Boxing

Without BOXES to limit the areas CLEAN can consider, CLEAN will scatter power outside the regions with real sources to optimize its fit to the data. This will bias the noise low, bias the source flux densities low, and take longer to run than if one restricts the algorithm. It will also produce a non-optimum model for the selfcal process. There are other ways to solve this problem besides BOXES, such as rejecting isolated clean components by some method before restoring the image, but BOXES seems to me the most straight-forward, even if it is expensive in real time for the user. In theory, one could write a program to find the boxes.
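A first guess at such a program might simply threshold the image at a few times the rms and box the connected islands. A sketch using scipy (the threshold, names and fake image are illustrative; this naive thresholding is precisely the part that cannot tell sidelobe ripple from real sources):

```python
import numpy as np
from scipy import ndimage

# Sketch: first-guess CLEAN boxes as bounding boxes of connected islands
# of pixels above k * rms.  A human check of the result is still assumed.

def first_guess_boxes(image, k=5.0):
    rms = np.std(image)                       # crude noise estimate
    labels, _ = ndimage.label(image > k * rms)
    return [(sl[0].start, sl[0].stop, sl[1].start, sl[1].stop)
            for sl in ndimage.find_objects(labels)]

rng = np.random.default_rng(1)
img = rng.normal(0.0, 1.0, (128, 128))
img[40:44, 60:64] += 50.0                     # one bright fake "source"
boxes = first_guess_boxes(img)                # one box around the source
```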
However, in practice such a task has trouble distinguishing between residual sidelobes from strong sources and real sources, which the human can do fairly easily. For big surveys, some such automatic program is needed, but I suspect we may always need a user check and adjustment of the program's solution. Thus some sort of interactive environment is needed. In AIPS right now the process is entirely interactive. One displays the image, or part of it, on the TV and uses the graphics overlays to mark where the BOXES should go. The interactive program, FILEBOX, writes the marked positions in the same ASCII file in which it stores the field centers and sizes. This makes editing this file very flexible. This process is very time consuming, however. In the ideal world there would be a program which sets up the first guess at the BOXES (or more general REGIONS). Kumar already has produced a first attempt at this. Then the user would display and interactively correct the automatically created REGIONS. The most obvious thing missing from AIPS++ is the graphical overlay for the TV. This will come up again and again in the image analysis steps.

VII. Second Imaging

After the BOXES, one reruns IMAGR, starting from scratch with the clean components, and cleans down to the current noise level.

VIII. Selfcal loop

One then selfcals the first day's uv data using all the clean components and re-images the field. Usually, one does this the first time with a phase-only selfcal, re-images, and then does it again with a phase and amplitude selfcal and re-images again. After each re-imaging one checks the clean boxes, usually adding quite a few after the first selfcal. The amplitude and phase selfcal is usually done with a longer solution interval to increase the S/N, since one is solving for more variables the second time. Besides fixing poor calibration, the amplitude and phase step also forces the two IFs to be on approximately the same flux scale, which minimizes error for bright sources.
This is a practical matter right now. In an era with the new VLA correlator this probably would be done differently, to keep track of the differences in the primary beam with frequency and probably to solve for the spectral index as well as the average total intensity across the band. Usually two selfcal passes are adequate, so this is not really in the DIFFMAP limit.

IX. Further uv Clipping

Once one has a good model one can subtract out the sources and do a uv editing step on the uv residuals. Normally, I subtract out the model with UVSUB, use UVPLT to plot the data, and CLIP to remove remaining high points. Sometimes a pass through TVFLG is used here. Then one adds the model back into the remaining data with UVSUB again. Exactly how this step fits with VIII can vary.

X. Calibration of the Rest of the Data

Once one has a good image for the first night, one can use this model to calibrate the rest of the nights. Often just one amplitude and phase "selfcal" will be good enough. Also, one needs to go through step IX on the rest of the data at this point.

XI. Compressing the Full uv Dataset

Once again one wants to average the data in time and frequency before imaging. However, in addition, one wants to average the multiple days together to minimize the size of this full dataset. In AIPS right now this involves going through a series of steps not intended for this purpose. One converts the times into HAs, edits the headers to make the individual datasets look like they all were observed on the same day, DBCONs all the datasets together, sorts them into baseline-time order, averages them optimally versus time, and sorts them back into time-baseline order for IMAGR. I probably am leaving a few steps out. Clearly a TASK to do all this would be a better approach. However, this effort and bookkeeping is well worth it, because one can reduce a big database by a factor of 10 or more and thus speed up the imaging by almost the same factor.

XII. Imaging the Full Dataset

At this point one normally makes an image with the full dataset, adds and resets the boxes again, does a selfcal using the optimally averaged dataset, and then makes an "almost" final image.

XIII. RR/LL Pointing Errors

At this point the residual pointing errors usually limit the result. The beam squint produces pointing differences between the R and L feeds which are understood to first order. For sources near the field center this is not too big a problem; however, for bright sources further from the pointing center the effect can be large and can be the big problem, scattering sidelobes over large regions of the map and significantly increasing the average noise level. This can be helped by a good choice of weighting at step IV. The pointing error also varies with parallactic angle, as the position of a bright source relative to the pointing error vector changes. The only way in current AIPS to deal with this problem is to divide the data into small ranges of parallactic angle, by polarization and by IF. Then one images and selfcals each subset separately, forcing the same clean beam for each one. Then one measures the noise on each resulting image and stacks them together, weighting them optimally. Usually, if I have 6 hours for each uv-track (only six hours, to minimize the system temperature increases at low elevation), I divide the data into two time intervals, two IFs and two polarizations, producing 8 images that I must stack up in the end. Eventually, in AIPS++, one hopes this problem will be reduced by smarter selfcal/imaging programs which can 1) take into account the RR/LL pointing offset from the start, 2) solve for pointing errors with time and include them in the processing, and 3) solve for local pointing errors for bright confusing sources and treat them separately in the processing. However, this processing may complicate the data compression issues and require a rethinking of the sequence of events in the processing.

XIV. Making the Source Catalogs

After the images are made there is still a lot to do to produce usable scientific results. The first, most important, step is to make a source catalog. To do this one first runs an automatic source finding/fitting task. In AIPS this is SAD. After adjusting a number of parameters and running the program on the first facet of the image, one produces an ASCII catalog. However, the same problems exist with this task as with the automatic boxing procedure. When one is in a region affected by the residual sidelobes of a bright source, the program has trouble telling the difference between sidelobe ripple and real sources. Thus one is required to investigate each source to be sure it is real. I do this by using the ASCII list, the mouse and an almost trivial AIPS script called COTVLOD. COTVLOD allows me to specify a center position and an image size and then load a region of that size on the TV, centered on the position specified. While it would be fairly easy for each user to write such a script, it is very efficient for one to be available in the package. A similar script, COSTAR, will also plot a symbol on the TV using the graphics overlay. I often plot a circle centered on the position to make recognizing the source easy. At this point I usually re-fit the flux and position of the source with JMFIT and copy the answer with the mouse to my summary table. This last step is probably unnecessary and is partly driven by a less than ideal format for the SAD output. For large sources, I usually measure a total flux density with TVSTAT, make a contour/greyscale image with KNTR, and measure a total size from two extreme positions on the image. If there is a central component or other bright feature, I measure its position using JMFIT, TVSTAT, or in extreme cases with IMPOS, which just reads the position the cursor is pointing at. AIPS++ clearly needs the source finding/fitting/cataloging task.
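The measuring side of that task can be as simple as image moments over a box. A stand-in sketch for SAD/JMFIT-style fitting (real tasks fit elliptical Gaussians and deconvolve the restoring beam; names and the test image are illustrative):

```python
import numpy as np

# Sketch: flux, centroid and rms size of a source from image moments
# inside a user box (y0, y1, x0, x1), in pixel units.

def measure_source(image, box):
    y0, y1, x0, x1 = box
    cut = image[y0:y1, x0:x1].clip(min=0.0)   # ignore negative residuals
    flux = cut.sum()                          # zeroth moment
    yy, xx = np.mgrid[y0:y1, x0:x1]
    yc = (yy * cut).sum() / flux              # first moments: centroid
    xc = (xx * cut).sum() / flux
    sy = np.sqrt(((yy - yc) ** 2 * cut).sum() / flux)   # second moments
    sx = np.sqrt(((xx - xc) ** 2 * cut).sum() / flux)
    return flux, (yc, xc), (sy, sx)

img = np.zeros((64, 64))
img[30:33, 40:43] = 1.0                       # flat 3x3 test source
flux, (yc, xc), (sy, sx) = measure_source(img, (28, 36, 38, 46))
```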
It also needs the graphics overlay, and should have a script to load a small region around a given position as well as being able to mark particular positions on the TV.

XV. Comparing with other catalogs

At this point one normally wants to compare the radio image with other images: radio, optical, IR, X-ray etc. In general other images will have coordinates which are not well registered with the radio system. One needs a way to improve this. This is done in AIPS with a program called XTRAN, which can take a set of known positions on the optical (or other type of) image and calculate a transformation to a new coordinate system. One can then use the SAD process to make a catalog for this image. One can also use HGEOM to transform the image again to register it with the radio image. However, normally it is easier to mark all the radio source positions on the optical (or other) image and look for matches. This is done in AIPS by making an ST file. The ST file is, in turn, made by running a task, STARS, which converts an ASCII file into an internal AIPS extension file. One can then use TVSTAR to mark the positions of a large number of objects on another image. For optical identifications, one usually marks the smaller number of radio sources on the optical image. One can also plot the ST files on greyscale or contour output images. Another use is checking the final catalog one makes in step XIV. COTVLOD and COSTAR are extremely useful at this point. One often has an interesting object in a given field about which someone has found something interesting at another wavelength. Using these two scripts one can investigate the radio image at this position quickly and precisely. It also is very useful to be able to plot a contour map from one band on top of a greyscale from another. For catalogs of multiple types of objects, it is useful to make several ST files and display each one in a different color.
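The XTRAN-style registration step amounts to a least-squares plate solution from matched positions. A linear (affine) sketch (real plate solutions may include higher-order distortion terms; the warp matrix here is made up):

```python
import numpy as np

# Sketch: fit a 3x2 affine transform mapping optical pixel positions onto
# radio positions from a set of matched pairs, then apply it to a catalog.

def fit_affine(src, dst):
    """src, dst: (N, 2) arrays of matched positions, N >= 3 non-collinear."""
    A = np.hstack([src, np.ones((len(src), 1))])   # rows [x, y, 1]
    M, *_ = np.linalg.lstsq(A, dst, rcond=None)
    return M

def apply_affine(M, pts):
    return np.hstack([pts, np.ones((len(pts), 1))]) @ M

src = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
true_M = np.array([[1.01, 0.02], [-0.02, 0.99], [5.0, -3.0]])  # made-up warp
dst = apply_affine(true_M, src)
M = fit_affine(src, dst)                       # recovers the warp
```

Once the transform is fit, the same machinery can either regrid the optical image onto the radio frame or simply transform the radio catalog positions into optical pixels for marking.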
While most of the latter capabilities are not imaging, they are necessary for a complete processing package which can produce actual scientific results without going into another package. AIPS++ must be able to deal with data from other bands to be successful.

Summary:

AIPS++ seems to need several improvements to make it useful for deep, widefield imaging and several more to take over all the processing in this area. Many of these should be easy. However, the uv database and improvements to the viewer seem like the most fundamental. A few others, like dealing with the pointing/gain errors, are clearly difficult but may be conceptually straight-forward. Much of the rest may be doable with GLISH scripts but may take a lot of time to tune adequately to be useful. It also is possible I have left some things out and may need to revise this memo as I realize what I have forgotten. But this is the process I believe is necessary as of right now.

---Frazer