4.4 Assessing the data quality and initial editing

At each stage in the data calibration process, it is a good idea to take a look at the data to determine their quality and then to “flag” (edit, delete) those that are suspect or clearly bad. Having begun the actual calibration, it is important to get an impression of the overall quality of the data and to edit out any obviously corrupted data (e.g., bad integrations that were not detected and expunged by the on-line monitoring system, high amplitudes due to interference, unstable amplitudes due to undetected equipment problems, etc.). During the initial calibration, you need to do this only on the observations of calibration sources. However, at a later stage, you may also need to apply techniques similar to those described below to your program sources. If you do edit any calibration data at this point, you must re-run CALIB following the instructions given above for the affected sources.

The philosophy of editing and the choice of methods are matters of personal taste and the advice given below should, therefore, be taken with a few grains of salt. When interferometers consisted of only a couple of movable antennas, there were very few data and they were sparsely sampled. At that time, careful editing to delete all suspect samples, but to preserve all samples which could be calibrated, was probably justified. But modern instruments produce a flood of data, with the substantial redundancy that allows for self-calibration on strong sources. Devoting the same care to editing today is therefore very expensive in your time, while the loss of data needlessly flagged is rarely significant. A couple of guidelines you might consider are:

There are three general methods of editing in AIPS. The “old-fashioned” route uses LISTR to print listings of the data on the printer or the user’s terminal. The user scans these listings with his eyes and, upon finding a bad point, enters a specific flag command for the data set using UVFLG. While this may sound clumsy, it is in fact quite simple and by far the faster method when there are only a few problems. In a highly corrupted data set, it can use a lot of paper and may force you to run LISTR multiple times to pin down the exact problems. The “hands-off” route uses tasks which attempt to determine which data are bad using only modest guidance from the user. The most general of these is FLAGR mentioned below. The third and “modern” route uses interactive (“TV”-based) tasks to display the data in a variety of ways and to allow you to delete sections of bad data simply by pointing at them with the TV cursor. These tasks are TVFLG (§4.4.3) for all baselines and times (but only shows one IF, one Stokes, and one spectral channel at a time), SPFLG (§10.2.2) for all spectral channels, IFs, and times (but only shows one baseline and one Stokes at a time), EDITA (§4.4.2) for editing based on TY (Tant), SN or CL table values, EDITR (§5.5.2) for all times (but only shows a single antenna (1–11 baselines) and one channel average at a time) and WIPER for all types of data (but with the antenna, time, and even polarization of the points not available while editing). TVFLG is the one used for continuum and channel-0 data from the VLA, while SPFLG is only used to check for channel-dependent interference. SPFLG is useful for spectral-line editing in smaller arrays, such as the Australia Telescope and the VLBA. (The redundancy in the spectral domain on calibrator sources helps the eyes to locate bad data.) EDITR is more useful for small arrays such as those common in VLBI experiments. EDITA has been found to be remarkably effective using VLA system temperature tables. 
All of these tasks have the advantage of being very specific in displaying the bad data. Multiple executions should not be required. However, they may require you to look at each IF, Stokes, channel (or baseline) separately (unless you make certain broad assumptions); EDITA and EDITR do allow you to look at all polarizations and/or IFs at once if you want. They all require you to develop special skills since they offer so many options and operations with the TV cursor (the mouse these days). A couple of general statements can be made

4.4.1 Editing with LISTR and UVFLG

Data may be flagged using task UVFLG based on listings from LISTR. To print out the scalar-averaged raw amplitude data for the calibrators, and their rms values, once per scan in a matrix format, the following inputs are suggested:

> TASK ’LISTR’ ; INP  C R

to review the inputs needed.

> INDI n; GETN m  C R

to select the data set, n = 3 and m = 1 above.

> SOURCES ’ ’ ; CALCODE ’*’  C R

to select calibrators.

> TIMER 0  C R

to select all times.

> ANTENNAS 0  C R

to list data for all antennas.

> OPTYPE ’MATX’  C R

to select matrix listing format.

> DOCRT FALSE  C R

to route the output to printer, not terminal.

> DPARM 3 , 1 , 0  C R

amplitude and rms, scalar scan averaging.

> BIF 1; EIF 0  C R

to select all IFs; LISTR will list IFs separately.

> FREQID 1  C R

to select FQ number 1 (note that FQ numbers must also be done separately).

> INP  C R

to review the inputs.

> GO  C R

to run the program when inputs set correctly.

For unresolved calibrators, the VLA on-line gain settings normally produce roughly the same values in all rows and columns within each matrix. At L, C, X, and U bands, these values should be approximately 0.1 of the expected source flux densities. At P band, the factor is about 0.01. The factors for other bands are unspecified. Any rows or columns with consistently high or low values in either the amplitude or the rms matrices should be noted, as they probably indicate flaky antennas. In particular, you should look for

The next step is to locate the bad data more precisely. Suppose that you have found a bad row for antenna 3 in right circular polarization in IF 2 between times (d1, h1, m1, s1) and (d2, h2, m2, s2). You might then rerun LISTR with the following new inputs:

> SOURCES ’ ’  C R

to select all sources.

> TIMER d1 h1 m1 s1 d2 h2 m2 s2  C R

to select by time range.

> ANTENNAS 1 , 2 , 3  C R

to list data for antenna 3 with two “control” antennas.

> BASEL 1 , 2 , 3  C R

to list all baselines with these three antennas.

> OPTYPE ’LIST’  C R

to select column listing format.

> DOCRT 1  C R

to route the output to terminal at its width.

> DPARM = 0  C R

amplitude only, no averaging.

> STOKES ’RR’  C R

to select right circular.

> BIF 2  C R

to specify the “BD” IFs.

> FLAGVER 1  C R

to choose flag table 1.

> GO  C R

to run the program.

This produces a column listing on your terminal of the amplitude for baselines 1–2, 1–3 and 2–3 at every time stamp between the specified start and stop times. The “1–2” column provides a control for comparison with the two columns containing the suspicious antenna.

Note that “amp-scalar” averaging ignores phase entirely and is therefore not useful on weak sources, nor can it find jumps or other problems with the phases. To examine the data in a phase-sensitive way, repeat the above process, but set DPARM(2) = 0 rather than 1. Bad phases will show up as reduced amplitudes and increased rms’s.
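The difference between scalar and vector averaging can be illustrated outside AIPS. The following is a minimal Python sketch, not LISTR's implementation; the function names and the simulated scans (unit amplitude with Gaussian phase scatter) are assumptions made purely for illustration.

```python
import cmath
import math
import random

def scalar_average(vis):
    """Mean of the amplitudes; the phases are ignored entirely."""
    return sum(abs(v) for v in vis) / len(vis)

def vector_average(vis):
    """Amplitude of the complex mean; phase scatter reduces it."""
    return abs(sum(vis) / len(vis))

def make_scan(amplitude, phase_rms_deg, n=200, seed=1):
    """Hypothetical scan: constant amplitude, Gaussian phase noise."""
    rng = random.Random(seed)
    return [cmath.rect(amplitude, math.radians(rng.gauss(0.0, phase_rms_deg)))
            for _ in range(n)]

good = make_scan(1.0, 5.0)    # stable phases
bad = make_scan(1.0, 90.0)    # badly scattered phases

# Scalar averages are the same for both scans; only the vector
# average of the scan with bad phases drops.
print("good:", scalar_average(good), vector_average(good))
print("bad: ", scalar_average(bad), vector_average(bad))
```

In this toy case the scalar average cannot distinguish the two scans, while the vector average of the scan with scattered phases is strongly reduced, which is the signature described above.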

Once bad data have been identified, they can be expunged using UVFLG. For example, if antenna 3 RR was bad for the full interval shown above, it could be deleted with

> TASK ’UVFLG’ ; INP  C R

to select the editor and check its inputs.

> TIMER d1 h1 m1 s1 d2 h2 m2 s2  C R

to select by time range.

> BIF 2 ; EIF = BIF  C R

to specify the “BD” IFs.

> BCHAN 0 ; ECHAN 0  C R

to flag all channels.

> FREQID 1  C R

to flag only the present FQ number.

> ANTEN 3 , 0  C R

to select antenna 3.

> BASEL 0  C R

to select all baselines to antenna 3.

> STOKES ’RR’  C R

to select only the RR Stokes (LL was found to be okay in this example).

> REASON = ’BAD RMS WHOLE SCAN  C R

to set a reason.

> OUTFGVER 1  C R

to select the first (only) flag table.

> INP  C R

be careful with the inputs here!

> GO  C R

to run the task when ready.

Continue the process until you have looked at all parts of the data set that seemed anomalous in the first matrix listing, then rerun that listing to be sure that the flagging has cleaned up the data set sufficiently. If there are lots of bad data, you may find that you have missed a few on the first pass. If you change your mind about a flagging entry, you can use UVFLG with OPCODE = ’UFLG’ to remove entries from the flag table. All adverbs of UVFLG are used when removing entries, so you may use REASON along with the channel, IF, source, etc. adverbs to select the entries to be removed. If the table becomes hopelessly messed up, use EXTDEST to delete the flag table and start over or use a higher numbered flag table. The contents of the flag table may be examined at any time with the general task PRTAB and entries in it may also be removed with TABED and/or TAFLG. Two flag tables can be merged using TAPPE.

4.4.2 Editing with EDITA

The task EDITA uses the graphics planes on the AIPS TV display to plot data from tables and to offer options for editing (deleting, flagging) the associated uv data. At this time, only the TY (system temperature), SN (solution), and CL (calibration) tables may be used. We recommend using EDITA with the TY tables to do the initial editing of VLA data sets, probably before running the programs described in §4.3. For accuracy in evaluating and flagging your data, it is a good idea to have the TY table filled with the same interval as the data themselves; see §4.1.1. Try:

> TASK ’EDITA’ ; INP  C R

to review the inputs needed.

> INDI n ; GETN m  C R

to select the data set, n = 3 and m = 1 above.

> INEXT ’TY’  C R

to use the system temperature table.

> INVERS 0  C R

to use the highest numbered table, usually 1.

> TIMER 0  C R

to select all times.

> FREQID 3  C R

to select FQ entry 3.

> BIF 1 ; EIF 0  C R

to specify all IFs; you can then toggle between them interactively and even display all at once.

> ANTENNAS 0  C R

to display data for all antennas.

> ANTUSE 1 , 2 , 3 , 4 , 5 , 6 , 7  C R

to display initially the first 7 antennae, editing antenna 1. Others may be selected interactively.

> FLAGVER 1  C R

to use flag (FG) table 1.

> OUTFGVER 0  C R

to create a new flag table with the flags from FG table 1 plus the new flags.

> SOLINT 0  C R

to avoid averaging any samples.

> DOHIST FALSE  C R

to omit recording the flagging in the history file.

> DOTWO TRUE  C R

to view a second observable for comparison.

> CROWDED TRUE  C R

to allow plots with all polarizations and/or IFs simultaneously.

> INP  C R

to review the inputs.

> GO  C R

to run the program when inputs set correctly.

If you make multiple runs of EDITA, it is important to make sure that the flagging table entries are all in one version of the FG table. The easiest way to ensure this is to set FLAGVER and OUTFGVER to 0 and keep them that way for all runs of EDITA. This may create an excessive number of flag tables, but unwanted ones may be deleted with EXTDEST. If you make a mistake, two flag tables may be merged with the task TAPPE. A sample display from EDITA is shown on the next page.

The following discussion assumes that you have read §2.3.2 and are familiar with using the AIPS TV display. An item in a menu such as that shown in the figure is selected by moving the TV cursor to the item (holding down or pressing the left mouse button). At this point, the menu item will change color. To obtain information about the item, press AIPS TV “button D” (usually the D key and also the F6 key on your keyboard). To tell the program to execute the menu item, press any of AIPS TV buttons A, B, or C. Status lines around the display indicate what is plotted and which data will be flagged by the next flagging command. In the figure below, only the displayed antenna (2), and time range will be flagged. You must display at least a few lines of the message window and your main AIPS window since the former will be used for instructions and reports and the latter will be needed for data entry (e.g., antenna selection).



Figure 4.2: A display of a sample TV screen from EDITA, made using the AIPS task TVCPS to produce a negative black-and-white display. System temperatures are being used to edit VLBA data. The EDITA menu (in the boxes), the status lines (at the bottom), the editing area (bottom) of a portion of the data from the selected antenna (1), the subsidiary plots of data from selected secondary antennae (3, 5, 7, 9), the edit tool (bar or box), and the edit location values are displayed in different graphics planes which normally appear in different colors. In this example, with CROWDED=TRUE, four IFs but only one polarization are displayed and may be edited simultaneously. Both polarizations can be displayed together along with either one or all four IFs.


The first thing to do with EDITA is to look at all of the polarizations, IFs, and antennae, in order to flag the obviously bad samples (if any). Use SWITCH POLARIZATION to switch between polarizations and ENTER IF to select the IF to edit. Alternatively, NEXT CORRELATOR will cycle through all polarizations and IFs. If CROWDED was set to true, SWITCH POLARIZATION will cycle through displaying both polarizations as well as each separately, and ENTER IF will accept 0 as indicating all. NEXT CORRELATOR shows only one correlator at a time, but can switch away from a multi-correlator display. These options appear only if there is more than one polarization and/or more than one IF in the loaded data. Use ENTER ANTENNA to select the antenna to be flagged and ENTER OTHER ANT to select secondary antennae to be displayed around the editing area. If the secondary antennae have no obvious problems, then they do not have to be selected for editing. EDITA will plot all of the times in the available area, potentially making a very crowded display. You may select interactively a smaller time range or “frame” in order to see the samples more clearly. It is necessary to select each frame in order to edit the data in that frame, so it helps to make the TV screen as big as possible with the F2 button or your window manager. Note that the vertical scales used by EDITA are linear, but that the horizontal scale is irregular and potentially discontinuous. Integer hours are indicated by tick marks and the time range of the frame is indicated. Use FLAG TIME or FLAG TIME RANGE to delete data following instructions which will appear on the message window. While you are editing, the source name, sample time and sample value currently selected will be displayed in the upper left corner of the TV screen. This information can also be used to determine if QUACK is needed.

Having flagged all obviously bad points, select SWITCH ALL IF, SWITCH ALL TIME, SWITCH ALL ANT, and SWITCH ALL POL so that the next flag command(s) apply to all of the data. (Decide whether the flags should apply only to the source(s) displayed or to all sources and set SWITCH ALL SOURC appropriately.) Set the SCAN LENGTH long enough to include the shorter of the full scan and about 12 samples. Then display the difference between the current sample and the running mean by selecting SHOW TSYS - <T>. Use FLAG ABOVE and FLAG BELOW to flag all samples more than a few sigma away from the local mean. Finally, apply your flagging to your uv data set by selecting EXIT.
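The idea behind flagging on the difference from a running mean can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not EDITA's algorithm: the function names, the window of 12 samples, the 3-sigma threshold, the use of a single global sigma, and the system temperature values are all hypothetical.

```python
from statistics import mean, pstdev

def running_mean_residuals(values, window=12):
    """Each sample minus the mean of the window centred on it."""
    half = window // 2
    residuals = []
    for i, v in enumerate(values):
        lo = max(0, i - half)
        hi = min(len(values), i + half + 1)
        residuals.append(v - mean(values[lo:hi]))
    return residuals

def flag_outliers(values, window=12, nsigma=3.0):
    """Indices of samples more than nsigma from the local mean.

    Assumption: sigma is the rms of all residuals; an interactive
    editor lets you choose the clip level by eye instead.
    """
    residuals = running_mean_residuals(values, window)
    sigma = pstdev(residuals)
    return [i for i, r in enumerate(residuals) if abs(r) > nsigma * sigma]

# Hypothetical system temperatures (K) with one discrepant sample.
tsys = [30.0 + 0.1 * (i % 3) for i in range(40)]
tsys[17] = 45.0

print(flag_outliers(tsys))
```

Subtracting the running mean first removes slow, legitimate changes (elevation, source changes within the window), so that a single clip level can be applied to all of the data at once.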

At this point, return to §4.3.1 to run QUACK followed by the first pass of the gain calibration. Then run TVFLG below with DOCAL TRUE so that the data will be displayed on the same flux scale for all baselines.

4.4.3 Editing with TVFLG

If your data are seriously corrupted, contain numerous baselines, and you like video games, TVFLG is the visibility editor of choice. The following discussion assumes that you have read §2.3.2 and are familiar with using the AIPS TV display. The following inputs are suggested:

> TASK ’TVFLG’ ; INP  C R

to review the inputs needed.

> INDI n ; GETN m  C R

to select the data set, n = 3 and m = 1 above.

> SOURCES ’ ’  C R

to select all sources.

> TIMER 0  C R

to select all times.

> STOKES ’RRLL’  C R

to select both right and left circular polarizations; you can then toggle between RR and LL interactively.

> FREQID 3  C R

to select FQ entry 3.

> BIF 1 ; EIF 2  C R

to specify both VLA IFs; you can then toggle between the two interactively.

> ANTENNAS 0  C R

to display data for all antennas.

> BASELINE 0  C R

to display data for all baselines.

> DOCALIB 1  C R

to apply initial calibration to the data.

> FLAGVER 1  C R

to use flag (FG) table 1.

> OUTFGVER 0  C R

to create a new flag table with the flags from FG table 1 plus the new flags.

> DPARM = 0  C R

to use default initial displays and normal baseline ordering.

> DPARM(6) = 30  C R

to declare that the input data are 30-second averages, or to have the data averaged to 30 seconds.

> DPARM(5) = 10  C R

to expand the flagging time ranges by 10 seconds in each direction. The times in the master grid are average times and may not encompass the times of the samples entering the average without this expansion.

> DOCAT 1  C R

to save the master grid file.

> INP  C R

to review the inputs.

> GO  C R

to run the program when inputs set correctly.
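The reason for expanding the flagging time ranges with DPARM(5) can be shown with a toy calculation in Python. This is a sketch under assumptions, not a description of how TVFLG computes its time ranges: the helper names, the three 10-second integrations, and the notion that a grid cell is flagged only at its average time are all hypothetical.

```python
def flag_window(cell_time, expand):
    """Time range (seconds) flagged for a grid cell stamped at cell_time."""
    return (cell_time - expand, cell_time + expand)

def covered(sample_times, window):
    """Whether each original sample time falls inside the flagged range."""
    start, stop = window
    return [start <= t <= stop for t in sample_times]

# Three hypothetical 10-second integrations averaged into one grid cell.
samples = [0.0, 10.0, 20.0]
cell_time = sum(samples) / len(samples)   # average time of the cell

no_expansion = covered(samples, flag_window(cell_time, 0.0))
with_expansion = covered(samples, flag_window(cell_time, 10.0))

print(no_expansion)    # [False, True, False]
print(with_expansion)  # [True, True, True]
```

Without the expansion, a flag centred on the average time would miss the first and last samples that went into the average; with a 10-second expansion in each direction all three are covered.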

If you make multiple runs of TVFLG, it is important to make sure that the flagging table entries are all in one version of the FG table. The easiest way to ensure this is to set FLAGVER and OUTFGVER to 0 and keep it that way for all runs of TVFLG. If you make a mistake, two flag tables may be merged with the task TAPPE.

TVFLG begins by constructing a “master grid” file of all included data. This can be a long process if you include lots of data at once. It is probably better to use the channel selection, IF selection, source selection, and time range selection adverbs to build rather smaller master grid files and then to run TVFLG multiple times. It will work with all data included, allowing you to select interactively which data to edit at any one moment and allowing you to resume the editing as often as you like. But certain operations (such as undoing flags) have to read and process the entire grid, and will be slow if that grid is large. The master grid file is always cataloged (on IN2DISK with class TVFLGR), but is saved at the end of your session only if you set DOCAT = 1 (actually > 0) before starting the task. To resume TVFLG with a pre-existing master grid file, set the adverb IN2SEQ (and IN2DISK) to point at it. When resuming in this way, TVFLG ignores all of its data selection adverbs since they might result in a different master grid than the one it is going to use. If you wish to change any of the data selection parameters, e.g., channels, IFs, sources, times, or time averaging, then you must use a new master grid.

Kept with the master grid file is a special file of TVFLG flagging commands. This file is updated as soon as you enter a new flagging command, making the master grid and your long editing time virtually proof against power failures and other abrupt program terminations. These flagging commands are not entered into your actual uv data set’s flagging (FG) table until you exit from TVFLG and tell it to do so. During editing, TVFLG does not delete data from its master grid; it just marks the flagged data so that they will not be displayed. This allows you to undo editing as needed during your TVFLG session(s). When the flags are transferred to the main uv data set, however, the flagged data in the master grid are fully deleted since undoing the flags at that point has no further meaning. When you are done with a master grid file, be sure to delete it (with ZAP) since it is likely to occupy a significant amount of disk.

TVFLG keeps track of the source name associated with each row of data. When averaging to build the master grid and to build the displayed grids, TVFLG will not average data from different sources and will inform you that it has omitted data if it has had to do so for this reason. For multi-source files, the source name is displayed during the CURVALUE-like sections. However, the flagging table is prepared to flag all sources for the specified antennas, times, etc. or just the displayed source. If you are flagging two calibrator scans, you may wish to do all sources in between as well. Use the SWITCH SOURCE FLAG interactive option to make your selection before you create flagging commands. Similarly, you will need to decide whether flagging commands that you are about to prepare apply only to the displayed channel and/or IF, or to all possible channels and/or IFs. In particular, spectral-line observers often use TVFLG on the pseudo-continuum “channel-0” data set, but want the resulting flags to apply to all spectral channels when copied to the spectral-line data set. They should be careful to select all channels before generating any flagging commands. Each flagging command generated is applied to a list of Stokes parameters, which does not have to include the Stokes currently being displayed. When you begin TVFLG and whenever you switch displayed Stokes, you should use the ENTER STOKES FLAG option to select which Stokes are to be flagged by subsequent flagging commands.

If you get some of this wrong, you can use the UNDO FLAGS option in TVFLG if the flags have not yet been applied to the uv data set. Or you can use tasks UVFLG, TABED or TAFLG to correct errors written into the FG table of your multi-source uv data set. Flag tables are now used with both single- and multi-source data sets.

TVFLG displays the data, for a single IF, channel, and Stokes, as a grey-scale display with time increasing up the screen and baseline number increasing to the right. Thus baselines for the VLA run from left to right as 1–1, 1–2, 1–3, …, 2–2, 2–3, …, 27–27, 27–28, and 28–28. An input parameter (DPARM(3) = 1) allows you to create a master grid and display baselines both as, say, 1–2 and 2–1. An interactive (switchable) option allows you to order the baselines from shortest to longest (ignoring projection effects) along the horizontal axis.
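The default left-to-right baseline ordering can be reproduced with a short Python sketch. The function name and the include_autos switch are hypothetical, introduced only to enumerate the columns; the sketch says nothing about how TVFLG stores its grid.

```python
def baseline_columns(n_antennas, include_autos=True):
    """Antenna pairs in the left-to-right order 1-1, 1-2, ..., n-n."""
    columns = []
    for a in range(1, n_antennas + 1):
        first_b = a if include_autos else a + 1
        for b in range(first_b, n_antennas + 1):
            columns.append((a, b))
    return columns

columns = baseline_columns(28)

print(len(columns))    # 406
print(columns[:3])     # [(1, 1), (1, 2), (1, 3)]
print(columns[-3:])    # [(27, 27), (27, 28), (28, 28)]
```

With 28 antenna numbers and the self-pairs included, the horizontal axis therefore has 28 × 29 / 2 = 406 columns, which is why a large TV window helps.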

The interactive session is driven by a menu which is displayed on a graphics overlay of the TV display. An example of this full display is shown on the next page. Move the cursor to the desired operation (noting that the currently selected one is highlighted in a different color on many TVs) and press button A, B, or C to select the desired operation; pressing button D produces on-line help for the selected operation. The first (left-most) column of choices is:

OFFZOOM

turn off any zoom magnification

OFFTRANS

turn off any black & white enhancement

OFFCOLOR

turn off any pseudo-coloring

TVFIDDLE

interactive zoom, black & white enhancement, and pseudo-color contours as in AIPS

TVTRANSF

black & white enhancement as in AIPS

TVPSEUDO

many pseudo-colorings as in AIPS

DO WEDGE ?

switches choice of displaying a step wedge

LIST FLAGS

list selected range of flag commands

UNDO FLAGS

remove flags by number from the FC table and master grid

REDO FLAGS

re-apply all remaining flags to master grid

SET REASON

set reason to be attached to flagging commands

Note: when a flag is undone, all cells in the master grid which were first flagged by that command are restored to use. Flag commands done after the one that was undone may also, however, have applied to some of those cells. To check this and correct any improperly un-flagged pixels, use the REDO FLAGS option. This option even re-does CLIP operations! After an UNDO or REDO FLAGS operation, the TV is automatically re-loaded if needed. Note that the UNDO operation is one that reads and writes the full master grid.
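The interaction between UNDO FLAGS and REDO FLAGS can be modelled with a toy Python class. This is a hypothetical model of the behaviour described in the note, not TVFLG's data structure: the class FlagGrid, its methods, and the six-cell grid are assumptions made for illustration.

```python
class FlagGrid:
    """Toy master grid: each cell remembers which command first flagged it."""

    def __init__(self, n_cells):
        self.owner = [None] * n_cells   # command number that first flagged the cell
        self.commands = {}              # command number -> set of cell indices
        self.next_number = 1

    def _apply(self, number):
        for cell in self.commands[number]:
            if self.owner[cell] is None:
                self.owner[cell] = number

    def flag(self, cells):
        number = self.next_number
        self.next_number += 1
        self.commands[number] = set(cells)
        self._apply(number)
        return number

    def undo(self, number):
        """Restore only the cells that were first flagged by this command."""
        del self.commands[number]
        for cell, owner in enumerate(self.owner):
            if owner == number:
                self.owner[cell] = None

    def redo(self):
        """Re-apply all remaining commands, in their original order."""
        for number in sorted(self.commands):
            self._apply(number)

    def flagged(self):
        return [c for c, owner in enumerate(self.owner) if owner is not None]

grid = FlagGrid(6)
first = grid.flag([1, 2, 3])
second = grid.flag([3, 4])      # overlaps the first command at cell 3

grid.undo(first)
after_undo = grid.flagged()     # cell 3 is improperly un-flagged
grid.redo()
after_redo = grid.flagged()     # cell 3 is flagged again by the second command

print(after_undo)   # [4]
print(after_redo)   # [3, 4]
```

Cell 3 was covered by both commands but "belonged" to the first, so undoing the first command exposes it until the remaining commands are re-applied.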



Figure 4.3: A display of a sample TV screen from TVFLG, made using the AIPS task TVCPS to produce a negative black-and-white display. The TVFLG menu (in the boxes) and status lines (at the bottom) are displayed in a graphics plane which is normally colored light green. The data are grey scales in a TV memory and may be enhanced in black-and-white or pseudo-colored. The particular display chosen is the amplitude of the vector difference between the sample and a running vector