

1 TFCrop algorithm in CASA

The TFCrop algorithm identifies and flags outliers on the two-dimensional time-frequency plane.

Baselines, field IDs, spectral windows and array IDs are treated separately. Scans are combined to accumulate 'ntime' integration steps, and the program iterates through the selected data in time chunks of 'ntime' integrations. The channel range is specified via the ms-selection parameter 'spw'.
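
Here is a minimal sketch of this grouping and chunking. It assumes each visibility row is a Python dict with hypothetical 'baseline', 'field', 'spw', 'array', 'scan' and 'time' keys. The function name iter_chunks is also hypothetical, and keeping a shorter final chunk is an assumption rather than documented behaviour. This is an illustration, not the CASA implementation.

```python
from collections import defaultdict


def iter_chunks(rows, ntime):
    """Group visibility rows and yield time chunks of at most `ntime` steps.

    Hypothetical row layout: dicts with 'baseline', 'field', 'spw',
    'array', 'scan' and 'time' keys (one row per baseline per timestep).
    """
    groups = defaultdict(list)
    for row in rows:
        # Baselines, fields, spectral windows and arrays are kept separate;
        # 'scan' is deliberately left out of the key, so scans are combined.
        key = (row["baseline"], row["field"], row["spw"], row["array"])
        groups[key].append(row["time"])
    for key in sorted(groups):
        times = sorted(groups[key])
        # Assumption: a shorter final chunk is kept rather than dropped.
        for start in range(0, len(times), ntime):
            yield key, times[start:start + ntime]


# Example: one baseline, 7 timesteps split across two scans, ntime=3.
rows = [{"baseline": (0, 1), "field": 0, "spw": 0, "array": 0,
         "scan": 1 if t < 4 else 2, "time": float(t)} for t in range(7)]
for key, times in iter_chunks(rows, ntime=3):
    print(key, times)
```

The second chunk (times 3.0 to 5.0) spans both scans. That is what "scans are combined to accumulate 'ntime' integration steps" means in this sketch.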

1.1 Algorithm Steps

The steps of the algorithm are described below. For each step, the method is given first. It is followed by the type of RFI the step picks out and the parameters that affect its behaviour.

Step 1

Method: For each channel, perform a robust line fit along time, and flag outliers against it. (Robust fit: fit a straight line, calculate stddev, flag points further than n-sigma from the fit, fit a line to the remaining data, and repeat until stddev converges.)

RFI found: Short-duration RFI spikes (narrow-band and broad-band). This step will not pick out time-persistent RFI.

Parameters:
'ntime' : should be chosen so that short-duration spikes occupy less than 20% of the chosen timerange. For example, with 1-second integrations, ntime=50 will pick out spikes lasting a few seconds.
'timecutoff' : the multiple of the standard deviation of the fit above which points are flagged (the n in n-sigma).
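
The robust line fit of step 1 can be sketched in plain Python for a single channel's time series. This is an illustration under assumptions, not the CASA code. The names fit_line and robust_line_flags, the convergence tolerance and the iteration cap are hypothetical. In this sketch, flags are recomputed from each new fit rather than accumulated.

```python
import statistics


def fit_line(xs, ys):
    """Ordinary least-squares straight line; returns (intercept, slope)."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    if sxx == 0:
        return my, 0.0
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    return my - slope * mx, slope


def robust_line_flags(values, cutoff=4.0, tol=1e-3, max_iter=20):
    """Flag outliers of one channel's time series against a robust line fit.

    `cutoff` plays the role of 'timecutoff' (the n in n-sigma).
    """
    xs = list(range(len(values)))
    flags = [False] * len(values)
    prev_sigma = None
    for _ in range(max_iter):
        keep = [i for i in xs if not flags[i]]
        if len(keep) < 3:
            break  # too few unflagged points left to fit
        a, b = fit_line([xs[i] for i in keep], [values[i] for i in keep])
        resid = [values[i] - (a + b * xs[i]) for i in xs]
        # Standard deviation of the residuals of the unflagged points only.
        sigma = statistics.pstdev([resid[i] for i in keep])
        # Re-derive flags for all points from the current fit.
        flags = [abs(r) > cutoff * sigma for r in resid]
        if prev_sigma is not None and abs(prev_sigma - sigma) <= tol * prev_sigma:
            break  # stddev has converged
        prev_sigma = sigma
    return flags


# Example: slowly rising amplitude with +/-0.05 ripple and a spike at t=20.
series = [1.0 + 0.01 * t + 0.05 * (-1) ** t for t in range(50)]
series[20] += 5.0
flags = robust_line_flags(series)
print([t for t, f in enumerate(flags) if f])  # -> [20]
```

The first pass includes the spike, so its stddev is inflated, but the spike still exceeds the cutoff. Once it is excluded, the stddev drops to the level of the ripple and then stays there, which ends the iteration.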

Step 2

Method: Calculate the time-average of the remaining data (the points not flagged in step 1) to obtain an average bandpass. Construct an estimate of the clean bandpass (without RFI) by performing a robust piece-wise polynomial fit to the time-averaged bandpass. This robust fit begins with a straight-line fit and gradually increases the number of pieces up to 'maxnpieces', with a third-order polynomial in each piece.

RFI found: Time-persistent RFI will be visible as spikes in the average spectrum. The resulting clean bandpass is a fit across the base of these RFI spikes. (Warning: low-level broad-band RFI may get included in the bandpass fit.)

Parameters:
'maxnpieces' : controls the maximum number of pieces in the piece-wise polynomial fit. If there is low-level broad-band RFI, using too many pieces could cause the RFI to be absorbed into the 'clean' bandpass fit.
'spw' : the channel selection should contain at least 5 x maxnpieces channels (at least 5 data points are required for a good third-order polynomial fit per piece).
'freqlinefit' : can be used to force a straight-line fit across frequency instead of a piece-wise polynomial. This allows autoflagging on calibrated or residual visibilities.
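
A simplified sketch of the time-averaging and piece-wise polynomial fit in step 2 follows. It fits a fixed number of pieces with a third-order polynomial each and enforces the 5-channels-per-piece rule. It omits the robust clipping and the gradual increase from a straight line up to 'maxnpieces' pieces. The function names are hypothetical. The least-squares solver (normal equations with Gaussian elimination) is a choice made for this sketch, not necessarily what CASA uses.

```python
def time_average(data, flags):
    """Average data[t][ch] over unflagged timesteps, channel by channel.

    Assumes every channel has at least one unflagged timestep.
    """
    nchan = len(data[0])
    avg = []
    for ch in range(nchan):
        vals = [row[ch] for row, frow in zip(data, flags) if not frow[ch]]
        avg.append(sum(vals) / len(vals))
    return avg


def polyfit(xs, ys, deg):
    """Least-squares coefficients c with y ~ c[0] + c[1]*x + ... + c[deg]*x**deg."""
    m = deg + 1
    # Normal equations: A c = r.
    A = [[sum(x ** (j + k) for x in xs) for k in range(m)] for j in range(m)]
    r = [sum(y * x ** j for x, y in zip(xs, ys)) for j in range(m)]
    # Gaussian elimination with partial pivoting.
    for col in range(m):
        piv = max(range(col, m), key=lambda i: abs(A[i][col]))
        A[col], A[piv] = A[piv], A[col]
        r[col], r[piv] = r[piv], r[col]
        for i in range(col + 1, m):
            f = A[i][col] / A[col][col]
            for k in range(col, m):
                A[i][k] -= f * A[col][k]
            r[i] -= f * r[col]
    # Back substitution.
    c = [0.0] * m
    for i in reversed(range(m)):
        c[i] = (r[i] - sum(A[i][k] * c[k] for k in range(i + 1, m))) / A[i][i]
    return c


def piecewise_cubic(avg, npieces):
    """Fit a cubic to each of `npieces` contiguous channel ranges."""
    nchan = len(avg)
    if nchan < 5 * npieces:
        raise ValueError("need at least 5 channels per piece")
    bounds = [round(p * nchan / npieces) for p in range(npieces + 1)]
    model = []
    for lo, hi in zip(bounds[:-1], bounds[1:]):
        xs = list(range(hi - lo))  # local channel index within the piece
        c = polyfit(xs, avg[lo:hi], 3)
        model.extend(sum(ci * x ** i for i, ci in enumerate(c)) for x in xs)
    return model


# Example: a cubic-shaped bandpass, two timesteps, one flagged RFI point.
shape = [0.5 + 0.01 * ch - 2e-4 * ch ** 2 + 3e-6 * ch ** 3 for ch in range(40)]
data = [shape[:], shape[:]]
data[1][12] += 10.0                 # RFI on timestep 1, channel 12 ...
flags = [[False] * 40, [False] * 40]
flags[1][12] = True                 # ... already flagged in step 1
bandpass = piecewise_cubic(time_average(data, flags), npieces=4)
print(max(abs(b - s) for b, s in zip(bandpass, shape)))  # close to 0
```

Because the flagged point is excluded before averaging, the averaged spectrum equals the underlying cubic shape, and four cubic pieces reproduce it to rounding error.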

Step 3

Method: Use this clean bandpass estimate to find RFI on the 2D time-frequency plane. For each timestep, divide the data spectrum by the clean bandpass to normalize it to an ideal value of 1, and perform a robust flat-line fit (calculate stddev, flag points further than n-sigma, recalculate stddev, repeat until stddev converges).

RFI found: Time-persistent, narrow-band RFI will be picked out. More short-duration RFI will also be picked out, because of the better bandpass fit. Low-level time-persistent broad-band RFI (wider than about 20% of the bandpass) will not be picked out.

Parameters:
'freqcutoff' : the multiple of the standard deviation of the bandpass fit above which points are flagged (for all timesteps).
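
Step 3 for a single timestep might look like the sketch below. The name flat_line_flags and its default values are hypothetical. Taking the flat line's level as the mean of the unflagged normalized points is an assumption of this sketch (the ideal level is 1, as stated above).

```python
import statistics


def flat_line_flags(spectrum, bandpass, cutoff=3.0, tol=1e-3, max_iter=20):
    """Flag channels of one timestep against a robust flat-line fit.

    The spectrum is divided by the clean bandpass so that it is ideally 1.
    `cutoff` plays the role of 'freqcutoff'. The flat line's level is taken
    as the mean of the unflagged points (an assumption of this sketch).
    """
    norm = [s / b for s, b in zip(spectrum, bandpass)]
    flags = [False] * len(norm)
    prev_sigma = None
    for _ in range(max_iter):
        keep = [v for v, f in zip(norm, flags) if not f]
        if len(keep) < 2:
            break
        level = statistics.mean(keep)
        sigma = statistics.pstdev(keep)
        flags = [abs(v - level) > cutoff * sigma for v in norm]
        if prev_sigma is not None and abs(prev_sigma - sigma) <= tol * prev_sigma:
            break  # stddev has converged
        prev_sigma = sigma
    return flags


# Example: a sloped bandpass, 1% ripple, and narrow-band RFI in channel 7.
bandpass = [1.0 + 0.1 * ch for ch in range(32)]
spectrum = [b * (1 + 0.01 * (-1) ** ch) for ch, b in enumerate(bandpass)]
spectrum[7] *= 3.0
print([ch for ch, f in enumerate(flat_line_flags(spectrum, bandpass)) if f])  # -> [7]
```

In the full algorithm this is repeated for every timestep of the chunk, using the same clean bandpass.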

Step 4

Method: Grow flags by checking whether points around flagged points collectively cross the threshold used for the main flagged point. Also, if more than 50% of the timerange is flagged for any channel, flag all timesteps for that channel in the current chunk.

RFI found: Low-level wings of very strong RFI will be picked out. Ripples along time will be flagged (instead of just the peaks of the ripples).

Parameters:
'flaglevel' :
flaglevel = 0 : return only the flags found in the previous steps.
flaglevel = 1 : grow flags in time and frequency:
(a) flag surrounding points if they collectively cross the threshold;
(b) if more than 50% of a channel is flagged, flag the whole channel.
flaglevel = 2 : in addition, flag one timestep and one channel before and after each point flagged with flaglevel = 1.
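
The channel rule and the flaglevel = 2 extension can be sketched on a boolean time-by-channel grid for one chunk. This sketch is hypothetical, including the name grow_flags. It deliberately leaves out rule (a), because the page does not specify how neighbouring points are combined to "collectively cross the threshold". It also assumes that "before and after" means the four nearest neighbours in time and frequency, not diagonals.

```python
def grow_flags(flags, flaglevel=1):
    """Grow flags on one chunk; flags[t][ch] are booleans.

    Implements only the "more than 50% of a channel" rule for flaglevel >= 1
    and the one-step neighbour extension for flaglevel = 2; the
    "collectively cross the threshold" rule (a) is not modelled here.
    """
    ntime, nchan = len(flags), len(flags[0])
    out = [row[:] for row in flags]
    if flaglevel == 0:
        return out
    # (b) more than 50% of a channel flagged -> flag all its timesteps.
    for ch in range(nchan):
        if sum(out[t][ch] for t in range(ntime)) > 0.5 * ntime:
            for t in range(ntime):
                out[t][ch] = True
    if flaglevel >= 2:
        # Flag one timestep and one channel before and after each flagged point.
        base = [row[:] for row in out]
        for t in range(ntime):
            for ch in range(nchan):
                if not base[t][ch]:
                    continue
                for dt, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                    tt, cc = t + dt, ch + dc
                    if 0 <= tt < ntime and 0 <= cc < nchan:
                        out[tt][cc] = True
    return out


# Example: channel 1 flagged in 3 of 4 timesteps, plus one isolated point.
flags = [[False] * 4 for _ in range(4)]
for t in range(3):
    flags[t][1] = True
flags[3][3] = True
level1 = grow_flags(flags, flaglevel=1)
level2 = grow_flags(flags, flaglevel=2)
print(sum(map(sum, level1)), sum(map(sum, level2)))  # -> 5 14
```

The extension at flaglevel = 2 is computed from a copy of the level-1 result, so newly added flags do not themselves trigger further growth.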

1.2 Usage : Data selection and display

Data selection is done via ms-selection parameters ('field', 'spw', 'scan', 'baseline', 'timerange', 'feed', 'array', 'uvrange'). The data column and the correlations to operate on are specified by the 'column' and 'expr' parameters, respectively (see the inline documentation for syntax and options). Flags are applied to all correlations involved in the 'expr' evaluation (for example, flagging on 'ABS I' applies flags to RR and LL). The use of pre-existing flags is controlled via the 'usepreflags' parameter. Flag displays are controlled by the 'showplots' parameter, and 'writeflags' controls whether flags are written to the MS (see the inline documentation).

The intended usage is to run the flagger with showplots=True and writeflags=False on a small sub-selection of the data (for example, a few baselines per spw), and to adjust the parameters until the desired flagging results are obtained. Then turn off the display (showplots=False), set writeflags=True, and run it.
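
The two-pass workflow above can be written down as two parameter sets. This is only a sketch. The parameter names come from this page. The values, and the idea of collecting them in a Python dict before passing them to the flagger, are illustrative. They are not the documented defaults or the exact calling convention of any CASA version.

```python
# Trial run: small sub-selection, displays on, nothing written to the MS.
trial = {
    "spw": "0",            # illustrative ms-selection string
    "column": "DATA",
    "expr": "ABS I",       # flags are applied to RR and LL
    "ntime": 50,
    "timecutoff": 4.0,     # illustrative values, tuned during the trial run
    "freqcutoff": 3.0,
    "maxnpieces": 7,
    "flaglevel": 1,
    "usepreflags": True,
    "showplots": True,
    "writeflags": False,
}

# Production run: same tuned parameters, display off, flags written.
production = dict(trial, showplots=False, writeflags=True)
```

Deriving the production set from the trial set keeps the tuned thresholds identical between the two runs. Only the display and write switches change, along with any widening of the data selection.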

Please watch out for the following :

  1. For large datasets, sub-select only the parts of the data you want to visualize. Once the displays begin, type <c><enter> to continue to the next chunk, <s><enter> to stop the displays but continue flagging the rest of the selected data, or <q><enter> to stop and exit the flagger.
  2. Currently, the program cannot extend flags across correlations (unless 'expr' involves multiple correlations) or across baselines.
  3. If 'expr' is a list of N expressions, <c>, <s> and <q> must be entered once per expression (N times).
  4. The bandpass fit (blue line) can sometimes fail. If the fit fails completely for a reasonable bandpass shape, please take a screenshot of the bandpass plots and send it to me. Thanks.

1.3 To Do List

  1. Extending/growing flags: more control over flag levels. Provide growflag options instead of levels:
    1. flag surrounding points if they collectively cross the threshold
    2. if more than 50% of a channel is flagged, flag the whole channel (all timesteps)
    3. if more than 50% of a timerange is flagged, flag the whole timerange (all chans)
    4. flag one timestep before and after every flagged point
    5. flag one chan before and after every flagged point.
    6. extend flags to all correlations (this is in addition to what 'expr' already implies).
    7. extend flags from baseline 'a-b' to all baselines that include antenna 'a' or antenna 'b'.
    A parameter such as 'growflags = [1,3]' could then list whichever of these options the user wants.
  2. Option of polyfit in time (i.e. allow line or polynomial fits for both time and freq).
  3. Option to flag only in freq and not time (and vice-versa) - to support single-timestep or single-channel selections.
  4. Parallelize (at least over baselines).
  5. Parse the summary information and display in readable form (gray-scale displays + plots).

1.4 Change Log

Casapy revision 14054 : Rewrote the tfcrop flagger agent, and added ds9 display.
Casapy revision 14077 : Enabled row-flag support
Casapy revision 14153 (14 Feb 2011) : Fixed Boolean math error while reading flags from MS.
Casapy revision 14184 (18 Feb 2011) : Made pre-existing flags visible on the 'left' gray-scale ds9 plot, and added a parameter to use or ignore preflags.

R. V. Urvashi 2011-07-01