The TFCrop algorithm identifies and flags outliers on the two-dimensional time-frequency plane.
Baselines, Field IDs, Spectral Windows and Array IDs are treated separately. Scans are combined to accumulate 'ntime' integration steps, and the program iterates through the selected data in time chunks of 'ntime' integrations. The channel range is specified via the ms-selection parameter 'spw'.
The following table describes each step of the algorithm, along with information about what type of RFI is picked out at each step, and what parameters affect its behaviour.
Step | Method | RFI Found | Parameters |
1 | For each channel, perform a robust line-fit along time, and
flag outliers against it.
(Robust fit : fit a straight line, calculate stddev, flag points further than n-sigma from the fit, fit a line to the remaining data and repeat until stddev converges). |
Short-duration RFI spikes (narrow-band and broad-band).
(This step will not pick out time-persistent RFI.) |
'ntime' : should be chosen such that short-duration spikes are
less than 20% of the chosen timerange. For example, with 1-second
integrations, ntime=50 will pick out spikes lasting a few seconds.
'timecutoff' : controls the multiple of the standard-deviation of the fit above which points will be flagged. |
2 | Calculate the time-average of the remaining data to obtain an average bandpass.
Construct an estimate of the clean bandpass (without RFI) by performing a robust piece-wise polynomial fit to the time-averaged bandpass. This robust fit begins with a straight line fit, and gradually increases to 'maxnpieces' number of pieces with third-order polynomials in each piece. |
Time-persistent RFI will be visible as spikes in the average spectrum.
The resulting clean bandpass is a fit across the base of these RFI spikes. (Warning : Low-level broad-band RFI may get included in the bandpass fit) |
'maxnpieces' : controls the maximum number of pieces in this
piece-wise polynomial fit. If there is low-level broad-band RFI, using
too many pieces could result in the RFI being fitted in the 'clean'
bandpass.
'spw' : Channel selection should result in at least 5 x maxnpieces channels (at least 5 data points are required for a good third-order polynomial fit per piece).
'freqlinefit' : can be used to force a straight-line fit across frequency, instead of a piece-wise polynomial. This is to allow autoflagging on calibrated or residual visibilities. |
3 | Use this clean bandpass estimate to find RFI on the 2D time-frequency plane.
For each timestep, divide the data spectrum by the clean bandpass to normalize it to an ideal value of 1, and perform a robust flat-line fit (calculate stddev, flag points further than n-sigma, recalculate stddev, repeat until stddev converges). |
Time-persistent, narrow-band RFI will be picked out.
More short-duration RFI will also be picked out, because of the better bandpass-fit. Low-level time-persistent broad-band RFI (wider than about 20% of the bandpass) will not be picked out. |
'freqcutoff' : controls the multiple of the standard-deviation of the bandpass fit above which points will be flagged (for all timesteps). |
4 | Grow flags by checking if points around flagged points collectively cross
the threshold used for the main flagged point.
Also, if more than 50% of the timerange is flagged for any channel, flag all timesteps for that channel in the current chunk. |
Low-level wings of very strong RFI will be picked out.
Ripples along time will be flagged (instead of just the peaks of the ripples) |
'flaglevel' :
flaglevel = 0 : return only flags found in the previous steps.
flaglevel = 1 : grow flags in time and frequency
flaglevel = 2 : flag one timestep and channel before and after each point flagged with flaglevel 1 |
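The iterative fits in steps 1 to 3 of the table can be sketched in a few lines of NumPy. This is only an illustration under stated assumptions, not the flagger's actual code: the names robust_fit and tfcrop_sketch, the convergence tolerance, and the two-sided outlier test are hypothetical choices. Step 2 uses the straight-line bandpass variant (as with 'freqlinefit') in place of the piece-wise polynomial fit, and flag growing is reduced to the 50% rule of step 4. Amplitudes are assumed to be positive so that the division by the bandpass is safe.

```python
import numpy as np


def robust_fit(values, cutoff, deg=1, flags=None, max_iter=20, tol=1e-3):
    """Fit a polynomial of degree `deg`, flag points further than
    cutoff * stddev from it, refit on the rest, repeat until stddev settles.

    Returns (fit, flags); flags is True where a sample is rejected.
    """
    values = np.asarray(values, dtype=float)
    x = np.arange(values.size, dtype=float)
    flags = np.zeros(values.size, dtype=bool) if flags is None else flags.copy()
    fit = np.full(values.size, np.nan)
    prev_std = np.inf
    for _ in range(max_iter):
        good = ~flags
        if good.sum() < deg + 2:
            break  # too few unflagged points left for a meaningful fit
        fit = np.polyval(np.polyfit(x[good], values[good], deg), x)
        resid = values - fit
        std = resid[good].std()
        # Assumption: deviations on either side of the fit count as outliers.
        flags |= np.abs(resid) > cutoff * std
        if abs(prev_std - std) <= tol * std:
            break  # stddev has converged
        prev_std = std
    return fit, flags


def tfcrop_sketch(amp, timecutoff, freqcutoff):
    """amp: 2D array (time, channel) of amplitudes for one baseline chunk."""
    amp = np.asarray(amp, dtype=float)
    ntime, nchan = amp.shape
    flags = np.zeros(amp.shape, dtype=bool)

    # Step 1: robust straight-line fit along time, one channel at a time.
    for chan in range(nchan):
        _, flags[:, chan] = robust_fit(amp[:, chan], timecutoff, deg=1)

    # Step 2 (simplified): time-average the unflagged data, then fit a robust
    # straight line across frequency as the clean bandpass estimate.
    # Assumption: 'freqcutoff' is also used as the threshold for this fit.
    good = ~flags
    count = good.sum(axis=0)
    average = (amp * good).sum(axis=0) / np.maximum(count, 1)
    bandpass, _ = robust_fit(average, freqcutoff, deg=1, flags=(count == 0))

    # Step 3: normalize each timestep by the bandpass and do a robust
    # flat-line (degree 0) fit across frequency.
    for t in range(ntime):
        _, flags[t] = robust_fit(amp[t] / bandpass, freqcutoff, deg=0,
                                 flags=flags[t])

    # Step 4 (partial): if more than 50% of the chunk is flagged for a
    # channel, flag every timestep of that channel.
    flags[:, flags.mean(axis=0) > 0.5] = True
    return flags
```

Calling tfcrop_sketch(amp, timecutoff=4.0, freqcutoff=3.0) on a (time, channel) amplitude array returns a boolean flag array of the same shape. The cutoff values in this call are illustrative, not recommended settings.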
Data selection is done via ms-selection parameters ('field', 'spw', 'scan', 'baseline', 'timerange', 'feed', 'array', 'uvrange'). The data column and the correlation expression to operate on are specified by the 'column' and 'expr' parameters respectively (see inline documentation for syntax and options). Flags are applied to all correlations involved in the 'expr' evaluation (for example, flagging on 'ABS I' will apply flags to RR and LL). The use of pre-flags is controlled via the 'usepreflags' parameter. Flag displays are controlled by the 'showplots' parameter, and 'writeflags' controls whether flags are written to the MS or not (see inline documentation).
The intended usage is to run the flagger with showplots=True and writeflags=False on a small sub-selection of the data (for example, a few baselines per spw), and adjust the parameters until the desired flagging results are obtained. Then turn off the display, set writeflags=True, and run it again.
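The two rules of thumb from the table (spikes shorter than 20% of the chunk, and at least 5 channels per polynomial piece) can be checked before a run, and the two passes of the workflow can be kept as parameter sets that differ only in 'showplots' and 'writeflags'. The sketch below is a hypothetical helper, not part of the flagger: check_chunk_settings, spike_steps and nchan are names invented for illustration, and the numeric values are examples rather than documented defaults.

```python
def check_chunk_settings(ntime, spike_steps, nchan, maxnpieces):
    """Return a list of problems with respect to the two rules of thumb.

    ntime       : integrations per chunk
    spike_steps : longest short-duration spike to catch, in integrations
    nchan       : number of channels selected via 'spw'
    maxnpieces  : maximum number of pieces in the bandpass fit
    """
    problems = []
    # Spikes should cover less than 20% of the chunk: spike_steps / ntime < 1/5.
    if 5 * spike_steps >= ntime:
        problems.append("ntime is too small for spikes of this duration")
    # At least 5 channels are needed for each third-order polynomial piece.
    if nchan < 5 * maxnpieces:
        problems.append("spw selection has fewer than 5 x maxnpieces channels")
    return problems


# Illustrative values only; these are not documented defaults.
tuning_pass = {
    "ntime": 50,
    "timecutoff": 4.0,
    "freqcutoff": 3.0,
    "maxnpieces": 7,
    "flaglevel": 1,
    "showplots": True,    # inspect the flags while tuning
    "writeflags": False,  # leave the MS untouched while tuning
}

# Final pass: same algorithm parameters, display off, flags written.
final_pass = {**tuning_pass, "showplots": False, "writeflags": True}
```

With 1-second integrations and ntime=50, check_chunk_settings(50, 3, 64, 7) reports no problems, while a 10-integration spike or a 30-channel selection would each be reported. The data selection for the final pass would normally be widened from the few test baselines to the full data set.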
Please watch out for the following :