Urvashi R.V.
Date: 26 Jun 2011 (Updated : 10 Aug 2011)
The 'testautoflag' task in casapy-test (> 15503) contains the TFCrop autoflag algorithm, flag extensions, and an interactive data/flag display.
Within CASA, please type 'help testautoflag' for inline documentation.
The 'testautoflag' task in CASA runs an RFI-detection algorithm with options to extend and grow flags. Data and flags can be visualized at runtime and flag-summary plots can be generated at the end. The task can also be run in inspection mode to find RFI and gather statistics (but not write flags to the MS).
Selected data is iterated through in chunks of time. Baselines, Field Ids, Spectral Windows and Array Ids are treated separately. Scans are combined to accumulate 'ntime' seconds of integrations.
Data Selection Parameters
vis  ' '  Name of Measurement Set  
field  ' ' (all)  Select data based on field id(s) or name(s)  
spw  ' ' (all)  Select data based on spectral window and channels  
selectdata  False  Other data-selection parameters


datacolumn  'data'  Data column on which to operate.
options : 'data', 'corrected', 'model', 'residual', 'residual_data'
Flags are based on the absolute values of the visibilities from the specified column.

ntime  100  Time range (in seconds) over which to accumulate data before running
the autoflag algorithms.
The dataset will be iterated through in time chunks defined here.

corrs  [ ] (all)  List of ones/zeros to signal which correlations to operate upon.
default : [] (all correlations)
example : [1,0,0,1] to choose RR and LL in data with RR, RL, LR, LL
example : [1,0] to choose only RR from data containing RR and LL
NOTE : This syntax will change in a later version of this task, to support user-specified lists of the type 'RR, LL'.
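The 'corrs' list can be read as a mask over the correlations present in the data. The following Python sketch illustrates that reading only; the function name select_correlations and the list-of-names input are assumptions made for illustration and are not part of the task.

```python
def select_correlations(corr_names, corrs):
    """Return the correlation names picked out by a ones/zeros mask.

    corr_names : correlations present in the data, e.g. ['RR', 'RL', 'LR', 'LL']
    corrs      : list of ones/zeros; an empty list selects everything
    (Both argument names are illustrative, not the task's internals.)
    """
    if not corrs:
        return list(corr_names)
    if len(corrs) != len(corr_names):
        raise ValueError("corrs must have one entry per correlation in the data")
    return [name for name, keep in zip(corr_names, corrs) if keep]


selected = select_correlations(['RR', 'RL', 'LR', 'LL'], [1, 0, 0, 1])
```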
For each chunk of time, visibilities are organized as 2D time-frequency planes, one for each baseline and correlation type, and the following steps (outlier detection and flag extension) are performed on each 2D plane.
STEP 1 : Calculate a bandshape template : Average the data across time, to construct an average bandpass. Construct an estimate of a clean bandpass (without RFI) via a robust piecewise polynomial fit to the average bandpass shape.
Note : A robust fit is computed in up to 5 iterations. It begins with a straight-line fit across the full range, and gradually increases to 'maxnpieces' pieces with third-order polynomials in each piece. At each iteration, the stddev between the data and the fit is computed, values beyond N-stddev are flagged, and the fit and stddev are recalculated with the remaining points. This stddev calculation is adaptive, and converges to a value that reflects only the data and no RFI. At each iteration, the same relative threshold is applied to detect flags. Because the stddev shrinks as RFI is removed, the absolute flagging threshold varies from one iteration to the next, which allows deep flagging only when the fit represents the true data best. Iterations stop when the stddev changes by less than 10%, or when 5 iterations are completed.
The resulting clean bandpass is a fit across the base of RFI spikes.
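The iterative clip-and-refit loop described in the note can be sketched in Python as follows. This is a simplified illustration and not the task's code: it keeps a straight-line fit throughout (the starting point of the iterations, and what the 'line' option uses) instead of moving on to piecewise third-order polynomials, and the names fit_line and robust_line_fit are assumptions.

```python
import math


def fit_line(xs, ys):
    """Ordinary least-squares straight line; returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx if sxx else 0.0
    return slope, mean_y - slope * mean_x


def robust_line_fit(avg_band, cutoff=3.0, max_iter=5, tol=0.10):
    """Fit, clip points beyond cutoff*stddev, refit; stop after max_iter
    iterations or once the stddev changes by less than tol (10%)."""
    keep = [True] * len(avg_band)
    slope, intercept, prev_std = 0.0, 0.0, None
    for _ in range(max_iter):
        xs = [i for i, k in enumerate(keep) if k]
        if len(xs) < 2:
            break
        ys = [avg_band[i] for i in xs]
        slope, intercept = fit_line(xs, ys)
        resid = [y - (slope * x + intercept) for x, y in zip(xs, ys)]
        std = math.sqrt(sum(r * r for r in resid) / len(resid))
        # Same relative threshold every iteration; std shrinks as RFI is removed.
        for x, r in zip(xs, resid):
            if abs(r) > cutoff * std:
                keep[x] = False
        if prev_std is not None and abs(prev_std - std) < tol * prev_std:
            break
        prev_std = std
    return slope, intercept, keep


# Flat average bandpass with one strong RFI spike in channel 7.
band = [1.0] * 20
band[7] = 50.0
slope, intercept, keep = robust_line_fit(band)
```

In this toy case the first pass excludes the spike and the second pass fits the flat band underneath it, which is the "fit across the base of RFI spikes" behaviour described above.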
STEP 2: Use this clean bandpass to find RFI on the 2D time-frequency plane.
Flagging is also done in up to 5 iterations. In each iteration, for every timestep, calculate the stddev of the data spectrum with respect to the clean fitted bandshape, flag all points further than N times stddev from the fit, and recalculate the stddev. At each iteration, the same relative threshold is applied to detect flags. Optionally, use sliding-window based statistics to calculate additional flags.
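A minimal Python sketch of this per-timestep flagging is given below. It assumes the plane is a list of rows (one spectrum per timestep) and that the clean bandshape is compared to each spectrum directly, without any per-timestep scaling; the function name flag_against_template and these conventions are illustrative assumptions.

```python
import math


def flag_against_template(plane, template, cutoff=3.0, max_iter=5):
    """Flag points deviating from the clean bandshape by more than cutoff*stddev.

    plane    : list of rows, one amplitude spectrum per timestep
    template : clean fitted bandshape, one value per channel
    """
    flags = [[False] * len(row) for row in plane]
    for t, row in enumerate(plane):
        for _ in range(max_iter):
            # Residuals of the still-unflagged channels in this timestep.
            resid = [(c, row[c] - template[c])
                     for c in range(len(row)) if not flags[t][c]]
            if not resid:
                break
            std = math.sqrt(sum(r * r for _, r in resid) / len(resid))
            new = [c for c, r in resid if abs(r) > cutoff * std]
            if not new:
                break  # nothing more to flag; stddev has settled
            for c in new:
                flags[t][c] = True
    return flags


template = [1.0] * 16
quiet = [1.0 + 0.1 * (-1) ** c for c in range(16)]
spiky = list(quiet)
spiky[5] = 30.0  # one RFI spike in the second timestep
flags = flag_against_template([quiet, spiky], template)
```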
STEP 3: Repeat STEPS 1 & 2, but in the other direction (i.e. average the data across frequency, calculate a piecewise polynomial fit to the average time series, and find flags based on deviations with respect to this fit.)
TFCrop Parameters (Used if 'tfcrop'=True) :
timecutoff  4.0  Flag threshold in time (flag all datapoints further than N-stddev from the fit).
freqcutoff  3.0  Flag threshold in frequency (flag all datapoints further than N-stddev from the fit).
timefit  'line'  Fitting function for the time direction
options = 'line', 'poly'
A 'line' fit is a robust straight-line fit across the entire time range (defined by 'ntime'). A 'poly' fit is a robust piecewise polynomial fit across the time range. Choose 'poly' only if the visibilities are expected to vary significantly over the time range selected by 'ntime', or if there is a lot of strong but intermittent RFI.
freqfit  'poly'  Fitting function for the frequency direction
options = 'line', 'poly' (similar to 'timefit')
Choose 'line' only if you are operating on bandpass-corrected data, or residuals, and expect that the bandshape is linear. The 'poly' option works better when there are multiple lines of strong narrow-band RFI.
maxnpieces  7  Maximum number of pieces to allow in the piecewise-polynomial fits
options = 1 - 9
This parameter is used only if 'timefit' or 'freqfit' is set to 'poly'. If there is significant broadband RFI, reduce this number (say, to 5). Using too many pieces could result in the RFI being fitted in the 'clean' bandpass. In later stages of the fit, a third-order polynomial is fit per piece, so for best results, please ensure that nchan/maxnpieces is at least 5.
flagdimension  'freqtime'  Choose the directions along which to perform flagging
default = 'freqtime' : First flag along frequency, and then along time
options = 'time', 'freq', 'timefreq', 'freqtime'
For most cases, 'freqtime' or 'timefreq' are appropriate, and differences between these choices are apparent only if RFI in one dimension is significantly stronger than the other. The goal is to flag the dominant RFI first. If there are very few (less than 5) channels of data, then choose 'time'. Similarly, if there are very few timesteps, choose 'freq'.
usewindowstats  'none'  Use sliding-window statistics to find additional flags (This is Experimental !!)
options = 'none', 'sum', 'std', 'both'
The 'sum' option flags a point if the mean value in a window centered on that point deviates from the fit by more than N-stddev/1.5. This option is an attempt to catch broadband or time-persistent RFI that the above polynomial fits will mistakenly fit as the clean band. It is an approximation to the SumThreshold method found to be effective by Offringa et al. (2010) for LOFAR data.
The 'std' option flags a point if the 'local' stddev calculated in a window centered on that point is larger than N-stddev/1.5. This option is an attempt to catch noisy RFI that is not excluded in the polynomial fits, and which increases the global stddev and therefore results in fewer flags (based on the N-stddev threshold). This is an approximation to the idea behind 'rflag' in AIPS (which E. Greisen is currently refining).
halfwin  1  Half width of sliding window to use with 'usewindowstats'. (This is Experimental !!)
options = 1, 2, 3 for 3-point, 5-point or 7-point window sizes
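The 'sum' rule above can be sketched in Python as follows. The sketch operates on residuals (data minus fit) along one direction; truncating the window at the edges of the range is an assumption, as is the function name window_mean_flags.

```python
def window_mean_flags(residuals, stddev, cutoff=3.0, halfwin=1):
    """Flag a point if the mean residual in the window centered on it
    exceeds cutoff*stddev/1.5 in absolute value.

    Windows are truncated at the edges of the range (an assumption).
    """
    n = len(residuals)
    threshold = cutoff * stddev / 1.5
    flags = []
    for i in range(n):
        window = residuals[max(0, i - halfwin):min(n, i + halfwin + 1)]
        mean = sum(window) / len(window)
        flags.append(abs(mean) > threshold)
    return flags


# Three adjacent points at 2.5 stddev: none exceeds the 3-stddev point
# threshold on its own, but the window mean at the centre exceeds 2.0.
resid = [0.0, 0.0, 2.5, 2.5, 2.5, 0.0, 0.0]
extra = window_mean_flags(resid, stddev=1.0, cutoff=3.0, halfwin=1)
```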
Extend/grow flags that have been detected until now (old and new).
Flag extension Parameters (Used if 'extendflags'=True) :
extendpols  False  Extend flags to all correlations
This option can be used in conjunction with 'corrs', for example to calculate flags using only parallel-hand data but apply them to all correlations.
growtime  50.0  For any channel, flag the entire time range in the current 2D chunk (set by 'ntime')
if more than X% of the time range is already flagged.
options = 0.0 - 100.0
This option catches the low-intensity parts of time-persistent RFI.
growfreq  50.0  For any timestep, flag all channels in the current 2D chunk (set by data selection)
if more than X% of the channels are already flagged.
options = 0.0 - 100.0
This option catches broadband RFI that is partially identified by earlier steps.
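The two percentage rules can be sketched in Python as below. The sketch assumes a flag plane stored as a list of rows (timesteps) of booleans, and it bases both decisions on the input flags rather than on flags added by the other rule; that ordering, and the function name grow_flags, are assumptions.

```python
def grow_flags(flags, growtime=50.0, growfreq=50.0):
    """Extend flags along time and frequency using percentage thresholds.

    flags : list of rows (timesteps), each a list of booleans (channels)
    """
    ntime, nchan = len(flags), len(flags[0])
    out = [row[:] for row in flags]
    # growtime: a channel flagged in more than X% of timesteps is flagged fully.
    for c in range(nchan):
        percent = 100.0 * sum(flags[t][c] for t in range(ntime)) / ntime
        if percent > growtime:
            for t in range(ntime):
                out[t][c] = True
    # growfreq: a timestep flagged in more than X% of channels is flagged fully.
    for t in range(ntime):
        percent = 100.0 * sum(flags[t]) / nchan
        if percent > growfreq:
            out[t] = [True] * nchan
    return out


T, F = True, False
plane = [
    [F, T, F, F],
    [F, T, F, F],
    [F, T, F, F],
    [T, F, T, T],
]
grown = grow_flags(plane)
```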
growaround  True  Extend flags to immediately surrounding points in the time-freq plane.
For every unflagged point on the 2D time/freq plane, if more than four surrounding points are already flagged, flag that point. This option catches some wings of strong RFI spikes.
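A Python sketch of the 'growaround' rule follows. It assumes that "surrounding points" means the eight nearest neighbours on the 2D plane and that points outside the plane count as unflagged; both are assumptions, as is the function name grow_around.

```python
def grow_around(flags):
    """Flag every unflagged point that has more than four flagged neighbours.

    Neighbours are the (up to) eight adjacent points; decisions use the
    input flags only, so newly grown flags do not cascade.
    """
    ntime, nchan = len(flags), len(flags[0])
    out = [row[:] for row in flags]
    for t in range(ntime):
        for c in range(nchan):
            if flags[t][c]:
                continue
            count = 0
            for dt in (-1, 0, 1):
                for dc in (-1, 0, 1):
                    if (dt or dc) and 0 <= t + dt < ntime and 0 <= c + dc < nchan:
                        count += flags[t + dt][c + dc]
            if count > 4:
                out[t][c] = True
    return out


T, F = True, False
plane = [
    [T, T, T],
    [T, F, F],
    [T, F, F],
]
grown = grow_around(plane)  # centre point has five flagged neighbours
```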
flagneartime  False  Flag points before and after every flagged one, in the time direction.
Note : This can result in excessive flagging.
flagnearfreq  False  Flag points before and after every flagged one, in the frequency direction.
This option allows flagging of wings in the spectral response of strong RFI. Note : This can result in excessive flagging.
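The two neighbour-flagging options can be sketched in Python as below. The sketch assumes that exactly one point on each side of a flagged point is flagged; the text does not state the extent, so this, along with the function name flag_near, is an assumption.

```python
def flag_near(flags, neartime=False, nearfreq=False):
    """Flag the points immediately before and after each flagged point.

    flags : list of rows (timesteps), each a list of booleans (channels)
    """
    ntime, nchan = len(flags), len(flags[0])
    out = [row[:] for row in flags]
    for t in range(ntime):
        for c in range(nchan):
            if not flags[t][c]:
                continue
            if neartime:
                for tt in (t - 1, t + 1):
                    if 0 <= tt < ntime:
                        out[tt][c] = True
            if nearfreq:
                for cc in (c - 1, c + 1):
                    if 0 <= cc < nchan:
                        out[t][cc] = True
    return out


T, F = True, False
plane = [
    [F, F, F],
    [F, T, F],
    [F, F, F],
]
in_freq = flag_near(plane, nearfreq=True)
in_time = flag_near(plane, neartime=True)
```

Every isolated flag triples in size along the chosen direction, which is why these options can result in excessive flagging.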
Visualization of the data and flags at runtime is possible by setting 'datadisplay'=True. The intended usage is to run testautoflag with datadisplay=True and writeflags=False on a small subselection of the data, and change parameters until the desired flagging results are obtained. Then, turn off the display, set writeflags=True, and run it again.
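The two-pass usage described above could be set up along the following lines. This is a sketch only: the Measurement Set name and the field/spw sub-selection are hypothetical, the parameter names are those documented here, and the task call itself is left commented out because the task exists only inside casapy.

```python
# Pass 1: inspect a small sub-selection with the display on, writing no flags.
inspect_params = dict(
    vis='mydata.ms',   # hypothetical Measurement Set name
    field='0',         # hypothetical sub-selection
    spw='0',           # hypothetical sub-selection
    ntime=100,
    timecutoff=4.0,
    freqcutoff=3.0,
    datadisplay=True,
    writeflags=False,
)

# Pass 2: reuse the tuned parameters, turn the display off, write flags.
final_params = dict(inspect_params, datadisplay=False, writeflags=True)

# Inside casapy the task would then be called along these lines
# (not executed here):
# testautoflag(**inspect_params)
# testautoflag(**final_params)
```

Keeping the tuned values in one dictionary means the second pass cannot drift from the parameters that were inspected.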
A flag summary is generated at the end of the run, listing the percentage of data flagged as a function of frequency channel, spw, field, etc. If the task is run with the 'tfcrop' and 'extendflags' options turned off, it will compute statistics for all existing flags in the MS.
datadisplay  False  Display data and flags at runtime, within an interactive GUI
This option opens a GUI to show the 2D time-freq planes of the data with old and new flags, for all correlations per baseline. The GUI allows stepping through all baselines (prev/next) in the current chunk (set by 'ntime'), and stepping to the next chunk. The testautoflag task can be quit from the GUI, in case it becomes obvious that the current set of parameters is just wrong. There is an option to stop the display but continue flagging.

plotsummary  False  Parse flag counts, and display a spectrum of percentage-of-flagged-data
Flag percentages are shown separately for different fields and spws, and counts are combined for all selected time ranges, baselines, and correlations. Note : If some baselines are completely flagged (or some correlations have exact zeros and are flagged), the floor of the spectrum will rise.
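The per-channel percentage, and the rising floor mentioned in the note, can be illustrated with a short Python sketch. It assumes one boolean flag plane (timesteps by channels) per baseline and correlation for a single field and spw; the function name percent_flagged_per_channel is an assumption.

```python
def percent_flagged_per_channel(flag_planes):
    """Combine flag counts over all planes and return percent flagged per channel.

    flag_planes : list of 2D planes, each a list of rows (timesteps) of booleans
    """
    nchan = len(flag_planes[0][0])
    flagged = [0] * nchan
    samples = 0  # number of spectra, i.e. samples per channel
    for plane in flag_planes:
        for row in plane:
            samples += 1
            for c, is_flagged in enumerate(row):
                flagged[c] += bool(is_flagged)
    return [100.0 * n / samples for n in flagged]


T, F = True, False
dead_baseline = [[T, T, T], [T, T, T]]   # completely flagged
good_baseline = [[F, T, F], [F, F, F]]   # one flagged point in channel 1
spectrum = percent_flagged_per_channel([dead_baseline, good_baseline])
```

With one of the two baselines completely flagged, no channel can fall below 50%, which is the raised floor described above.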

usepreflags  True  Choose whether or not to use/honour existing flags in the MS. If 'writeflags'=True, old flags in the MS will be overwritten (but a backup can be taken)  
preflagzeros  True  Choose whether or not to pre-flag visibilities exactly equal to 0.0
Note : If set to True, an elevated floor level in the 'plotsummary' RFI spectrum will indicate the presence of exact zeros in some baselines or correlations.

writeflags  False  Choose whether or not to write flags to the MS
The testautoflag task can be run in 'writeflags=False' mode just to inspect the data, or to try different parameters and converge on a set that works for a particular dataset.

A set of examples is discussed in these slides.
For each example, screenshots of the datadisplay GUI are shown with default parameters, followed by successive changes to these parameters to achieve the desired flagging results. A process of trial and error is currently required at least once per spw. Tests with 5 EVLA datasets indicate that parameters do not have to be varied across antennas or baselines, but each SPW needs to be tuned separately.
The original implementation of this algorithm is described in NCRA Technical Report 202 (October 2003).
TFCrop Version 1 documentation (Feb 2011) contains information, examples and a todo list from the first implementation in CASA.
The translation was initiated by R. V. Urvashi on 2011-08-10