Date: 26 Jun 2011 (Updated : 10 Aug 2011)
The 'testautoflag' task in casapy-test (> 15503) contains the TFCrop autoflag algorithm, flag extensions, and an interactive data/flag display.
Within CASA, please type 'help testautoflag' for inline documentation.
The 'testautoflag' task in CASA runs an RFI-detection algorithm with options to extend and grow flags. Data and flags can be visualized at run-time and flag-summary plots can be generated at the end. The task can also be run in inspection-mode to find RFI and gather statistics (but not write flags to the MS).
Selected data is iterated though in chunks of time. Baselines, Field Ids, Spectral Windows and Array Ids are treated separately. Scans are combined to accumulate 'ntime' seconds of integrations.
Data Selection Parameters
|vis||' '||Name of Measurement Set|
|field||' ' (all)||Select data based on field id(s) or name(s)|
|spw||' ' (all)||Select data based on spectral window and channels|
|selectdata||False||Other data-selection parameters
|datacolumn||'data'||Data column on which to operate.
options : 'data', 'corrected', 'model', 'residual', 'residual_data'
Flags are based on the absolute values of the visibilities from the specified column.
|ntime||100||Timerange (in seconds) over which to accumulate data before running
the autoflag algorithms.
The dataset will be iterated through in time-chunks defined here.
|corrs||[ ] (all)||List of ones/zeros to signal which correlations to operate upon.
default :  (all correlations)
example : [1,0,0,1] to choose RR and LL in data with RR, RL, LR, LL
example : [1,0] to choose only RR from data containing RR and LL
NOTE : This syntax will change in a later version of this task, to support user-specified lists of the type 'RR, LL'.
For each chunk of time, visibilities are organized as 2D time-frequency planes, one for each baseline and correlation type, and the following steps (outlier-detection, and flag extension) are performed on each 2D plane.
STEP 1 : Calculate a bandshape template : Average the data across time, to construct an average bandpass. Construct an estimate of a clean bandpass (without RFI) via a robust piece-wise polynomial fit to the average bandpass shape.
Note : A robust fit is computed in upto 5 iterations. It begins with a straight line fit across the full range, and gradually increases to 'maxnpieces' number of pieces with third-order polynomials in each piece. At each iteration, the stddev between the data and the fit is computed, values beyond N-stddev are flagged, and the fit and stddev are re-calculated with the remaining points. This stddev calculation is adaptive, and converges to a value that reflects only the data and no RFI. At each iteration, the same relative threshold is applied to detect flags, and this results in a varying set of flagging thresholds, that allows deep flagging only when the fit represents the true data best. Iterations stop when the stddev changes by less than 10%, or when 5 iterations are completed.
The resulting clean bandpass is a fit across the base of RFI spikes.
STEP 2: Use this clean bandpass to find RFI on the 2D time-frequency plane.
Flagging is also done in upto 5 iterations. In each iteration, for every timestep, calculate the stddev of the data spectrum w.r.to the clean fitted bandshape, flag all points further than N times stddev from the fit, and recalculate the stddev. At each iteration, the same relative threshold is applied to detect flags. Optionally, use sliding-window based statistics to calculate additional flags.
STEP 3: Repeat STEPS 1 & 2, but in the other direction (i.e. average the data across frequency, calculate a piece-wise polynomial fit to the average time-series, and find flags based on deviations w.r.to this fit.)
TFCrop Parameters (Used if 'tfcrop'=True) :
|timecutoff||4.0||Flag threshold in time (flag all data-points further than N-stddev from the fit).|
|freqcutoff||3.0||Flag threshold in frequency. Flag all data-points further than N-stddev from the fit.|
|timefit||'line'||Fitting function for the time direction
options = 'line', 'poly'
A 'line' fit is a robust straight-line fit across the entire timerange (defined by 'ntime').
A 'poly' fit is a robust piece-wise polynomial fit across the timerange. Choose 'poly' only if the visibilities are expected to vary significantly over the timerange selected by 'ntime', or if there is a lot of strong but intermittent RFI.
|freqfit||'poly'||Fitting function for the frequency direction
options = 'line','poly' (similar to 'timefit')
Choose 'line' only if you are operating on bandpass-corrected data, or residuals, and expect that the bandshape is linear. The 'poly' option works better when there are multiple lines of strong narrow-band RFI.
|maxnpieces||7||Maxinum number of pieces to allow in the piecewise-polynomial fits
options = 1 - 9
This parameter is used only if 'timefit' or 'freqfit' are chosen as 'poly'. If there is significant broad-band RFI, reduce this number (say 5). Using too many pieces could result in the RFI being fitted in the 'clean' bandpass.
In later stages of the fit, a third-order polynomial is fit per piece, so for best results, please ensure that nchan/maxnpieces is at-least 5.
|flagdimension||'freqtime'||Choose the directions along which to perform flagging
default = 'freqtime' : First flag along frequency, and then along time
options = 'time', 'freq', 'timefreq', 'freqtime'
For most cases, 'freqtime' or 'timefreq' are appropriate, and differences between these choices are apparant only if RFI in one dimension is significantly stronger than the other. The goal is to flag the dominant RFI first.
If there are very few (less than 5) channels of data, then choose 'time'. Similarly for 'freq'.
|usewindowstats||'none'||Use sliding-window statistics to find additional flags ( This is Experimental !! )
options = 'none', 'sum', 'std', 'both'
The 'sum' option chooses to flag a point, if the mean-value in a window centered on that point deviates from the fit by more than N-stddev/1.5. This option is an attempt to catch broad-band or time-persistent RFI that the above polynomial fits will mistakenly fit as the clean band. It is an approximation to the sumThreshold method found to be effective by Offringa et.al (2010) for LOFAR data.
The 'std' option chooses to flag a point, if the 'local' stddev calculated in a window centered on that point is larger than N-stddev/1.5. This option is an attempt to catch noisy RFI that is not excluded in the polynomial fits, and which increases the global stddev, and results in fewer flags (based on the N-stddev threshold). This is an approximation to the idea behind 'rflag' in AIPS (which E.Greisen is currently refining).
|halfwin||1||Half width of sliding window to use with 'usewindowstats'. (This is Experimental !!)
options = 1,2,3 for 3-point, 5-point or 7-point window sizes
Extend/grow flags that have been detected until now (old and new).
Flag extension Parameters (Used if 'extendflags'=True) :
|extendpols||False||Extend flags to all correlations
This option can be used in conjunction with 'corrs' to calculate flags using only parallel-hand data, but apply them to all correlations (for example)
|growtime||50.0||For any channel, flag the entire timerange in the current 2D chunk (set by 'ntime')
if more than X% of the timerange is already flagged.
options = 0.0 - 100.0
This option catches the low-intensity parts of time-persistent RFI.
|growfreq||50.0||For any timestep, flag all channels in the current 2D chunk (set by data-selection)
if more than X% of the channels are already flagged.
options = 0.0 - 100.0
This option catches broad-band RFI that is partially identified by earlier steps.
|growaround||True||Extend flags to immediately surrounding points in the time-freq plane.
For every un-flagged point on the 2D time/freq plane, if more than four surrounding points are already flagged, flag that point. This option catches some wings of strong RFI spikes.
|flagneartime||False||Flag points before and after every flagged one, in the time-direction.
Note : This can result in excessive flagging.
|flagnearfreq||False||Flag points before and after every flagged one, in the frequency-direction
This option allows flagging of wings in the spectral response of strong RFI. Note : This can result in excessive flagging.
Visualization of the data and flags at run-time is possible by setting 'datadisplay'=True. The intended usage is to run testautoflag with datadisplay=True and writeflags=False on a small sub-selection of the data, and change parameters until the desired flagging results are obtained. Then, turn off the display, set writeflags=True, and run it again.
A flag summary is generated at the end of the run, to list the percentage of data flagged as a function of frequency channel, spw, field, etc. If the task is run by turning off the tfcrop and extendflag options, it will compute statistics for all existing flags in the MS.
|datadisplay||False||Display data and flags at run-time, within an interactive GUI
This option opens a GUI to show the 2D time-freq planes of the data with old and new flags, for all correlations per baseline.
- The GUI allows stepping through all baselines (prev/next) in the current chunk (set by 'ntime'), and stepping to the next-chunk.
- The testautoflag task can be quit from the GUI, in case it becomes obvious that the current set of parameters is just wrong.
- There is an option to stop the display but continue flagging.
|plotsummary||False||Parse flag counts, and display a spectrum of percentage-of-flagged-data
Flag percentages are shown separately for different fields and spws, and counts are combined for all selected time-ranges, baselines, and correlations.
Note : If some baselines are completely flagged (or some correlations have exact zeros and are flagged), the floor of the spectrum will rise.
|usepreflags||True||Choose whether or not to use/honour existing flags in the MS. If 'writeflags'=True, old flags in the MS will be overwritten (but a backup can be taken)|
|preflagzeros||True||Choose whether or not to preflag visibilities exactly equal to 0.0
Note : If set to True, an elevated floor-level in the 'plotsummary' rfi-spectrum will indicate the presence of exact-zeros in some baselines or correlations.
|writeflags||False||Choose whether or not to write flags to the MS
The testautoflag task can be run in 'writeflags=False' mode just to inspect the data, or to try different parameters and converge on a set that works for a particular dataset.
A set of examples are discussed in these slides.
For each example, screen-shots of the data-display GUI are shown with default parameters, followed by successive changes to these parameters to achieve desired flagging results. A process of trial and error is currently required at-least once per spw. Based on tests with 5 EVLA datasets, it can be said that parameters do not have to be varied across antennas or baselines, but each SPW needs to be tuned separately.
The original implementation of this algorithm is described in NCRA Technical Report 202 (October 2003).
TFCrop Version 1 documentation (Feb 2011) contains information, examples and a to-do list from the first implementation in CASA.
This document was generated using the LaTeX2HTML translator Version 2002-2-1 (1.71)
Copyright © 1993, 1994, 1995, 1996,
Computer Based Learning Unit, University of Leeds.
Copyright © 1997, 1998, 1999, Ross Moore, Mathematics Department, Macquarie University, Sydney.
The command line arguments were:
The translation was initiated by R. V. Urvashi on 2011-08-10