Data Selection Parameters

S. Myers, E. Fomalont, C. Brogan

2007-05-30  (updated 2007-05-31 STM)

-----------------------------------------------------------------------------

"Standard" Selection Parameters

field
spw

selectdata

"Expanded Selection Parameters" (expanded by selectdata)

 chanrange (was "channel")
 timerange (was "selecttime")
 uvrange
 antenna (was "baseline", but Sanjay uses "antenna")
 correlation
 feed
 scan
 subarray (was "array")

=============================================================================

field       Field indices or names to select.  If the field is a
              non-negative integer, it is assumed to be an index.
           default: '' = all
           examples:
           field = '0'; field index 0
           field = '0,1,2,5'; field indices 0,1,2,5
           field = '0~2'; field indices 0 to 2 inclusive
           field = '0~12,!8'; field indices 0 through 12 except 8.
           field = '3C84'; field name 3C84
           field = 'VIRGO A, 3C*'; field names VIRGO A and all field
              names beginning with 3C
           field = '3C*,!3C273'; field names beginning with 3C
              except 3C273
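
           A minimal Python sketch of how the index forms above might be
           expanded.  The helper name expand_indices and its nfield
           argument are hypothetical (not part of any existing tool);
           field names and wildcards are deliberately not handled.

```python
def expand_indices(selection, nfield):
    """Expand an index selection such as '0~12,!8' into a sorted list.

    Hypothetical helper: handles only integer indices, '~' ranges and
    '!' negation.  nfield is the number of fields in the data set.
    """
    if selection.strip() == '':
        return list(range(nfield))            # '' selects everything
    include, exclude = set(), set()
    for term in selection.split(','):
        term = term.strip()
        target = include
        if term.startswith('!'):              # negated term
            target, term = exclude, term[1:]
        if '~' in term:
            lo, hi = (int(x) for x in term.split('~'))
            target.update(range(lo, hi + 1))  # ranges are inclusive
        else:
            target.add(int(term))
    if not include:                           # only negations were given
        include = set(range(nfield))
    return sorted(i for i in include - exclude if i < nfield)


print(expand_indices('0~12,!8', 20))
```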

spw         Spectral window indices to select.
           default spw='' all spectral windows
           spw can be specified as a list or a string
           List examples:
               These are lists of specific spectral windows
               spw = [0,1]
           String examples:
               This allows use of ranges or named windows
               spw = '0'               # sp window 0
               spw = '0~2'             # sp window 0,1,2
               spw = '[0, 1, 2, 6]'    # a list as string
               spw = '0, 1, 2, 6'      # another way to do a list
               spw = '3mmUSB, 3mmLSB'  # choose by names (if available)
               spw = '![0,2]'          # negation, all spws except 0 and 2

selectdata  Select a subset of the visibility file data to plot/flag.
           default:  False = all data (subject to field and spw)
                     True = opens up data-selection parameters

-----------------------------------------------------------------------------
SELECTDATA Parameters:

chanrange   Range of channels/freq/velocity associated with each spw.
            These correspond to the list of windows in the spw parameter.
           default spw=''; chanrange='80%'; all spw's and skip 10% at each
               spw edge (for spw's with 10 or more channels)
           chanrange can be specified as a list (of lists) or a string
           List examples:
               These are lists of specific channels (per spw)
               spw = ''; chanrange=[0,1,2,3]
                  all windows, chans 0,1,2,3
               spw = [0,2]; chanrange=[[0,1],[2,3,4]]
                  sp window 0 chans 0,1 and window 2 chans 2,3,4
           String examples:
               This allows use of ranges and steps
               spw = '0'; chanrange= ''
                  sp window 0 and all channels
               spw = '2'; chanrange= '10~50';
                  sp window 2 and channels 10 to 50 incl.
               spw = '0~2'; chanrange='5~61'
                  sp window 0,1,2, each with channels 5 to 61 incl.
               spw= '0, 0, 1, 2, 6'
               chanrange='4~5, 9~14, 0~63, 14~19, 10~50'
                  sp window 0 with channels 4,5 and 9,10,11,12,13,14
                  sp windows 1, 2 and 6 with specified channels
               chanrange='10~'
                  chans starting at 10 and going to the end
               Note - an spw can be repeated to select several chanranges
               in that window (the union of the ranges is used)
           range may be given also in frequency or velocity
               spw = '0, 0, 2'
               chanrange = '23km/s~19km/s, 17km/s~14km/s, 2~50'
                  sp window 0 from 14 to 17 and 19 to 23 km/s
                  sp window 2, channels 2 to 50, incl
           a list of channels can also be given
               spw = '0, 2'; chanrange = '[0,1], [2,3,4]'
                  sp window 0 chans 0,1 and window 2 chans 2,3,4
           range may be given also in percentage of spw
               spw = '0, 2'; chanrange = '10%~95%, '
                  sp window 0, skip 10% of beginning chans and 5% of end
                  sp window 2, all channels
               chanrange = '50%'
                  the inner 50% (a single percentage means inner X%)
           a step can be included using ^<step> as a postfix to range
               chanrange = '10~100^2'
                  chans 10,12,14,...,100
               chanrange = '^4'
                  chans 0,4,8,...
           a step in frequency or velocity will pick the nearest channels
               chanrange='100GHz~150GHz^10GHz'
                  closest chans to 100,110,...,150GHz
           other useful chanrange options
               chanrange='![0,100]' # negation, all chans but 0 and 100
               chanrange='>5'       # all chans above 5
               chanrange='<50km/s'  # all chans with vel < 50km/s
               chanrange='10+100'   # 100 channels starting with 10
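
           A rough Python sketch of how a single channel-based chanrange
           term could be expanded for an spw with nchan channels.  The
           helper name expand_chanrange is hypothetical; it assumes
           integer channels only and covers '', 'a', 'a~b', 'a~',
           'a~b^s', '^s' and a single 'X%' (inner X percent).  Frequency,
           velocity, negation and 'a%~b%' forms are left out.

```python
def expand_chanrange(term, nchan):
    """Expand one channel-based chanrange term into a channel list."""
    term = term.strip()
    if term == '':
        return list(range(nchan))             # '' means all channels
    if term.endswith('%'):
        # A single percentage selects the inner X% of the window.
        inner = float(term[:-1]) / 100.0
        skip = int(round(nchan * (1.0 - inner) / 2.0))  # dropped per edge
        return list(range(skip, nchan - skip))
    step = 1
    if '^' in term:                           # '^<step>' postfix
        term, step_text = term.split('^')
        step = int(step_text)
    if term == '':                            # e.g. '^4'
        first, last = 0, nchan - 1
    elif '~' in term:
        lo_text, hi_text = term.split('~')
        first = int(lo_text)
        last = int(hi_text) if hi_text else nchan - 1   # '10~' is open-ended
    else:
        first = last = int(term)              # a single channel
    return list(range(first, last + 1, step))


print(expand_chanrange('^4', 16))
```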


timerange   Time range to select
           default '' = all
           examples:
           timerange = 'YYYY/MM/DD/hh:mm:ss~YYYY/MM/DD/hh:mm:ss'
              The full syntax for time range
           timerange = 'hh:mm:ss~hh:mm:ss'; The time range on the first
              day of the visibility data set
           timerange = 'YYYY/MM/DD/hh:mm:ss'; only the integration whose
              time span covers this time
           timerange = '>hh:mm:ss'; times greater than this on the first
              day of the visibility data set
           timerange = 'hh:mm:ss~hh:mm:ss+13:00'; 13-min time range
              on first day of visibility set
           timerange = '!hh:mm:ss~hh:mm:ss+13:00'; all times except this
              13-min time range on first day of visibility set
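
           As an illustration only, the full 'start~end' syntax maps
           directly onto a strptime format.  The function name
           parse_full_timerange and the sample dates are made up for this
           sketch; the shorter forms (no date, '>', '+', '!') are not
           handled.

```python
from datetime import datetime

TIME_FORMAT = '%Y/%m/%d/%H:%M:%S'   # matches YYYY/MM/DD/hh:mm:ss


def parse_full_timerange(timerange):
    """Split 'start~end' in the full syntax into two datetime objects."""
    start_text, end_text = timerange.split('~')
    start = datetime.strptime(start_text.strip(), TIME_FORMAT)
    end = datetime.strptime(end_text.strip(), TIME_FORMAT)
    if end < start:
        raise ValueError('timerange ends before it starts')
    return start, end


start, end = parse_full_timerange('2007/05/30/01:00:00~2007/05/30/01:13:00')
print(end - start)
```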

uvrange     Uvrange to include (default units = kilolambda)
           default: '' = all
           examples:
           uvrange = '0-1000'; uvrange between 0 and 1000 klambda
           uvrange = '<500'; uvrange less than 500 klambda
           uvrange = '0km,4km'; uvrange between 0 and 4 km

antenna     Antenna/baselines to select:
           default: '' = all
           examples: all antenna designations are NAMES, not INDICES
              (however, if we change all VLA and EVLA names to 'VL04' and
               'EL04', we could have baselines be names or indices.)
           antenna ='5&6' baselines 5-6
           antenna ='5&6;7&8' baseline 5-6 and 7-8
           antenna ='5'  all baselines with antenna 5
           antenna ='5,6' all baselines with antennas 5 and 6
           antenna ='!5' all baselines except those with antenna 5
           antenna = '!5&6,!10&12'; all baselines except 5-6 and 10-12
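
           A small Python sketch of the explicit baseline form
           ('a&b;c&d'), treating antenna designations as names (strings).
           The helper names parse_baselines and select_baselines are
           hypothetical, and the sketch assumes that a baseline is
           unordered (5&6 is the same as 6&5); negation and the
           single-antenna forms are not covered.

```python
def parse_baselines(selection):
    """Turn '5&6;7&8' into a list of unordered antenna-name pairs."""
    pairs = []
    for term in selection.split(';'):
        first, second = term.strip().split('&')
        pairs.append(frozenset((first.strip(), second.strip())))
    return pairs


def select_baselines(baselines, selection):
    """Keep the (ant1, ant2) tuples that match the selection string."""
    wanted = parse_baselines(selection)
    return [b for b in baselines if frozenset(b) in wanted]


all_baselines = [('5', '6'), ('5', '7'), ('8', '7'), ('6', '8')]
print(select_baselines(all_baselines, '5&6;7&8'))
```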

correlation Correlators to select
           default '' = 'RRLL' or 'XXYY'
           Correlator designations are:  'RR', 'LL', 'RL', 'LR',
                                         'XX', 'YY', 'XY', 'YX'
           Correlator combinations permitted are:
           'RRLL' = both RR and LL; 'XXYY' = both XX and YY
                    'RLLR' = both RL and LR; 'XYYX' = both XY and YX
                    'ALL'  = RR,LL,RL,LR or XX,YY,XY,YX
                     'I' = Vector sum of RR and/or LL or XX and/or YY
                    'FI' = Vector sum of RR and LL or XX and YY only
                                           if both are measured
                     'Q' = RL+LR; or XX-YY        [vector sum]
                     'U' = i(LR-RL); or XY+YX     [vector sum]
                     'V' = RR-LL; or i(YX-XY)     [vector sum]

scan        Scan range to select
           default '' = all
           examples:
           scan = '3'; scan number 3.  The first scan is 0
           scan = '0~8'; scan numbers 0 through 8, inclusive
           scan = '0,2,4,6'; scans 0,2,4,6

subarray    Subarray to choose
           default '' = all subarrays
           examples:
           subarray = '0'; first subarray
           subarray = '0,3'; subarray 0,3
           subarray = '0~3'; subarray 0 through 3, inclusive

feed        Feed selection for focal plane array
           default '' = all feeds
           feed = '0'; first feed
           feed = '0,3'; feed 0,3
           feed = '0~3'; feed 0 through 3, inclusive

=============================================================================

NOTES:

1. Another option for stepping in chanrange is to use a separate chanstep 
   parameter rather than the '^<step>' mechanism, e.g.

   spw = '0, 0, 2, 6'
   chanrange = '2~6, 5~10, 12~15, 2~26'
   chanstep = '2, 1, 1, 5'
 
   This adds an extra parameter that will probably be seldom used (the main
   use of stepping will be for averaging, which will have its own controls
   for that, and for plotting, which will have the xinc parameter).  It is
   also clearer to which channel selection the step pertains.
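
   To make the element-by-element matching concrete, here is a minimal
   Python sketch of how the three strings above would have to be paired
   up.  The function name pair_with_steps is hypothetical, and the sketch
   assumes plain 'a~b' channel ranges only.

```python
def pair_with_steps(spw, chanrange, chanstep):
    """Match spw, chanrange and chanstep terms by position.

    Returns {spw index: sorted channel list}; repeated spws are merged.
    """
    def terms(text):
        return [t.strip() for t in text.split(',')]

    selection = {}
    for window, rng, step in zip(terms(spw), terms(chanrange), terms(chanstep)):
        lo, hi = (int(x) for x in rng.split('~'))
        chans = range(lo, hi + 1, int(step))      # inclusive range
        selection.setdefault(int(window), set()).update(chans)
    return {w: sorted(c) for w, c in selection.items()}


print(pair_with_steps('0, 0, 2, 6',
                      '2~6, 5~10, 12~15, 2~26',
                      '2, 1, 1, 5'))
```

   The positional zip is exactly the bookkeeping a user has to do by eye
   when reading three separate strings.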

2. As in George's original 2003 proposal, the channel selection can also
   be included in the spw string ('spw:chanrange'), for example:

   spw = '0:0~15, 1:10~20, 2:100~200^10, 3:[0,1,3,9], 4:50%'
   spw = '0~3:50%, 4:10~90^10'

   This would preclude the use of any simple channel lists, but would
   make it absolutely clear which spw the channels correspond to (without
   having to match up elements in lists or separate strings).
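
   A sketch of how such a combined string could be split into
   (spw, chanrange) pairs in Python.  The function name split_spw_string
   is hypothetical; the only subtlety shown is that commas inside a
   '[...]' list must not be treated as term separators.

```python
import re


def split_spw_string(spw):
    """Split 'spw:chanrange, spw:chanrange, ...' into (spw, chanrange) pairs.

    Both parts are returned as strings; a term without ':' gets an empty
    chanrange.
    """
    # Split on commas that are not followed by a closing ']' before any
    # other bracket, i.e. commas that are outside a '[...]' list.
    terms = re.split(r',\s*(?![^\[\]]*\])', spw)
    pairs = []
    for term in terms:
        window, _, chans = term.strip().partition(':')
        pairs.append((window.strip(), chans.strip()))
    return pairs


print(split_spw_string('0:0~15, 1:10~20, 2:100~200^10, 3:[0,1,3,9], 4:50%'))
```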

3. Lists in strings, e.g. '[0,1,3,9]' should use "[]" as delimiter
   since that allows easy construction using str(), e.g.

   chanlist = [0,1,3,9]
   chanrange = str(chanlist)
   spwstring = '3:'+str(chanlist)

   This can be in addition to the grouping delimiters "()" (which are
   used in other selections), e.g. 

   chanrange = '(0,1,3,9)'

   should be the same as chanrange = '[0,1,3,9]'.

   I would NOT allow ranges in string lists since these are not parsable
   as python lists, e.g.

   chanrange = '[0,10~100,200]'

   But I would allow these in groupings

   chanrange = '(0,10~100,200)'
   chanrange = '(0,10~90^2)'
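
   The round trip through str() can be checked with the standard
   ast.literal_eval, which also shows why a range inside "[]" is not a
   Python list.  The helper name parse_string_list is hypothetical; a
   grouping that contains ranges would need the selection parser proper,
   so this sketch simply returns None for it.

```python
import ast


def parse_string_list(text):
    """Parse '[0,1,3,9]' or '(0,1,3,9)' into a list, else return None."""
    try:
        value = ast.literal_eval(text)
    except (ValueError, SyntaxError):
        return None                      # e.g. the text contains a '~' range
    if not isinstance(value, (list, tuple)):
        return None
    return list(value)


chanlist = [0, 1, 3, 9]
print(parse_string_list(str(chanlist)))        # round trip through str()
print(parse_string_list('(0,1,3,9)'))          # grouping with plain numbers
print(parse_string_list('[0,10~100,200]'))     # not a Python list
```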

4. We might allow negative channel numbers in ranges, counted from the
   end of the spw rather than from the beginning, e.g.

   chanrange = '1~-1'  # start at chan 1, stop 1 channel from the end,
                       # i.e. drop the two end channels of every spw

   This allows dropping channels easily from the edges without knowing
   how wide the spw is.

   This would apply only to ranges given in channels (not freq or km/s).
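
   A short Python sketch of the convention in this note.  The function
   name resolve_range is hypothetical.  Note the assumption, taken from
   the example above, that -1 means "one channel before the last one";
   this differs from Python indexing, where -1 is the last element.

```python
def resolve_range(chanrange, nchan):
    """Resolve 'lo~hi' with optional negative ends to absolute channels.

    Assumed convention: a negative value -k means k channels in from
    the last channel (nchan - 1), so '1~-1' drops one channel per edge.
    """
    lo_text, hi_text = chanrange.split('~')
    lo, hi = int(lo_text), int(hi_text)
    last = nchan - 1
    if lo < 0:
        lo = last + lo
    if hi < 0:
        hi = last + hi
    return lo, hi


print(resolve_range('1~-1', 64))
```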

5. Negation "!" is important and should be included in all selections.

6. It might be useful to specify steps (and averaging) in number rather than
   stride.  This can use the "|" delimiter ("#" would be more natural but
   would need escaping in scripts I think).

   chanrange = '0~10|6'

   is the same as

   chanrange = '0~10^2'
   chanrange = '[0,2,4,6,8,10]'

   Note that there are round-off issues.  For 'start~end|nbin' it should
   select channels start, end, and in between the channels closest to
   start+i*dbin   i=1,...,nbin-2    dbin=(end-start)/(nbin-1)

   chanrange = '|100'  # pick 100 chans from full range

   For freq and vel

   chanrange = '100GHz~200GHz|101'

   The nbin is always a number not a quantity (e.g. not '|10GHz'). 

   It makes no sense to combine nbins with step.

   This is very optional, since you can do the same thing with stepping
   (though might be trickier for odd spacings).  Note that this is
   more useful for averaging (see below).
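
   The nearest-channel rule can be sketched in Python as follows, with
   dbin taken as (end - start)/(nbin - 1).  The function name
   pick_channels is hypothetical, and rounding halves upward is an
   assumption of this sketch; the note leaves the round-off policy open.

```python
import math


def pick_channels(start, end, nbin):
    """Pick nbin channels from start to end inclusive, evenly spread."""
    if nbin == 1:
        return [start]
    dbin = (end - start) / (nbin - 1)
    # floor(x + 0.5) rounds halves upward, avoiding Python's
    # round-half-to-even behaviour in round().
    return [int(math.floor(start + i * dbin + 0.5)) for i in range(nbin)]


print(pick_channels(0, 10, 6))   # same channels as '0~10^2'
print(pick_channels(0, 10, 4))   # an odd spacing, where round-off matters
```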
 
7. Averaging: this is separate from selection, as it is an operation that
   occurs AFTER selection.  It should be considered separately but it is
   useful to give some suggestions here to show how it meshes with
   selection, and to make the syntax as close as possible.

   The standard data selection should be used to select the data that
   goes into the averaging (usually just some ranges in a set of spw).
   (Note: this makes possible interactions between stepping in
   selection and selection in averaging, since selection comes first,
   but the user should just be warned).  We then need to define a
   mapping from the selected data to a new set of averaged "bins".

   This might easily be done through 'spw:start~end^step^width|nbins'

   For example:

   average = '^10'  # do 10 chan averages starting with chan 0 of the first
                      spw listed in the spw selection  (width=step)

   average = '^5^10' # oversample x2: bins separated by 5 chans, each 10 chans wide

   The bins are set up to start with the "start" channel (so the bottom of the
   first average bin is the bottom of the first channel).  All channels of
   all selected spw/chanrange whose  center frequencies fall in the range of 
   the bin are included in the average.  Bins continue until no more data 
   is available in any spw, or within the range if given.  This outputs 
   effectively a single averaged window.
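
   A deliberately simplified Python sketch of the step/width bin layout,
   working in channel numbers of a single reference spw rather than in
   frequency across several spws.  The function name average_bins is
   hypothetical, and truncating the last bins at the end of the data is
   an assumption of the sketch, not something this note decides.

```python
def average_bins(nchan, step, width=None, start=0):
    """Return inclusive (first, last) channel pairs for each averaging bin.

    width defaults to step (contiguous bins); width > step gives
    overlapping (oversampled) bins.
    """
    if width is None:
        width = step
    bins = []
    lo = start
    while lo < nchan:
        hi = min(lo + width, nchan) - 1   # clip the bin at the last channel
        bins.append((lo, hi))
        lo += step
    return bins


print(average_bins(30, 10))       # like average = '^10'
print(average_bins(20, 5, 10))    # like average = '^5^10'
```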

   The start channel is assumed to refer to that channel in the first spw
   specified in the spw selection.  The frequencies for the bins are 
   referenced to this spw (e.g. in sign as channels are increased if step>0),
   as is the size of the step (using the channel width in this reference spw).
   When average is given without an upper end to the range, binning
   continues until no more data is available to average in any spw.

   Note if the end of a range is beyond the end of the reference spw, it 
   continues as if there were additional channels there.

   average = '10~100^10' # averages in freq from chans 10 to 100 with width
                           given by 10 chan widths for first spw in selection

   average = '10^10'     # start at chan 10 first spw, continue till end of data
   average = '10~^10'    # start at chan 10 first spw, until end of first spw

   If you want to average each spw separately into multiple output spws, 
   then add spw:, e.g.

   average = '0:^10'       # only spw 0 is included in 10 chan average
   average = '0:^10, 2:^5' # separate averages for spw 0 and 2

   If ranges are given in spw:, these are combined into a single output spw.

   Explicit freq or vel averages are even easier:

   average = '10km/s~90km/s^5km/s'  # bins 10-15,15-20,...,85-90
   average = '10km/s^5km/s'         # bins 10-15,15-20,... to end
   average = '1GHz^100MHz'          # bins 1.0-1.1,1.1-1.2,...
   average = '2GHz^-100MHz'         # bins 2.0-1.9,1.9-1.8,...

   Will need to decide about boundaries, e.g.
   start~end =>  f_start <= f < f_end

   I would tend to bin by the center freq of a channel falling in that bin,
   but might also include any channels that overlap more than 50% (as is
   done now in clean).
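
   A minimal Python sketch of binning by channel center with the
   half-open boundary rule (f_start <= f < f_end).  The function name
   bin_by_center and the sample velocities are made up for illustration;
   the sketch assumes a positive bin width and ignores the 50%-overlap
   variant.

```python
def bin_by_center(centers, start, end, width):
    """Assign channel indices to bins by their center value.

    centers: center freq or velocity of each channel (same unit as
    start, end and width).  Bin k covers start + k*width up to, but
    not including, start + (k+1)*width.
    """
    nbins = int(round((end - start) / width))
    bins = [[] for _ in range(nbins)]
    for chan, center in enumerate(centers):
        if not (start <= center < end):
            continue                       # outside the averaged range
        index = int((center - start) // width)
        if index < nbins:
            bins[index].append(chan)
    return bins


# Like average = '10km/s~90km/s^5km/s' on a few sample channel centers.
velocities = [8.0, 10.0, 12.5, 14.9, 15.0, 89.9, 90.0]
bins = bin_by_center(velocities, 10.0, 90.0, 5.0)
print(len(bins), bins[0], bins[1], bins[15])
```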

   For the nbin option (which precludes stepping), default reference is not to
   first channel of first spw but to whole range of data selected:

   average = '|100'      # 100 bins over entire range of data selected
   average = '|100|50'   # 100 oversampled bins of width 1/50 of range
   average = '|100|200'  # 100 undersampled bins of width 1/200 of range

   Or you can specify chan range (referenced to first spw in list if not specified)

   average = '0~|100'    # 100 bins over range of first spw in selection 

   NOTE: this scheme would be used in the imaging tasks also.

=============================================================================

REFERENCE DOCUMENTS:

George's original msselect proposal:
http://www.aoc.nrao.edu/~smyers/naug/notes/2003-11-12-msselect.txt
and
http://almasw.hq.eso.org/almasw/bin/view/OFFLINE/DataSelection

Sanjay's recent notes:
http://www.aoc.nrao.edu/~smyers/naug/notes/2007-02-19-sanjay-dataselection.txt

http://www.aoc.nrao.edu/~smyers/naug/notes/2007-04-04-sanjay-examples.txt