Data Selection Parameters

S. Myers, E. Fomalont, C. Brogan
2007-05-30 (updated 2007-05-31 STM)

-----------------------------------------------------------------------------

"Standard" Selection Parameters

    field
    spw
    selectdata

"Expanded Selection Parameters" (expanded by selectdata)

    chanrange    (was "channel")
    timerange    (was "selecttime")
    uvrange
    antenna      (was "baseline", but Sanjay uses "antenna")
    correlation
    feed
    scan
    subarray     (was "array")

=============================================================================

field        Field indices or names to select. If the field is a
             non-negative integer, it is assumed to be an index.
             default: '' = all
             examples:
               field = '0';             field index 0
               field = '0,1,2,5';       field indices 0,1,2,5
               field = '0~2';           field indices 0 to 2 inclusive
               field = '0~12,!8';       field indices 0 through 12 except 8
               field = '3C84';          field name 3C84
               field = 'VIRGO A, 3C*';  field names VIRGO A and all field
                                        names beginning with 3C
               field = '3C*,!3C273';    field names beginning with 3C
                                        except 3C273

spw          Spectral window indices to select.
             default spw='' all spectral windows
             spw can be specified as a list or a string
             List examples: These are lists of specific spectral windows
               spw = [0,1]
             String examples: This allows use of ranges or named windows
               spw = '0'               # sp window 0
               spw = '0~2'             # sp window 0,1,2
               spw = '[0, 1, 2, 6]'    # a list as string
               spw = '0, 1, 2, 6'      # another way to do a list
               spw = '3mmUSB, 3mmLSB'  # choose by names (if available)
               spw = '![0,2]'          # negation, all spws except 0 and 2

selectdata   Select a subset of the visibility file data to plot/flag.
             default: False = all data (subject to field and spw)
                      True  = opens up data-selection parameters

-----------------------------------------------------------------------------

SELECTDATA Parameters:

chanrange    Range of channels/freq/velocity associated with each spw.
             These correspond to the list of windows in the spw parameter.
             default spw=''; chanrange='80%';
                 all spw's and skip 10% at each spw edge
                 (for spw's with 10 or more channels)
             chanrange can be specified as a list (of lists) or a string
             List examples: These are lists of specific channels (per spw)
               spw = ''; chanrange=[0,1,2,3]
                   all windows, chans 0,1,2,3
               spw = [0,2]; chanrange=[[0,1],[2,3,4]]
                   sp window 0 chans 0,1 and window 2 chans 2,3,4
             String examples: This allows use of ranges and steps
               spw = '0'; chanrange= ''
                   sp window 0 and all channels
               spw = '2'; chanrange= '10~50';
                   sp window 2 and channels 10 to 50 incl.
               spw = '0~2'; chanrange='5~61'
                   sp window 0,1,2, each with channels 5 to 61 incl.
               spw= '0, 0, 1, 2, 6'
               chanrange='4~5, 9~14, 0~63, 14~19, 10~50'
                   sp window 0 with channels 4,5 and 9,10,11,12,13,14
                   sp windows 1, 2 and 6 with specified channels
               chanrange='10~'
                   chans starting at 10 and going to the end
             Note - spws can be repeated for ANDing chanranges
             range may be given also in frequency or velocity
               spw = '0, 0, 2'
               chanrange = '23km/s~19km/s, 17km/s~14km/s, 2~50'
                   sp window 0 from 14 to 17 and 19 to 23 km/s
                   sp window 2, channels 2 to 50, incl
             a list of channels can also be given
               spw = '0, 2'; chanrange = '[0,1], [2,3,4]'
                   sp window 0 chans 0,1 and window 2 chans 2,3,4
             range may be given also in percentage of spw
               spw = '0, 2'; chanrange = '10%~95%, '
                   sp window 0, skip 10% of beginning chans and 5% of end
                   sp window 2, all channels
               chanrange = '50%'
                   the inner 50% (a single percentage means inner X%)
             a step can be included using ^ as a postfix to range
               chanrange = '10~100^2'
                   chans 10,12,14,...,100
               chanrange = '^4'
                   chans 0,4,8,...
             a step in frequency or velocity will pick the nearest channels
               chanrange='100GHz~150GHz^10GHz'
                   closest chans to 100,110,...,150GHz
             other useful chanrange options
               chanrange='![0,100]'   # negation, all chans but 0 and 100
               chanrange='>5'         # all chans above 5
               chanrange='<50km/s'    # all chans with vel < 50km/s
               chanrange='10+100'     # 100 channels starting with 10

timerange    Time range to select
             default '' = all
             examples:
               timerange = 'YYYY/MM/DD/hh:mm:ss~YYYY/MM/DD/hh:mm:ss'
                   The full syntax for time range
               timerange = 'hh:mm:ss~hh:mm:ss';
                   The time range on the first day of the visibility
                   data set
               timerange = 'YYYY/MM/DD/hh:mm:ss';
                   only within the integration time covered by this time
               timerange = '>hh:mm:ss';
                   times greater than this on the first day of the
                   visibility data set
               timerange = 'hh:mm:ss~hh:mm:ss+13:00';
                   13-min time range on first day of visibility set
               timerange = '!hh:mm:ss~hh:mm:ss+13:00';
                   all times except this 13-min time range on first day
                   of visibility set

uvrange      Uvrange to include (default units = kilolambda)
             default: '' = all
             examples:
               uvrange = '0-1000';    uvrange between 0 and 1000 klambda
               uvrange = '<500';      uvrange less than 500 klambda
               uvrange = '0km,4km';   uvrange between 0 and 4 km

antenna      Antenna/baselines to select:
             default: '' = all
             examples:
               all antenna designations are NAMES, not INDICES
               (however, if we change all VLA and EVLA names to 'VL04'
               and 'EL04', we could have baselines be names or indices.)
               antenna ='5&6'       baseline 5-6
               antenna ='5&6;7&8'   baselines 5-6 and 7-8
               antenna ='5'         all baselines with antenna 5
               antenna ='5,6'       all baselines with antennas 5 and 6
               antenna ='!5'        all baselines except those with antenna 5
               antenna = '!5&6,!10&12';
                                    all baselines except 5-6 and 10-12

correlation  Correlators to select
             default '' = 'RRLL' or 'XXYY'
             Correlator designations are:
               'RR', 'LL', 'RL', 'LR', 'XX', 'YY', 'XY', 'YX'
             Correlator combinations permitted are:
               'RRLL' = both RR and LL; 'XXYY' = both XX and YY
               'RLLR' = both RL and LR; 'XYYX' = both XY and YX
               'ALL'  = RR,LL,RL,LR or XX,YY,XY,YX
               'I'    = Vector sum of RR and/or LL or XX and/or YY
               'FI'   = Vector sum of RR and LL or XX and YY
                        only if both are measured
               'Q'    = RL+iLR; or XX-YY      [vector sum]
               'U'    = iRL+LR; or XY+YX      [vector sum]
               'V'    = RR-LL; or i(XY-YX)    [vector sum]

scan         Scan range to select
             default '' = all
             examples:
               scan = '3';        scan number 3. The first scan is 0
               scan = '0~8';      scan numbers 0 through 8, inclusive
               scan = '0,2,4,6';  scans 0,2,4,6

subarray     Subarray to choose
             default '' = all subarrays
             examples:
               subarray = '0';    first subarray
               subarray = '0,3';  subarray 0,3
               subarray = '0~3';  subarray 0 through 3, inclusive

feed         Feed selection for focal plane array
             default '' = all feeds
               feed = '0';        first feed
               feed = '0,3';      feed 0,3
               feed = '0~3';      feed 0 through 3, inclusive

=============================================================================

NOTES:

1. Another option for stepping in chanrange is to use a separate
   chanstep parameter rather than the '^' mechanism, e.g.

     spw = '0, 0, 2, 6'
     chanrange = '2~6, 5~10, 12~15, 2~26'
     chanstep = '2, 1, 1, 5'

   This adds an extra parameter that will probably be seldom used
   (the main use of stepping will be for averaging, which will have
   its own controls for that, and for plotting, which will have the
   xinc parameter). It is also clearer to which channel selection
   the step pertains.

2.
   As in George's original 2003 proposal, the channel selection can
   also be included in the spw string ('spw:chanrange'), for example:

     spw = '0:0~15, 1:10~20, 2:100~200^10, 3:[0,1,3,9], 4:50%'
     spw = '0~3:50%, 4:10~90^10'

   This would preclude the use of any simple channel lists, but would
   make it absolutely clear which spw the channels correspond to
   (without having to match up elements in lists or separate strings).

3. Lists in strings, e.g. '[0,1,3,9]', should use "[]" as delimiter
   since that allows easy construction using str(), e.g.

     chanlist = [0,1,3,9]
     chanrange = str(chanlist)
     spwstring = '3:'+str(chanlist)

   This can be in addition to the grouping delimiters "()" (which are
   used in other selections), e.g.

     chanrange = '(0,1,3,9)'

   should be the same as chanrange = '[0,1,3,9]'.
   I would NOT allow ranges in string lists since these are not
   parsable as python lists, e.g.

     chanrange = '[0,10~100,200]'

   But I would allow these in groupings

     chanrange = '(0,10~100,200)'
     chanrange = '(0,10~90^2)'

4. We might allow negative channels in ranges to count from the end
   rather than from the beginning, e.g.

     chanrange = '1~-1'   # start chan 1 go to 1 channel from the end
                            e.g. drop the two end channels of every spw

   This allows dropping channels easily from the edges without knowing
   how wide the spw is. Only for specs in channels (not freq or km/s).

5. Negation "!" is important and should be included in all selections.

6. It might be useful to specify steps (and averaging) in number rather
   than stride. This can use the "|" delimiter ("#" would be more
   natural but would need escaping in scripts I think).

     chanrange = '0~10|6'

   is the same as

     chanrange = '0~10^2'
     chanrange = '[0,2,4,6,8,10]'

   Note that there are round-off issues.
   For 'start~end|nbin' it should select channels start, end, and in
   between the channels closest to

     start+i*dbin    i=1,...,nbin-2    dbin=(end-start)/(nbin-1)

     chanrange = '|100'   # pick 100 chans from full range

   For freq and vel

     chanrange = '100GHz~200GHz|101'

   The nbin is always a number, not a quantity (e.g. not '|10GHz').
   It makes no sense to combine nbins with step.

   This is very optional, since you can do the same thing with stepping
   (though it might be trickier for odd spacings). Note that this is
   more useful for averaging (see below).

7. Averaging: this is separate from selection, as it is an operation
   that occurs AFTER selection. It should be considered separately, but
   it is useful to give some suggestions here to show how it meshes
   with selection, and to make the syntax as close as possible.

   The standard data selection should be used to select the data that
   goes into the averaging (usually just some ranges in a set of spw).
   (Note: this makes possible interactions between stepping in selection
   and selection in averaging, since selection comes first, but the user
   should just be warned).

   We then need to define a mapping from the selected data to a new set
   of averaged "bins". This might easily be done through

     'spw:start~end^step^width|nbins'

   For example:

     average = '^10'     # do 10 chan averages starting with chan 0
                           of the first spw listed in the spw
                           selection (width=step)
     average = '^5^10'   # oversample x2 with bins separated by 5 chans
                           10 chans wide

   The bins are set up to start with the "start" channel (so the bottom
   of the first average bin is the bottom of the first channel). All
   channels of all selected spw/chanrange whose center frequencies fall
   in the range of the bin are included in the average. Bins continue
   until no more data is available in any spw, or within the range if
   given. This outputs effectively a single averaged window.

   The start channel is assumed to refer to that channel in the first
   spw specified in the spw selection.
   The frequencies for the bins are referenced to this spw (e.g. in
   sign as channels are increased if step>0), as is the size of the
   step (using the channel width in this reference spw).

   When a range in average has no upper end, binning continues until no
   more data is available to average in any spw. Note that if the end of
   a range is beyond the end of the reference spw, it continues as if
   there were additional channels there.

     average = '10~100^10'   # averages in freq from chans 10 to 100
                               with width given by 10 chan widths
                               for first spw in selection
     average = '10^10'       # start at chan 10 first spw, continue till
                               end of data
     average = '10~^10'      # start at chan 10 first spw, until end of
                               first spw

   If you want to average each spw separately into multiple output
   spws, then add spw:, e.g.

     average = '0:^10'         # only spw 0 is included in 10 chan average
     average = '0:^10, 2:^5'   # separate averages for spw 0 and 2

   If ranges are given in spw: these are combined into a single output
   spw.

   Explicit freq or vel averages are even easier:

     average = '10km/s~90km/s^5km/s'   # bins 10-15,15-20,...,85-90
     average = '10km/s^5km/s'          # bins 10-15,15-20,... to end
     average = '1GHz^100MHz'           # bins 1.0-1.1,1.1-1.2,...
     average = '2GHz^-100MHz'          # bins 2.0-1.9,1.9-1.8,...

   Will need to decide about boundaries, e.g.

     start~end  =>  f_start <= f < f_end

   I would tend to bin by the center freq of a channel falling in that
   bin, but might also include any channels that overlap more than 50%
   (as is done now in clean).
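The center-frequency binning rule with the half-open boundary suggested above (f_start <= f < f_end) can be sketched in Python as follows. This is only an illustration under stated assumptions: the function name assign_bins and its arguments are hypothetical and not part of any existing task, and the 50%-overlap alternative is not implemented.

```python
import math


def assign_bins(center_freqs, f_start, f_step, n_bins):
    """Group channel indices into averaging bins by channel center frequency.

    Hypothetical helper (not an existing API). Bin k covers the half-open
    interval that starts at f_start + k*f_step (inclusive) and ends at
    f_start + (k+1)*f_step (exclusive). A negative f_step walks downward
    in frequency, as in average = '2GHz^-100MHz'.
    """
    if f_step == 0:
        raise ValueError("f_step must be non-zero")
    bins = {k: [] for k in range(n_bins)}
    for chan, freq in enumerate(center_freqs):
        # floor() makes the edge nearest f_start inclusive, the far edge exclusive
        k = math.floor((freq - f_start) / f_step)
        if 0 <= k < n_bins:
            bins[k].append(chan)
    return bins
```

For example, assign_bins(centers, 10.0, 5.0, 16) would correspond to average = '10km/s~90km/s^5km/s' (16 bins from 10 to 90 km/s); a channel centered exactly on 90 km/s falls outside the last bin because the upper edge is exclusive.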
   For the nbin option (which precludes stepping), default reference is
   not to first channel of first spw but to whole range of data
   selected:

     average = '|100'       # 100 bins over entire range of data selected
     average = '|100|50'    # 100 oversampled bins of width 1/50 of range
     average = '|100|200'   # 100 undersampled bins of width 1/200 of
                              range

   Or you can specify chan range (referenced to first spw in list if not
   specified)

     average = '0~|100'     # 100 bins over range of first spw in
                              selection

   NOTE: this scheme would be used in the imaging tasks also.

=============================================================================

REFERENCE DOCUMENTS:

George's original msselect proposal:
  http://www.aoc.nrao.edu/~smyers/naug/notes/2003-11-12-msselect.txt
and
  http://almasw.hq.eso.org/almasw/bin/view/OFFLINE/DataSelection

Sanjay's recent notes:
  http://www.aoc.nrao.edu/~smyers/naug/notes/2007-02-19-sanjay-dataselection.txt
  http://www.aoc.nrao.edu/~smyers/naug/notes/2007-04-04-sanjay-examples.txt
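
ILLUSTRATION (Note 6):

The 'start~end|nbin' channel selection of Note 6 might be sketched in Python as below. This is a minimal sketch, not a proposed implementation: the function name select_nbin is hypothetical, and rounding halves upward is an assumption made here to resolve the round-off issue the note mentions.

```python
import math


def select_nbin(start, end, nbin):
    """Pick nbin channels spread evenly over start..end (both inclusive).

    Hypothetical helper (not an existing API). Channel i is the one
    closest to start + i*dbin, with dbin = (end - start)/(nbin - 1).
    Ties are rounded upward, which is an assumption of this sketch.
    """
    if nbin < 2:
        raise ValueError("nbin must be at least 2 to include both ends")
    dbin = (end - start) / (nbin - 1)
    # floor(x + 0.5) rounds to the nearest channel, halves going up
    return [int(math.floor(start + i * dbin + 0.5)) for i in range(nbin)]
```

With this rule, select_nbin(0, 10, 6) reproduces the channels of chanrange = '0~10^2', while a request such as select_nbin(0, 10, 4) yields unevenly spaced channels because 10/3 is not an integer. If nbin exceeds the number of channels in the range, the same channel can appear more than once; a real parser would have to decide whether to drop the duplicates.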