%Id $Id: xtract.doc,v 1.23 1998/08/20 03:57:47 sanjay Exp sanjay $

xtract

  Task to extract data from a GMRT LTA file, with optional averaging in
  time and selection on time, baselines, frequency channels and on scans
  via the object name and/or scan number.

Author

  Sanjay Bhatnagar (sanjay@ncra.tifr.res.in)

in (default=STDIN)

  Name of the LTA file from which the data is to be extracted.

out (default=STDOUT)

  Name of the output file.

scans (default=All scans)

  List of scans for which the data is to be extracted. The range of
  scans available is always [0,n] where n is the number of scans of the
  selected source (see below).

object (default="." ==> All objects)

  Regular expression to select the scans by source name. All scans
  whose object name matches the given regular expression will be
  selected.

timestamps (default=All timestamps)

  Start and stop record number for each selected scan, within which the
  data is to be extracted. If only one number is given, it is treated
  as the start time stamp and the data is extracted up to the
  end-of-file.

baselines (default=None selected)

  List of baselines for which data is to be extracted. Each entry of
  this comma separated list can be either the index of a baseline or a
  regular expression describing a baseline name.

  A full baseline name is composed of two antenna names separated by a
  colon (":"). A full antenna name consists of three hyphen ("-")
  separated fields. The first field is the antenna name, the second is
  the side band name and the third is the polarization name. For
  example, the fully qualified name for C11, upper side band, 175 MHz
  polarization channel would be:

      C11-USB-175

  A fully qualified baseline name for the C11 and C12 antennas (upper
  side band, 175 MHz polarization channel) would be:

      C11-USB-175:C12-USB-175

  Any of these fields can be regular expressions. Hence, to choose all
  baselines with C11 upper side band and any polarization, one would
  use the baseline name:

      C11-USB-.+:.+

  (Here "."
  matches one instance of any character and the "+" operator repeats
  the preceding "." one or more times. Hence ".+" plays the role of the
  "*" wild card character, except that it must match at least one
  character. This is the POSIX regular expression syntax. For more
  detail about POSIX regular expressions, please read the document on
  regex. It helps to know this syntax since certain versions of grep
  (and certainly egrep) use this syntax too.)

  If only the first field of an antenna name is given, the other fields
  are taken to be wild cards. If the second fully qualified antenna
  name is missing from the baseline name, it is also replaced by a wild
  card.

  These selections exclude all self correlations. Also, if a baseline
  has already been selected by an earlier selection, it is excluded
  from all later selections.

  To select the self correlations, one must add 'A' to the beginning of
  the antenna name. In such a case, the name of the second antenna is
  redundant and therefore not required.

  Examples:

    1) All baselines with C11
         baselines=C11

    2) Self correlation of C11
         baselines=C11:C11

    3) Self correlation of C11 130 MHz poln. channel, any side band
         baselines=AC11-.+-130

    4) All self correlations
         baselines=A.

    5) All USB baselines with C11
         baselines=C11-USB-.+:.
       or
         baselines=C11-USB-.+

    6) All USB baselines with C11 and baselines 10, 15 and 18
         baselines=C11-USB-.+,10,15,18

    7) All baselines in the database
         baselines=.

    8) All central square baselines with C02-USB-175
         baselines=C02-USB-175:C.+

    9) All baselines of C02-USB-175 with only arm antennas
         baselines=C02-USB-175:[EWS].+

channels (default=Channel No. 100)

  Start, stop and increment used to make the list of frequency channels
  for which the data is to be extracted. If one number is given, only
  that channel is used. If two numbers are given, they are treated as
  the start and stop channel numbers, with an increment of 1 within the
  range.
  If a third number is also given, it is treated as the increment used
  to step through the range defined by the first two numbers.

antennas (default=All antenna)

  List of antenna numbers or names to be used for output of antenna
  based parameters. The rules for constructing antenna names are the
  same as those described above for baseline names.

integtime (default=Minimum integration time allowed by the data)

  Integration time in seconds. The data on the output stream will
  correspond to this much integration in time. The final integration
  time may not be exactly the requested value, since only integer
  multiples of the intrinsic integration time of the data can be
  generated.

normalize (default=Normalize)

  If set to 1 (the default), each baseline is normalized by the
  geometric mean of the self correlations of the two antennas
  participating in the baseline. If set to zero, the data is not
  normalized.

*********Attention!!!

  Use of a higher integration time than the minimum allowed by the data
  when reading data from a pipe will currently result in unpredictable
  run-time behavior.

fmt

  The format string which determines the content and the format of the
  output data stream.

  This encapsulates the fact that whatever data we need to extract from
  the visibility database will be a function of Baseline, Channel,
  Antenna number and Time. Out of these four variables, the user has no
  control over the TIME variable: it always increases linearly and at
  the slowest rate (compared to the rest of the variables).

  The user will in general be interested in the data for a range of
  values of these variables. Hence, these variables are in general
  vectors. In the terminology used here, they are called "operators".
  These operators operate on a "body". The body is a list of the
  various parameters that the user wants, enclosed in a '{' and '}'
  pair. We refer to these parameters inside a body as the "elements" of
  the body.
  Each of the operators must be followed by a body, and an element of
  the body can itself be another operator-body pair. The operation that
  an operator performs is to loop over its body for all values of the
  operator. Since an element of the body can be an operator-body pair,
  nested loops are possible.

  The three operators in our syntax are called "base", "chan", and
  "ant".

  The various elements that the syntax recognizes are listed below.
  Some of these are independent of any operator while others need to be
  part of the body of one or more operators. The operators that these
  elements need are also listed with them.

      Elements          Operators Needed
      ----------        ------------------
      ua,va,wa          ant   (antenna based)
      delay,phs0,dphs   ant
      u,v,w             base  (baseline based)
      cno               chan  (the channel number)
      re,im             base,chan
      a,p               base,chan
      ha,ist,lst        none
      el, az            none
      dec,ra            none
      rec,sno           none

  ua,va,wa are the co-ordinates of the antenna in the (U,V,W)
  co-ordinate system, in units of the wavelength at the center of the
  observing band. u is u1-u2, where 1 and 2 represent the two antennas
  which make up the given baseline.

  az and el give the antenna azimuth and elevation.

  cno refers to the channel number; rec and sno refer to the record
  number within a given scan and to the scan number respectively.

  re,im are the real and imaginary parts of the data.

  a,p are the amplitude and phase of the data in polar representation.

  ha, ist and lst represent time in Hour Angle, Indian Standard Time
  and Local Sidereal Time respectively.

  delay, phs0 and dphs are the antenna based delay, the phase applied
  for fringe stopping and the rate of change of this phase
  respectively.

  Thus, to get the output such that each line is tagged with the HA
  value and has the data in Real,Imag format for all channels of a
  given baseline, one can write

      base{ha;chan{re;im};\\n}

  The special element "\\n" represents the newline character, which
  will appear at the given position in your output.
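  As a rough illustration of the looping described above, the Python
  sketch below mimics what base{ha;chan{re;im};\\n} produces. It is not
  xtract code: the function name expand, the in-memory layout of the
  records, the space used as field separator and the fixed %.3f
  formatting are all assumptions made for the example.

```python
def expand(records, baselines, channels):
    """Mimic base{ha;chan{re;im};\\n} over a list of (ha, vis) records.

    vis is a hypothetical dict mapping (baseline, channel) to a complex
    visibility; it only stands in for the data read from the LTA file.
    """
    lines = []
    for ha, vis in records:              # time: always the outermost, slowest loop
        for b in baselines:              # "base" operator loops over its body
            fields = [f"{ha:.3f}"]       # element ha
            for c in channels:           # "chan" operator loops over its body
                z = vis[(b, c)]
                fields.append(f"{z.real:.3f}")   # element re
                fields.append(f"{z.imag:.3f}")   # element im
            lines.append(" ".join(fields))       # the newline element closes the line
    return lines


if __name__ == "__main__":
    # Made-up data: two baselines, two channels, one time stamp.
    vis = {(b, c): complex(b, c) for b in (0, 1) for c in (10, 11)}
    for line in expand([(1.5, vis)], [0, 1], [10, 11]):
        print(line)
```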
  The only other special element that this syntax currently allows is
  "\\t" (TAB).

  If one also wants the values of U,V,W for each baseline to appear in
  each line, one could write

      base{ha;u;v;w;chan{re;im};\\n}

  However, note that the following syntax is an error

      ha;u;v;w;base{chan{re;im};\\n}

  This is because the elements u,v,w are a function of the baseline and
  they do not appear as part of the body of the base operator. If this
  syntax is supplied to the parser, it will generate an error message
  to this effect.

  The elements can also be qualified with a C language printf-style
  format field. Hence, if one wants the HA to be written with a field
  length of 8 characters and a precision of 3 digits, then one can
  write the above expression as

      base{ha%8.3f;u;v;w;chan{re;im};\\n}

  The format can be any of the 'f', 'g', 'G', 'e' or 'E' types. See the
  documentation of the printf function of the C language for more
  details.

  Various output formats can be generated by changing the order of
  loops and elements in this syntax. Here are some examples:

      base{ha;u;v;w;chan{re;im};u;v;\\n}
      ha;lst;\\n;base{u;v;w;chan{re;im};chan{a;p};\\n}
      base{chan{ant{delay};re;im};u;v;w;\\n}
      ha;base{u;v;w};\\n;base{chan{re;im}};\\n
      base{u;v;w};\\n;base{chan{re;im}};ha;\\n
      base{chan{u;v;w;re;im}}
      base{u;v;w;chan{re;im};chan{a;p}}
      base{u;v;w;chan{re;im}}
      base{chan{ant{delay;u;v;w};\\n}}
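
  The scoping rule for elements (each element must sit inside the body
  of every operator it needs) can be sketched in Python as below. This
  is only an illustration of the rule as stated in the element table,
  not the parser used by xtract: the names NEEDS, FREE and check, the
  wording of the messages, and the choice to accept the special
  elements written with either one or two backslashes are all
  assumptions.

```python
import re

# Operators each element needs, copied from the element table.
NEEDS = {
    "ua": {"ant"}, "va": {"ant"}, "wa": {"ant"},
    "delay": {"ant"}, "phs0": {"ant"}, "dphs": {"ant"},
    "u": {"base"}, "v": {"base"}, "w": {"base"},
    "cno": {"chan"},
    "re": {"base", "chan"}, "im": {"base", "chan"},
    "a": {"base", "chan"}, "p": {"base", "chan"},
}
# Elements that need no operator, plus the two special elements.
FREE = {"ha", "ist", "lst", "el", "az", "dec", "ra", "rec", "sno",
        r"\n", r"\t", r"\\n", r"\\t"}
OPERATORS = {"base", "chan", "ant"}

TOKEN = re.compile(r"[{};]|[^{};]+")


def check(fmt):
    """Return a list of scoping errors found in fmt (empty if none)."""
    errors = []
    stack = []        # operators whose bodies are currently open
    pending = None    # operator seen, waiting for its '{'
    for tok in TOKEN.findall(fmt.replace(" ", "")):
        if tok == ";":
            continue
        if tok == "{":
            if pending is None:
                errors.append("'{' without an operator")
            stack.append(pending or "?")
            pending = None
        elif tok == "}":
            if stack:
                stack.pop()
            else:
                errors.append("unmatched '}'")
        else:
            if pending is not None:
                errors.append(f"operator {pending} has no body")
                pending = None
            name = tok.split("%", 1)[0]   # drop a printf-style suffix
            if name in OPERATORS:
                pending = name
            elif name in NEEDS:
                missing = NEEDS[name] - set(stack)
                if missing:
                    errors.append(f"{name} needs {','.join(sorted(missing))}")
            elif name not in FREE:
                errors.append(f"unknown element {name!r}")
    if pending is not None:
        errors.append(f"operator {pending} has no body")
    if stack:
        errors.append("unclosed '{'")
    return errors


if __name__ == "__main__":
    print(check(r"base{ha%8.3f;u;v;w;chan{re;im};\\n}"))
    print(check(r"ha;u;v;w;base{chan{re;im};\\n}"))
```

  With this sketch the first call reports no errors, while the second
  reports one error each for u, v and w, since they appear outside the
  body of the base operator.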