This appendix describes the design of the program xtract and the macro language used to extract visibility and auxiliary data from the GMRT LTA database. Section A.1 is intended for “plain” users of the program. The xtract macro language parser is also available as a stand-alone library, which can be used in other applications. The internal design of the library is described in section B.3 and is aimed at more enterprising users, who will find it useful for extending this program. Section B.4 describes the application programming interface (API) of the library. Section B.5 describes the mechanism to extend the list of data/parameters which can be extracted.
The macro language encapsulates the fact that whatever numbers we need to extract from the visibility database are antenna based, interferometer based (or equivalently, baseline based), and/or a function of time and/or frequency within the given observing band. One may want to extract data for a list of antennas, baselines and frequency channels, with a selection applied in time.
The macro language syntax is a hybrid of the implicit loops of the write statement of FORTRAN (see the manual for FORTRAN) and the format string used by the output functions of C (see documentation on printf in the manual for C). Three operators are defined in the language, namely base, chan and ant, which loop over a list of baselines, frequency channels and antennas respectively. The implicit loops loop over the body of the operators. The body is a semicolon (’;’) separated list of elements enclosed in a pair of curly braces (’{’ and ’}’). Each operator must be followed by a body and the elements of the body can be another operator-body pair. Hence, nested loops are possible.
In the xtract program, the list of values for the operators is supplied as a list of comma (’,’) separated values for the keywords baselines, channels, and antenna respectively. An operator for time range selection is also required, but not explicitly defined. All macros are internally the body of this time range operator. The time range selection can be specified via the timestamps keyword.
Various elements which the syntax recognizes are listed in Table B.1. Some of these are independent of any operator, while others need to be part of the body of one or more operators. The elements and the operators required by these elements are also listed in the table.
ua,va,wa are the co-ordinates of the antenna in the (u,v,w) co-ordinate system in units of the wavelength of the center of the observing band and u=ua1-ua2, where the subscripts refer to the two antennas of a baseline.
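The relation between antenna and baseline co-ordinates can be illustrated with a short C sketch. The type UVW and the function baseline_uvw are hypothetical names introduced only for this illustration, and the sketch assumes that v and w are formed in the same way as the u=ua1-ua2 relation stated above.

```c
/* Antenna position in the (u,v,w) system, in units of the wavelength.
 * Hypothetical type, for illustration only. */
typedef struct {
    double u, v, w;
} UVW;

/* Baseline co-ordinates for the antenna pair (a1, a2).
 * u = ua1 - ua2 is taken from the text; v and w are assumed analogous. */
UVW baseline_uvw(UVW a1, UVW a2)
{
    UVW b;
    b.u = a1.u - a2.u;
    b.v = a1.v - a2.v;
    b.w = a1.w - a2.w;
    return b;
}
```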
The macro to produce a table of rows with Hour Angle (HA) value in the first column followed by two columns for the real and imaginary parts of the visibility at a single frequency for all selected baselines, would be
The special element \\n represents the actual character that will appear at the given position in the output (which is the NEWLINE character here). The only other special element that this syntax currently allows is \\t (TAB).
The (u,v,w) values for each baseline can be added to each row of the table by the following macro
However note that the following macro is in error
This is because the elements u,v,w are a function of the baseline and they do not appear as part of the body of the base operator. The macro parser will generate an error message pointing out the possible error in this macro.
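The kind of check that rejects such a macro can be illustrated with a small C sketch. It assumes, purely for illustration, that the operators are represented as bit flags (OP_ANT, OP_BASE, OP_CHAN) and that each element carries a mask of the operators it requires. These names and this representation are hypothetical and are not taken from the xtract sources.

```c
/* Hypothetical bit flags, one per operator of the macro language. */
enum { OP_ANT = 1, OP_BASE = 2, OP_CHAN = 4 };

/* Operators that the element requires but that are not active
 * at the point where the element appears in the macro. */
unsigned missing_operators(unsigned required, unsigned active)
{
    return required & ~active;
}

/* Returns 1 if every required operator is active, 0 otherwise. */
int operators_satisfied(unsigned required, unsigned active)
{
    return missing_operators(required, active) == 0;
}
```

With this representation, a baseline-based element such as u would have a required mask of OP_BASE, so the check fails when u appears outside the body of the base operator.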
The elements can also be qualified by a C-styled printf format field. Hence, for example, if the value HA needs to be written with field length of 8 characters and precision of 3 digits, the fmt string would become
The format for the numbers can be of type ’f’,’g’,’G’,’e’, or ’E’ (see the documentation on printf function of C language for more details).
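The effect of such a format field can be seen with the standard C output functions. The sketch below builds a printf format from a field width, a precision and one of the permitted conversion characters, and formats a single value. The function format_field is a hypothetical helper written for this illustration and is not part of the xtract library.

```c
#include <stdio.h>
#include <string.h>

/* Format one value with a C-style field such as %8.3f.
 * Only the conversions 'f', 'g', 'G', 'e' and 'E' are accepted.
 * Returns the number of characters needed, or -1 on a bad field. */
int format_field(char *out, size_t outsize, int width, int prec,
                 char conv, double value)
{
    char fmt[32];

    if (conv == '\0' || strchr("fgGeE", conv) == NULL || prec < 0)
        return -1;

    /* Build, e.g., "%8.3f" from width = 8, prec = 3, conv = 'f'. */
    snprintf(fmt, sizeof fmt, "%%%d.%d%c", width, prec, conv);
    return snprintf(out, outsize, fmt, value);
}
```

For a width of 8 and a precision of 3 with the 'f' conversion, the value 1.23456 is written as "   1.235", that is, right-justified in 8 characters with 3 digits after the decimal point.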
Various output formats can be generated by changing the order of loops and elements in this syntax. Here are some examples. Each of these will generate a table. The values in the various columns will be as given in the explanation.
Column 1 will be the Hour Angle. Columns 2,3, and 4 will have the u,v,w values followed by 2 × N columns for real and imaginary values for the N values that the chan operator can take. There will be one such row in the table for each value of the base operator.
This format will generate a table with rows of unequal lengths.
Row 1 will have only HA and LST values.
Row 2 will have u,v,w in the first 3 columns followed by real, imaginary, amplitude and phase for all channels listed in the chan operator. There will be one such row for every value of the base operator.
This macro will generate a table with a set of two rows of unequal length per input data record.
First row will have the HA and u,v,w values for each selected baseline.
Second row will have the real and imaginary values of the visibilities for all channels of the chan operator and for all values of the base operator.
The macro language is used by the application program xtract to extract data from the GMRT visibility database. The most common use of xtract is to extract data in the form of an ASCII table for display and/or further processing (e.g., to compute the antenna pointing errors). The output of xtract can be supplied to another program in two ways.
By default, xtract writes the output on the standard output. Hence if xtract is started as
the output of xtract will be piped to the standard input of the program named MyProg. The other, probably more convenient, method of piping data is to set the out keyword to ’|MyProg’.
The output will be written in ASCII format, preceded by a simple header. Apart from other fields, the header contains information about the number of rows and columns and the labels for each of the columns. This header always ends with a string “#End”, after which the data is written. A line beginning with ’#’ is also written per LTA-scan. It is hoped that users will utilize these facilities to generate more filters to process and display data externally.
If the output file name begins with a ’*’, the file name is constructed after stripping the initial ’*’ character and the data is written in binary format (floating point numbers of size determined by the operator sizeof(float) of C or C++). The data itself is preceded by the ASCII header mentioned above. Hence, out=*tst.bin will produce a file tst.bin, which will contain the output in binary format and out=*|MyProg will pipe the binary data to MyProg.
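A program receiving this output has to skip the ASCII header before reading the data. The following C sketch shows one way a filter such as MyProg might do this. The function names skip_header and read_floats are hypothetical, and the sketch assumes that the "#End" line is terminated by a newline and that binary data, when requested, start immediately after it.

```c
#include <stdio.h>
#include <string.h>

/* Skip header lines up to and including the line starting with "#End".
 * Returns the number of lines consumed, or -1 if "#End" was not found. */
int skip_header(FILE *in)
{
    char line[1024];
    int nlines = 0;
    int at_line_start = 1;  /* does the next chunk start a new line? */

    while (fgets(line, sizeof line, in) != NULL) {
        size_t len = strlen(line);
        int complete = (len > 0 && line[len - 1] == '\n');
        int is_end = at_line_start && strncmp(line, "#End", 4) == 0;

        if (at_line_start)
            nlines++;
        if (is_end) {
            /* Discard the rest of an over-long "#End" line, if any. */
            int c = complete ? '\n' : 0;
            while (c != '\n' && c != EOF)
                c = fgetc(in);
            return nlines;
        }
        at_line_start = complete;
    }
    return -1;
}

/* Read count binary floats (native sizeof(float)) following the header.
 * Returns the number of values actually read. */
size_t read_floats(FILE *in, float *buf, size_t count)
{
    return fread(buf, sizeof(float), count, in);
}
```

For ASCII output, the same skip_header call can be followed by ordinary formatted reads of the table rows. Lines beginning with '#' that appear later in the stream (one per LTA-scan) would need to be skipped in the same manner.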
For convenience of usage, a filter has been incorporated on the output stream of xtract which will supply the data directly to the QDP line plotting package. This filter can be invoked by setting out=>QDP. The output, in this case, will be displayed as a stack of line plots using QDP.
A more general and usable graphical interface to the multiplot features of the freely available line plotting program Gnuplot has been developed (Kudale & Bhatnagar, NCRA Tech. Rep., in preparation). The data can be supplied to this software using the piping mechanism described above. A graphical user interface then allows the user to select the available baselines/antennas and plot them interactively in a flexible manner.
The xtract macros are first interpreted and then compiled in the memory. This compiled code is then executed for every input data record. The details about the compilation and execution of the format string are given below.
The process of compilation of the format string involves two steps.
First, all the loops represented by the operators in the macro are exploded into a linked list (also called the symbol table), with each node of the list corresponding to a valid element of the language. Each element is represented in the memory by a structure of the following type:
All recognized elements (symbols) are tabulated in the memory in a temporary table, which is a list of structures of the following type:
This table is hard-coded in the file table.h and is used only to validate the symbols in the macro. Once validated, the Class and Type information from this table is transferred to the actual symbol table and the temporary table is destroyed.
Apart from the name of the element and the C styled format string, the nodes of the symbol table also have information about the mechanism to get the numeric value associated with the element. This information is in the field Type of the structure above. Valid types for the elements are listed in Table B.2.
The abc field of the element structure shown earlier holds the values of the three operators (ant, base, and chan) applicable to the element.
Before an element is added to the symbol table, a check is made to ensure that all the required operators (listed in Table B.1) are active. To generate this information about the required operators, elements are further categorized into one of the classes listed in Table B.3.
Once the element is validated for the required active operators, a new link is created in the symbol table and filled with the Name, Type and Class of the element. By this time, the loops (represented by the list of values associated with various operators) have already been exploded (i.e., a node has been created in the symbol table for each value of the operator). Information about the values of the operators is transferred to the symbol table for every value of the active operators and the values of the required operators are put in the abc array (passive operators are assigned a value of -1). If no error has occurred by this point, it is assured that the syntax was correct and all the elements in the macro were recognized.
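The explosion of nested loops into a flat linked list can be sketched in C as follows. The Node structure, the function names and the element names used in the usage note are hypothetical and are much simpler than the actual symbol table of the library. The ordering of the abc array as (ant, base, chan) is also an assumption.

```c
#include <stdlib.h>

/* Simplified, hypothetical symbol table node. */
typedef struct Node {
    const char *name;   /* element name */
    int abc[3];         /* assumed order: ant, base, chan; -1 = passive */
    struct Node *next;
} Node;

/* Release every node of a list. */
void free_list(Node *head)
{
    while (head != NULL) {
        Node *next = head->next;
        free(head);
        head = next;
    }
}

/* Explode a macro of the form base{chan{e1;e2;...}} into a linked list:
 * one node per element, per channel value, per baseline value.
 * Returns the head of the list, or NULL if memory runs out. */
Node *explode_base_chan(const int *bases, int nbase,
                        const int *chans, int nchan,
                        const char *const *elems, int nelem)
{
    Node *head = NULL, *tail = NULL;

    for (int b = 0; b < nbase; b++)
        for (int c = 0; c < nchan; c++)
            for (int e = 0; e < nelem; e++) {
                Node *n = malloc(sizeof *n);
                if (n == NULL) {
                    free_list(head);
                    return NULL;
                }
                n->name = elems[e];
                n->abc[0] = -1;        /* ant operator is passive here */
                n->abc[1] = bases[b];
                n->abc[2] = chans[c];
                n->next = NULL;
                if (tail != NULL)
                    tail->next = n;
                else
                    head = n;
                tail = n;
            }
    return head;
}
```

For two baselines, three channels and two elements in the body (say, the real and imaginary parts), the list has 2 x 3 x 2 = 12 nodes, each carrying the baseline and channel values that apply to it.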
The second step in the process of compilation is to fill in the information about the mechanism to get the numeric value of each element in the list. The Type of the element and, if required, the values in the abc array are used for filling in this information.
For elements of type PTYPE, the ptr field is made to point to the location in the memory where the required value is to be found. Elements of this type refer to particular values in the buffer in the memory and need offsets into the buffer, which can be computed using the abc array. The buffer in the memory is generally the buffer into which data records from the LTA-file are read. Examples of this kind of element are IST, real/imaginary values of the visibility, etc.
For elements of type FTYPE, the func field is filled with a pointer to a function which will be called when the value of the element is required. If the computation of the value requires some data, the pointers to this data are put in the field fargv and the total number of such pointers is put in the field fargc. These will be passed as arguments to the function when the value of the element is required. The first argument passed to the function will be the name of the element. Examples of this kind of element are HA, amplitude/phase of the visibility, etc.
For elements of type CHARType, nothing needs to be done. The name of such elements is the character that is to be copied to the output during execution.
The process of “execution” of the compiled list of elements is rather simple. The program steps through the entire list of elements and checks the type of each element on the list. If the type is PTYPE, the value at the memory location to which ptr points is copied to the output stream using the format in the Fmt field of the element. If the type is FTYPE, the function specified by func is called with Name, fargv, and fargc as the arguments. The value returned by this function is then copied to the output stream using the format in the Fmt field of the element. If the type is CHARType, the first character of the Name field is copied to the output stream.
Following is an example of a simple routine used for execution of the compiled macros:
If the output is required in binary format, one can write an equivalent Execute routine, which will ignore the Fmt field and CHARType elements and output the values in the binary format.
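The execution steps described above can be sketched in C as follows. This is an illustrative sketch only: the structure Symb, the enumeration constants, the function-pointer signature and the helper sketch_power are assumptions made for this example, modelled on the field names used in the text (Name, Fmt, Type, ptr, func, fargv, fargc). They are not the definitions used by the library.

```c
#include <stdio.h>

/* Hypothetical element types, after PTYPE, FTYPE and CHARType. */
enum ElemType { PTYPE, FTYPE, CHARTYPE };

/* Hypothetical, simplified node of the compiled symbol table. */
typedef struct Symb {
    const char *Name;          /* element name, or the literal character */
    const char *Fmt;           /* printf-style format for the value */
    enum ElemType Type;
    float *ptr;                /* PTYPE: where the value lives */
    float (*func)(const char *name, void **argv, int argc);  /* FTYPE */
    void **fargv;              /* FTYPE: pointers to the input data */
    int fargc;                 /* FTYPE: number of such pointers */
    struct Symb *next;
} Symb;

/* Example FTYPE function (hypothetical): re*re + im*im from two floats. */
float sketch_power(const char *name, void **argv, int argc)
{
    (void)name;
    if (argc < 2)
        return 0.0f;
    float re = *(const float *)argv[0];
    float im = *(const float *)argv[1];
    return re * re + im * im;
}

/* Step through the list once and write one output record to fd.
 * Returns the number of nodes visited. */
int execute_sketch(FILE *fd, const Symb *inst)
{
    int n = 0;

    for (const Symb *s = inst; s != NULL; s = s->next, n++) {
        switch (s->Type) {
        case PTYPE:     /* copy the value that ptr points to */
            fprintf(fd, s->Fmt, (double)*s->ptr);
            break;
        case FTYPE:     /* compute the value by calling func */
            fprintf(fd, s->Fmt,
                    (double)s->func(s->Name, s->fargv, s->fargc));
            break;
        case CHARTYPE:  /* copy the first character of Name */
            fputc(s->Name[0], fd);
            break;
        }
    }
    return n;
}
```

A binary variant along the lines described above would replace the two fprintf calls with fwrite of the float value and skip the CHARTYPE case.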
The process of compilation and execution of the xtract macro described above is done via a stand-alone library. This section describes the Application Programming Interface (API) of this library.
The C/C++ interface of this library is defined in fmt.h, which must be included in the code. The code must be linked with libjump.a, in addition to all other GMRT Off-line libraries (liboff.a, libregex.a, libkum.a).
The xtract macros are interpreted via the following function call:
The first argument is the macro as a NULL terminated string. The second argument is the fftmac structure (Bhatnagar 1997a), which holds the various mappings for the LTA database (e.g., sampler to the MAC mapping, etc.). This structure must be filled using services provided by the GMRT Offline Library (getFFTMac method). The third argument is a pointer to the structure of the type Parameters. This structure holds the various parameters which the library uses while executing the macro. Various fields of this structure are described in Section B.4.3. The values of some of the fields of this structure are defined by the user, while others are to be extracted from the LTA database. The fourth argument is a pointer to a structure of type SymbType. This is the table of elements mentioned earlier and must be initialized to NULL before being passed to this routine. A return value of less than EOF (-1) indicates a syntax error in the macro.
Compilation of the macro string is done via a call to:
The first argument is the symbol table returned by a call to interpret. It now points to the head of a linked list of nodes of type SymbType. The last node of this list is NULL. The second argument is the fftmac structure. The third argument is the table of antenna co-ordinates. This can be retrieved from the LTA database via the services provided by the GMRT Offline Library (getFFTMac method). The fourth argument is a pointer to the Parameters structure. A return value of less than EOF (-1) indicates an error in compilation of the macro. On successful compilation, it returns the size of the compiled symbol table in units of the size of the structure SymbType.
If the interpretation and compilation of the macro was successful, the compiled macro can be executed via calls to a user-supplied function of signature
int Exec(FILE *fd, SymbType *Inst, float *Buf, int ProgSize)
fd refers to the output file already opened for writing. Inst is the symbol table returned by interpret. In case the output data is not to be written to any file, the user can write versions of this routine which will fill the data in the buffer Buf. ProgSize is the value returned by Compile.
The data field of the Parameters structure (see section B.4.3) must be made to point to the buffer in which the LTA-data buffers are read. To generate a regular stream of output, corresponding to each input data record, this function must be called every time a new LTA-data record is read.
A few types of Exec functions are provided in the library. These include:
It writes the output data to the fd file descriptor. It does not use the Buf pointer.
This writes the output data to the buffer pointed to by Buf. The size of this buffer must be big enough to hold one floating point number per node of the symbol table (return value of Compile). This does not use the file descriptor.
This supplies output data to the QDP program via a pipe opened via the libpipe.a library. This uses the Buf pointer but does not use fd.
To obtain any other functionality, programmers need to write their own versions of this function. The recommended route for writing a new function is to modify the Execute or ExecuteDef functions.
The Parameters structure is of the following type:
The various fields and their uses are as follows:
This must be set to 1 if the visibility data is to be normalized by the geometric mean of the self correlations. Otherwise this must be set to 0.
These are pointers to the user selected list of the baseline, scan, antenna and channel numbers respectively. The list of channel numbers must be 0-relative and not the absolute channel index of the data base (which could start at any number between 0 and the maximum number of channels).
Typically, the user selects the baselines and antennas via the baseline/antenna names. These are supplied as strings by the user. Two functions, MkBaselines and MkAntNo, are provided to convert these strings to a list of bit fields in which the bits corresponding to the selected baselines are set to 1. Another routine, toIntList, is provided to convert the bit fields to a list of integers representing the selected baselines. These functions are available in the library liboff.a and are described in Appendix A of GMRT Offline Library.
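The conversion from a bit field to a list of integers can be illustrated with the following C sketch. The function bits_to_list is a hypothetical stand-in written for this illustration. It does not reproduce the signature of toIntList, and the assumed layout (bit i of the field stored in an array of unsigned words, least significant bit first) may differ from the one used by liboff.a.

```c
#include <limits.h>

/* Convert a bit field into a list of the indices whose bits are set.
 * bits  : array of unsigned words holding at least nbits bits
 * nbits : number of valid bits in the field
 * out   : caller-supplied array with room for nbits integers
 * Returns the number of indices written to out. */
int bits_to_list(const unsigned *bits, int nbits, int *out)
{
    const int word = (int)(sizeof(unsigned) * CHAR_BIT);
    int n = 0;

    for (int i = 0; i < nbits; i++)
        if (bits[i / word] & (1u << (i % word)))
            out[n++] = i;
    return n;
}
```

For example, a field with bits 0, 2 and 5 set (0x25) yields the list 0, 2, 5, which could then serve as a list of selected baseline numbers and its length.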
These are the lengths of BList, SList, AList and CList.
These are the numbers of baselines, channels and antennas in the data base.
This is the number of the first frequency channel in the data base.
These are the offsets within the LTA-data buffer to locate the time stamp, the auxiliary parameters, and the visibility data itself. These offsets can be extracted from the global header of the database.
These are the values of sin(δ) and cos(δ) where δ is the declination of the pointing center of the telescope. These are used for the calculation of the (u,v,w) co-ordinates during execution.
This is the multiplication factor used to convert the time stamp in the data to seconds of time. This is also extracted from the global header.
This is the wavelength of the observing frequency in meters.
This is the pointer to the beginning of the LTA-data buffer.
To add a new element to the xtract macro language, one needs to define the values of Name, Class, and Type of the new symbol in the table of valid elements. This is done by adding to the table in the file table.h (make sure the last element of this list is left unaltered).
One also needs to add a piece of C-code, which will fill the required fields of the structure SymbType (depending upon the Type of the element – the ptr field for PTYPE elements and the func, fargv, and fargc fields for FTYPE elements). It is the responsibility of the programmer to make sure that this code is correct in terms of getting the numeric value of the elements. Also, the programmer must make sure that this code is compatible with the Type of the element. Failing to do so will either generate wrong values or crash the program at the time of execution. This code is to be added in the function Compile in the file Compile.c. The application will need to be rebuilt for the new symbol to be recognized in the fmt syntax.