This section describes a few of the application programs which were extensively used while observing with the GMRT, as well as for instrumental calibration and the measurement of various telescope parameters.
The visibility function, denoted by $V_{ij}$, depends on a number of parameters such as the local sidereal time (LST), the observing frequency, the antenna co-ordinates, the co-ordinates of the phase center, the compensating delays applied to the various antennas, the antenna fixed delays, the antenna positions, etc. (i.e. $V_{ij} = V_{ij}(\mathrm{LST}, \nu, u, v, w, \ldots)$). During debugging, it was frequently required to view this data in various representations (e.g., Cartesian versus polar representation of complex numbers). Also, $V_{ij}$ depends on a multitude of parameters, and different debugging purposes require viewing $V_{ij}$ with respect to different quantities. It was therefore not useful to implement a program to
extract the data as a function of a fixed set of parameters. Since,
as a design philosophy, stand-alone data display software was used,
the need for the extracted data to be in a variety of formats (binary
as well as in plain ASCII) directly readable by the display/plotting
programs was also frequently felt. It was therefore necessary to
develop a compact macro language parser to extract and display the
data in a flexible and programmable manner.
xtract (Bhatnagar1997b) was designed to extract the visibility data and its parameters in a programmable fashion via a compact macro language. It was also designed to be as general as possible and to interface easily with the many stand-alone data display programs in use. In fact, it was designed with the wider goal of making it easier for a large spectrum of astronomers/engineers using the GMRT to access the visibility data and do further processing if required. This program was used extensively for this dissertation and is now regularly used for a variety of applications, including on-line and off-line data browsing and display, beam shape, pointing offset and antenna sensitivity measurements, and band shape monitoring.
The xtract macro language (described in Appendix B) provides a mechanism to define the contents and the format of the output data. Macros are constructed using the three operators of the language, namely ant, base and chan, and the various elements. The three operators loop over all the selected antennas, baselines and frequency channels respectively. The language syntax allows arbitrary grouping of the operators, with a semi-colon separated list of elements as the body of these operators. This effectively provides a compact way of defining macros to extract visibilities and/or other parameters as a function of antenna, baseline and/or frequency channel number. These macros are, in fact, nested loops over the lists of antennas, baselines and/or frequency channels. The entire macro is implicitly the body of the loop over time and is executed for every input data record. The available elements are listed in Table B.1.
For example, the xtract macro to produce a table with the first column containing the time stamp (Indian Standard Time (IST)), followed by two columns for the amplitude and phase of the visibility from each selected baseline, is ``ist;base{chan{a;p}};\n''. The list of baselines and channels for which data is to be extracted can be specified via the user interface keywords baselines and channels. Similarly, the $(u,v,w)$ co-ordinates of the antennas can be extracted in a table as a function of hour angle (HA) by the macro ``ha;ant{ua;va;wa};\n''. The list of antennas can be specified via the antennas keyword.
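To make the nested-loop semantics of these macros concrete, the following Python fragment sketches how the macro ``ist;base{chan{a;p}};\n'' would be executed for a single input data record. The Record class and its accessors are hypothetical stand-ins for the actual LTA record structure and the xtract internals.

\begin{verbatim}
import cmath

class Record:
    # One integration: a time stamp plus the complex visibilities,
    # indexed by (baseline, channel).
    def __init__(self, ist, vis):
        self.ist = ist
        self.vis = vis

# Emit one output row for the macro ist;base{chan{a;p}};\n
def execute_macro(record, baselines, channels):
    row = [record.ist]                  # element "ist"
    for b in baselines:                 # operator "base{...}"
        for c in channels:              # operator "chan{...}"
            v = record.vis[(b, c)]
            row.append(abs(v))          # element "a": amplitude
            row.append(cmath.phase(v))  # element "p": phase (radians)
    return row                          # "\n" terminates the row

rec = Record(ist=43200.0, vis={(0, 0): 1 + 1j, (1, 0): 0.5 - 0.2j})
print(execute_macro(rec, baselines=[0, 1], channels=[0]))
\end{verbatim}

For every input record, the operators thus translate directly into nested loops over the selected baselines and channels, with the listed elements emitted in order within the innermost loop.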
The xtract macro language compiler is also available as a stand-alone library. The API of this library can be used by other application programs to parse and execute the macros and retrieve their results. An attempt was also made to write one such application for the graphical display of the LTA database (line and gray scale plots). This application unfortunately did not stabilize and is not in use.
Antennas are identified by the correlator sampler to which they are connected. In the final double sideband GMRT correlator, there will be 4 samplers per antenna, i.e. 2 samplers for the two polarization channels per sideband. The four signals from each antenna (2 polarizations from each sideband) can therefore be treated as four logical antennas. All samplers in the correlator are also uniquely numbered. A logical antenna can therefore be specified by the number of the sampler to which it is connected, or by an antenna name consisting of three hyphen separated fields: the antenna, the sideband and the polarization names. A name for a logical antenna is said to be ``fully qualified'' when all three fields in the name are specified. All these fields can be regular expressions. Similarly, baselines can be specified by a baseline number or by a baseline name composed of colon (':') separated logical antenna names. Here again, each of the two antenna names (before and after the ':') can be a regular expression. The use of regular expressions for the components of logical antenna names, as well as for specifying the two antennas of a baseline, provides a very general, compact and powerful selection mechanism. The antenna and baseline naming conventions are described in detail in Appendix C. This convention is uniformly followed in all off-line programs where data selection based on antennas and/or baselines is required.
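As an illustration of this selection mechanism, the following Python sketch matches logical antenna and baseline names in which every field may be a regular expression. The field values used here are made-up examples and do not reproduce the exact convention of Appendix C.

\begin{verbatim}
import re

# Hypothetical fully qualified logical antenna names:
# <antenna>-<sideband>-<polarization>
ANTENNAS = ["C09-USB-130", "C09-LSB-130", "E02-USB-130", "W06-USB-175"]

def match_antennas(pattern, names):
    # Each hyphen separated field of the pattern is a regular expression.
    regexes = [re.compile("^%s$" % f) for f in pattern.split("-")]
    return [n for n in names
            if all(r.match(p) for r, p in zip(regexes, n.split("-")))]

def match_baselines(pattern, names):
    # A baseline name is two logical antenna names joined by ':'.
    pat1, pat2 = pattern.split(":")
    return [(a, b) for a in match_antennas(pat1, names)
                   for b in match_antennas(pat2, names) if a != b]

# All central square antennas, upper sideband, any polarization:
print(match_antennas("C.*-USB-.*", ANTENNAS))
# All baselines of C09-USB-130 with any other USB logical antenna:
print(match_baselines("C09-USB-130:.*-USB-.*", ANTENNAS))
\end{verbatim}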
xtract normally writes the output in ASCII format, with a header specifying the names of the different columns and other information about the extracted data. Formats required by a number of freely available line and gray scale plotting programs can be generated by writing the appropriate xtract macro, and the extracted data can then be directly read by these programs for display. The data can also be extracted with the header in ASCII followed by the data in binary format by specifying an output file name beginning with the character '*'. (It is conceivable to write an xtract macro to produce a FITS file to be displayed using any of the FITS image display programs, e.g. for the dynamic spectra from selected baselines.)
The user interface provides mechanisms to externally set keywords to some fixed default value(s) and to suppress these keywords from the user interface. This is used to effectively generate specialized variants of xtract. One such variant, named oddix, was extensively used for the on-line display of the amplitudes and phases from various baselines. The data was read from the shared memory using a modified version of the record program, and an xtract macro was defined to produce the output as a binary table with each row corresponding to a single integration time. The output was then piped to a program which supplied the data to a display program over the network via a UNIX socket. The display program, also written as part of the off-line package, displays an arbitrary number of stacked scrolling line plots. The display surface itself is scrollable, allowing a very large number of line plots to be viewed at a time.
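The transport between the extraction pipeline and the display can be sketched as follows; this is only a schematic Python illustration, assuming a simple row format (one packed record of 32-bit floats per integration) and a hypothetical socket path, since the actual record layout and protocol of the on-line display are not reproduced here.

\begin{verbatim}
import math, socket, struct, time

SOCK_PATH = "/tmp/oddix_display"   # hypothetical socket path

def stream_rows(n_baselines, n_rows):
    # One binary row per integration: time stamp followed by
    # (amplitude, phase) per baseline, as little-endian float32.
    fmt = "<" + "f" * (1 + 2 * n_baselines)
    with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as sock:
        sock.connect(SOCK_PATH)
        for _ in range(n_rows):
            t = time.time()
            row = [t % 86400.0]
            for b in range(n_baselines):
                row += [1.0, 0.1 * math.sin(t + b)]   # dummy amp, phase
            sock.sendall(struct.pack(fmt, *row))
            time.sleep(1.0)                           # one integration
\end{verbatim}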
The observed normalized visibility on the baseline between antennas $i$ and $j$ can be written as
\begin{equation}
\rho^{obs}_{ij} = g_i\, g^{\star}_j + \epsilon_{ij},
\end{equation}
where $g_i$ is the antenna based complex gain and $\epsilon_{ij}$ is the baseline based error.
The visibility data includes data from non-working or malfunctioning antennas as well as data affected by closure errors due to any malfunctioning of the correlator. Since the antenna based complex gains are obtained using a global fit involving all the data, the presence of bad data can result in problems ranging from noisy solutions even for the good antennas to no convergence at all. This can happen even in the presence of only a few bad antennas and/or a small percentage of bad baselines. It is therefore important to remove bad data before attempting to solve for the antenna gains.
This identification and flagging of bad data is done automatically in two passes for every solution interval. Antenna based complex gains are first solved for using all the data. The solutions from the first pass are then examined, and antennas with an amplitude gain less than a user defined threshold are flagged for the second pass; antennas found to be bad in this manner are assigned a complex gain of 1. The solution for an antenna can also be interpreted as a weighted average of the complex visibilities from all baselines with the given antenna (see Section D.1). This averaging in each successive iteration can be done robustly by on-the-fly flagging of data points which deviate by more than a threshold defined in units of the variance of the series of complex numbers being averaged. Data from a baseline with large closure errors will deviate strongly from the mean defined by the data from the ``good'' baselines; such data is identified and flagged by the robust averaging. Both these techniques are used in the algorithm implemented in the program rantsol to make it robust even in the presence of time variant closure errors (the latter have been noticed on several occasions).
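A minimal sketch of the robust averaging step described above, assuming iterative flagging of points which deviate from the current mean by more than a threshold times the r.m.s. scatter of the series; this is illustrative Python, not the actual rantsol implementation.

\begin{verbatim}
import numpy as np

def robust_mean(z, threshold, max_iter=5):
    # Iteratively average a series of complex numbers, flagging points
    # which deviate from the current mean by more than
    # threshold * (r.m.s. scatter of the unflagged points).
    z = np.asarray(z, dtype=complex)
    good = np.ones(len(z), dtype=bool)
    for _ in range(max_iter):
        m = z[good].mean()
        dev = np.abs(z - m)
        sigma = np.sqrt(np.mean(dev[good] ** 2))
        new_good = dev <= threshold * sigma
        if np.array_equal(new_good, good):
            break                       # converged: flags unchanged
        good = new_good
    return m, good

# Visibilities on the baselines to one antenna; the last point carries
# a large closure error and is flagged on the fly.
vis = np.array([1.0 + 0.10j, 0.9 + 0.05j, 1.1 - 0.05j, 1.0 - 0.10j,
                0.95 + 0.00j, 1.05 + 0.02j, 3.0 + 2.00j])
mean, good = robust_mean(vis, threshold=2.0)
print(mean, good)                       # mean ~ 1.0; outlier flagged
\end{verbatim}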
In practice, rantsol has been found to be very robust in the
presence of non-working or malfunctioning antennas and malfunctioning
MACs in the correlator (which produce large closure errors). It
can be used almost as a black box for most of the calibrator databases
without the need to identify and flag bad data. rantsol was regularly used to compute the antenna based complex gains, and another program, badbase, was used to identify bad data from calibrator scans. badbase examines the amplitude and phase of the calibrated visibilities (defined as $\rho^{obs}_{ij}/(g_i g^{\star}_j)$) and reports the fraction of time a given baseline was found to be bad (badness being defined as the deviation of the calibrated amplitude from unity, $\left| |\rho^{obs}_{ij}/(g_i g^{\star}_j)| - 1 \right|$, and/or of the calibrated phase from zero, $\left| \arg\left(\rho^{obs}_{ij}/(g_i g^{\star}_j)\right) \right|$, being greater than a user defined threshold). Antennas and baselines which are continuously bad for
large fractions of the total observing time were easily identified and
flagged before mapping. This was found to be extremely important
as the calibration tasks of the AIPS package (used for calibration
and mapping) are very sensitive to bad data and sometimes resulted
in no convergence at all due to the presence of about 10% bad
baselines!
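The badness criterion can be sketched as follows, using the definition given above (deviation of the calibrated amplitude from unity and of the calibrated phase from zero); this is an illustrative Python fragment with arbitrary tolerance values, not the badbase source.

\begin{verbatim}
import numpy as np

def badness_fraction(rho_obs, g_i, g_j, amp_tol=0.2, phase_tol=0.35):
    # Fraction of time stamps for which the calibrated visibility
    # rho_obs/(g_i * conj(g_j)) deviates from the ideal value 1+0j
    # by more than the amplitude and/or phase tolerance.
    rho_cal = np.asarray(rho_obs) / (np.asarray(g_i) * np.conj(g_j))
    bad = (np.abs(np.abs(rho_cal) - 1.0) > amp_tol) | \
          (np.abs(np.angle(rho_cal)) > phase_tol)
    return bad.mean()

# One baseline over 5 time stamps; the last two samples carry
# amplitude and phase closure errors respectively.
g1 = np.full(5, 2.0 * np.exp(1j * 0.3))
g2 = np.full(5, 1.5 * np.exp(-1j * 0.1))
rho = g1 * np.conj(g2) * np.array([1.0, 1.02, 0.98, 1.6,
                                   np.exp(1j * 1.0)])
print(badness_fraction(rho, g1, g2))    # -> 0.4
\end{verbatim}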
The GMRT typically produces a few hundred baselines per snapshot.
Monitoring the data quality for phase, amplitude and closure errors
corresponds to monitoring data streams from each of these baselines.
This is obviously not practical. However, there are only as many antenna based complex gains as there are antennas ($30$ for the GMRT, as against $30 \times 29/2 = 435$ baselines). Solving for the antenna based complex gains using rantsol effectively enforces the closure constraints. On-line monitoring of these antenna based complex gains therefore gives a good summary of the data quality, and the rantsol output was routinely used to inspect the quality of the data on-line. The gains were supplied
to the online display software mentioned above which displayed the
solutions as a set of scrolling line plots (the antenna based
amplitudes and phases for the calibrator scans). This was of immense
use in identifying time variable problems while observing and ensured
that the recorded data was of reasonably good quality.
rantsol is now regularly used as a black box for a variety of purposes, ranging from baseline and fixed delay calibration (Figs. 2.11 and 2.12 are examples of typical rantsol output), pointing offset, beam shape, sensitivity and system temperature measurements, to the GMRT phased array operations for pulsar observations (Sirothia2000).
The output of rantsol can be formatted to be directly read by the QDP plotting package using the awk scripts getamp, getphs and getres. These scripts extract, respectively, the amplitude, the phase and the residuals of the calibrated visibilities ($\rho^{obs}_{ij}/(g_i g^{\star}_j)$).
The antenna based complex gain can be written as
\begin{equation}
g_i = a_i e^{\iota\phi_i},
\end{equation}
where $a_i$ and $\phi_i$ are the antenna based amplitude and phase. For the ideal case with no baseline based errors ($\epsilon_{ij} = 0$ for all $i$ and $j$), the phase of the triple product $\rho^{obs}_{ij}\rho^{obs}_{jk}\rho^{obs}_{ki}$ for an unresolved source is zero, and the amplitude of the ratio $(\rho^{obs}_{ij}\,\rho^{obs}_{kl})/(\rho^{obs}_{ik}\,\rho^{obs}_{jl})$ is unity. The phase of the triple product is referred to as the closure phase and the amplitude of this ratio is referred to as the closure amplitude. These closure quantities are a good
measure of the baseline based errors in the system. Ideally, the
signals from various antennas, flowing through independent paths, are
mixed only in the MAC stage of the correlator. Although
there are several sources which can produce small closure errors,
catastrophic closure errors, which severely limit the final dynamic
range in the maps, can be traced to malfunctioning MACs in the
correlator. The closure phases are therefore very important quantities to monitor during the observations. It is also important to examine these closure quantities while processing, so as to identify bad data; this is done using the closure program.
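The closure phase computation itself is straightforward, as the following Python sketch (not the actual closure program) illustrates: the antenna based gains cancel in the triple product, so a non-zero closure phase on an unresolved calibrator directly indicates a baseline based error.

\begin{verbatim}
import numpy as np

def closure_phase(v_ij, v_jk, v_ki):
    # Phase of the triple product; antenna based gains cancel here.
    return np.angle(v_ij * v_jk * v_ki)

# Unresolved source (V = 1 on all baselines) seen through antenna gains:
g = np.exp(1j * np.array([0.4, -1.1, 2.0]))   # gains of antennas i, j, k
v_ij = g[0] * np.conj(g[1])
v_jk = g[1] * np.conj(g[2])
v_ki = g[2] * np.conj(g[0])
print(closure_phase(v_ij, v_jk, v_ki))        # ~ 0: no closure error

# A baseline based error shows up directly in the closure phase:
print(closure_phase(v_ij * np.exp(1j * 0.5), v_jk, v_ki))   # ~ 0.5 rad
\end{verbatim}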
The output of this program, which computes all the closure quantities from the data, was also used for the on-line display of the closure phases. This output was simultaneously supplied to another program which raised an alarm in case the closure phases for the calibrator scans deviated from the expected value by more than some threshold amount. This helped in identifying problems with the data before spending long hours observing and recording otherwise unusable data.
The final inversion and mapping of the visibility data was done using the AIPS package. The visibility data was imported into AIPS by first converting the data from the LTA format to FITS format using the off-line program gl2fit (however, see Section 4.4.1 for some details about the use of other data filters before gl2fit). The FITS file was then imported into AIPS using the task FITLD.
In addition to the programs mentioned above, various other programs, which were routinely used during the course of this work, were also developed. Although these programs are not used for numerical computation, they are nevertheless useful and often indispensable.
Quite often, due to various interruptions during long observations, the LTA database was split into several files. These individual files were concatenated into one using ltacat (this was necessary for data browsing and detection of bad data as discussed in Chapter 4).
Due to problems related to synchronization between the on-line array control and correlator control software, bad scans may be recorded in the LTA database (e.g. scans with zero data records, scans with wrong values for the frequency/bandwidth settings, scans with a wrong pointing centre, scans with unusable data for a variety of other reasons, etc.). Such scans need to be removed to produce a database with only usable scans. ltacleanup was used to remove such scans as well as to correct some inconsistencies with respect to the LTA format.