Status of Source Catalog Parser for GBT-to-SSS
2007-Jun-26
David M. Harland


Background

The Scientific Software Support (SSS) team in Socorro has created an object model for an astronomical source. (Model details may be found here.) One important aspect of this model is that it may be expressed as text in an XML format. XML is the preferred format for expressing the model as text and for creating the model from text. This model will be used in EVLA software and perhaps elsewhere. One of the objects in the model is the SourceCatalog. The Green Bank Telescope (GBT) team also has a source catalog construct and a text format that can be used to create the catalog. The remainder of this document pertains to progress being made by SSS staff in creating source catalogs by parsing text files in the GBT format.

It is evident from the GBT text format that there are some small model mismatches between GBT and EVLA. We believe we will be able to modify the EVLA Source Model to accommodate some of these differences.

Parser Philosophy

The GBT Parser is designed to survive as many parsing errors as possible. The goal is to report all the errors in a file in one pass. The parser will also populate a catalog to the best of its ability, no matter how many errors were found. There is one exception to this rule, and it pertains to the only new syntax we needed to introduce. We envision creating more parsers, so we need a signal at the beginning of a file that helps the software pick the correct parser for that file's text format. For GBT, this optional syntax is:
  catalogType=GBT
(spaces are permitted on either side of the "=" sign). If a "catalogType=xxx" line is present, it must be the first active line in the file (i.e., the first line that is neither a comment nor blank).
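The first-active-line rule for catalogType can be sketched in Python as follows. This is a minimal sketch, not the SSS implementation. It assumes that comment lines start with "#" (the text only says "commented"), and the names detect_catalog_type and CATALOG_TYPE are hypothetical. A misplaced catalogType line is reported and scanning continues, in keeping with the parser's error-survival philosophy.

```python
import re

# Assumption: '#' starts a comment; the text above only says "commented".
CATALOG_TYPE = re.compile(r"^catalogType\s*=\s*(\S+)$")

def detect_catalog_type(lines):
    """Return (catalog_type or None, errors) for an iterable of text lines."""
    errors = []
    catalog_type = None
    seen_active_line = False
    for number, raw in enumerate(lines, start=1):
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # blank and comment lines are not active lines
        match = CATALOG_TYPE.match(line)
        if match:
            if seen_active_line:
                # Misplaced line: report it and keep scanning (survive errors).
                errors.append(f"line {number}: catalogType must be the first active line")
            else:
                catalog_type = match.group(1)
        seen_active_line = True
    return catalog_type, errors
```

Because the line is optional, a file without it yields None, leaving the choice of a fallback parser to the caller.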

GBT Formats

GBT supports four formats: SPHERICAL, CONIC, EPHEMERIS, & NNTLE. The GBT Parser aims to support all of these formats. At this point all the formats are recognized and have stubbed parsers, but only the SPHERICAL format has a functional parser. The SSS Source Model supports the data brought by the SPHERICAL, CONIC, & EPHEMERIS formats; the NNTLE might be transformable into something the SSS Source Model recognizes, such as orbital elements. More study is needed here.

Keywords Common to All Formats

This is the status of the GBT Parser's support for those keywords that are of the form key=value and that are not specific to any particular format:
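As a rough illustration of the key=value form, a line can be split at its first "=" sign. This is a sketch under assumptions: the names split_keyword, is_common_keyword and COMMON_KEYWORDS are hypothetical, keys are assumed to be case-insensitive (so they are upper-cased), and a line with no "=" is assumed to be a data row.

```python
# Hypothetical names; the keyword set comes from the list in this section.
COMMON_KEYWORDS = {"FORMAT", "HEAD", "NAME", "COORDMODE", "EQUINOX", "VELDEF"}

def split_keyword(line):
    """Split an active 'key=value' line at the first '='.

    Returns (KEY, value) with surrounding spaces removed, or None when the
    line has no '=' and is therefore treated as a data row.
    """
    key, sep, value = line.partition("=")
    if not sep:
        return None
    return key.strip().upper(), value.strip()

def is_common_keyword(key):
    """True if key is one of the format-independent keywords."""
    return key in COMMON_KEYWORDS
```

Partitioning at the first "=" keeps any later "=" characters inside the value.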

FORMAT

Fully supported. All four of the valid formats are recognized by the catalog parser and cause the correct source parser to be invoked. It looks like the GBT requirements restrict the format line to the top of the data file. The SSS's GBT Parser will allow this line to appear anywhere in the file and will cause the catalog parser to change its source parser. If an invalid format value is found, the parser will report the error and will use the most recently used format. If no format was specified anywhere in the file, the SPHERICAL format is used as a default.
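The FORMAT rules above (switch on any valid value, report an invalid one and keep the most recently used format, default to SPHERICAL) might be tracked like this. FormatTracker is a hypothetical name, and values are compared exactly as written, which is an assumption about case sensitivity.

```python
VALID_FORMATS = {"SPHERICAL", "CONIC", "EPHEMERIS", "NNTLE"}

class FormatTracker:
    """Hypothetical tracker for the catalog parser's current source format."""

    def __init__(self):
        self.current = "SPHERICAL"  # default when no FORMAT line appears
        self.errors = []

    def set_format(self, value, line_number):
        value = value.strip()
        if value in VALID_FORMATS:
            # FORMAT may appear anywhere; each valid value switches parsers.
            self.current = value
        else:
            # Report the bad value and keep the most recently used format.
            self.errors.append(
                f"line {line_number}: invalid FORMAT '{value}'; keeping {self.current}"
            )
```

The catalog parser would then look up its source parser from `current` before handling each data row.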

HEAD

Fully supported. The initial stage of parsing merely holds the text after the equals sign in the HEAD = ... line. The SPHERICAL source parser will interpret the headings (see the Spherical Format section, below).
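The two-stage handling of HEAD (hold the raw text, let the SPHERICAL parser interpret it) could look roughly like this. The sketch assumes headings and data values are whitespace-separated, and heading_columns and parse_row are hypothetical names. A row with the wrong number of values produces an error message instead of an exception, so parsing can continue.

```python
def heading_columns(head_text):
    """Split the raw text held from 'HEAD = ...' into column names.

    Assumes whitespace-separated headings.
    """
    return head_text.split()

def parse_row(columns, row_text):
    """Pair one whitespace-separated data row with the heading columns.

    Returns (values_by_column, error); exactly one of the two is None so the
    caller can record the error and keep going.
    """
    values = row_text.split()
    if len(values) != len(columns):
        return None, f"expected {len(columns)} values, found {len(values)}"
    return dict(zip(columns, values)), None
```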

NAME

Fully supported. The value is used as the source's name.

COORDMODE

Partially supported. This keyword has a set of valid values. Here are the valid values and the effect they have on the building of an SSS Source Catalog by the GBT Parser:

  • J2000
    Fully supported.

  • B1950
    Fully supported.

  • JMEAN
    Not supported (yet?).

  • GAPPT
    Not supported (yet?).

  • GALACTIC
    Fully supported.

  • HADEC
    Not supported (yet?).

  • AZEL
    Fully supported.

  • ENCODER
    Not supported (yet?).

Any values other than those above are reported and ignored. This means that the parser's coordinate mode is left in its current state. If no COORDMODE was ever specified, J2000 is used as the default.
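One way to express the COORDMODE table is as a map from the supported values to a coordinate system. In this sketch the Enum stands in for the SSS CelestialCoordinateSystem (its member names come from the validation rules later in this document), CoordModeState is a hypothetical name, and treating a recognized-but-unsupported value (JMEAN, GAPPT, HADEC, ENCODER) like an invalid one (reported, current mode kept) is an assumption.

```python
from enum import Enum

class CelestialCoordinateSystem(Enum):
    """Stand-in for the SSS model class of the same name."""
    EQUATORIAL = "EQUATORIAL"
    HORIZONTAL = "HORIZONTAL"
    GALACTIC = "GALACTIC"

SUPPORTED = {
    "J2000": CelestialCoordinateSystem.EQUATORIAL,
    "B1950": CelestialCoordinateSystem.EQUATORIAL,
    "GALACTIC": CelestialCoordinateSystem.GALACTIC,
    "AZEL": CelestialCoordinateSystem.HORIZONTAL,
}
UNSUPPORTED = {"JMEAN", "GAPPT", "HADEC", "ENCODER"}

class CoordModeState:
    """Hypothetical holder for the parser's current coordinate mode."""

    def __init__(self):
        self.mode = "J2000"  # default when no COORDMODE is ever specified
        self.errors = []

    def set_mode(self, value, line_number):
        if value in SUPPORTED:
            self.mode = value
        elif value in UNSUPPORTED:
            # Assumption: report it and leave the current mode unchanged.
            self.errors.append(f"line {line_number}: COORDMODE {value} not supported yet")
        else:
            self.errors.append(f"line {line_number}: invalid COORDMODE '{value}'")

    @property
    def system(self):
        return SUPPORTED[self.mode]
```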

EQUINOX

Not supported (yet?).

VELDEF

Fully supported. The value of VELDEF comes into play only if velocity values are later specified in the source data. The value found here sets both the velocityFrame and velocityConvention properties of SourceVelocity. Any values other than the legal values are reported and ignored. This means that the parser's velocity frame and convention properties are left in their current states. If no VELDEF was ever specified, OPTICAL is used as the default convention and BARYCENTRIC is used as the default frame. (To Do: ask GBT for appropriate defaults.)
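The sketch below shows how a single VELDEF value could set both properties at once, or neither of them. It assumes a CONVENTION-FRAME value such as VRAD-LSR. The code tables, the target names (RADIO, LSRK, and so on) and the class name VelocityDefinition are illustrative assumptions that should be checked against the GBT catalog specification and the SSS SourceVelocity class.

```python
# Illustrative code tables (assumptions, not taken from the GBT specification).
CONVENTIONS = {"VOPT": "OPTICAL", "VRAD": "RADIO", "VREL": "RELATIVISTIC"}
FRAMES = {"BAR": "BARYCENTRIC", "HEL": "HELIOCENTRIC", "LSR": "LSRK", "TOP": "TOPOCENTRIC"}

class VelocityDefinition:
    """Hypothetical holder for the convention and frame set by VELDEF."""

    def __init__(self):
        # Defaults named in this document (pending GBT confirmation).
        self.convention = "OPTICAL"
        self.frame = "BARYCENTRIC"
        self.errors = []

    def set_veldef(self, value, line_number):
        code_convention, sep, code_frame = value.strip().partition("-")
        if sep and code_convention in CONVENTIONS and code_frame in FRAMES:
            # Update both properties together so a bad value changes neither.
            self.convention = CONVENTIONS[code_convention]
            self.frame = FRAMES[code_frame]
        else:
            self.errors.append(f"line {line_number}: invalid VELDEF '{value}'")
```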

SPHERICAL Format

When this format is detected, the Source uses a PolynomialPosition object to hold its position information.

Column Keyword Support

NAME

Fully supported.

VEL & VELOCITY

Fully supported. The units are always taken to be km/s.

RESTFREQ

Not supported. Our SourceVelocity object does have a frequency range over which it is valid, but the setting of frequencies will be part of the SSS Resource Model.

RA & DEC

Fully supported.

AZ & EL

Partially supported. As mentioned above, SSS may alter its Source Model to accommodate AZ/EL specification. For now, these values are parsed as if they were RA & DEC.

GLON & GLAT

Partially supported. As mentioned above, SSS may alter its Source Model to accommodate GLON/GLAT specification. For now, these values are parsed as if they were RA & DEC.
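"Parsed as if they were RA & DEC" can be pictured as routing every coordinate column pair into the same longitude/latitude slots. This is a sketch under assumptions: COORDINATE_PAIRS and position_slots are hypothetical names, the row is a dict of column name to raw text, and a missing partner column defaults to "0", following the zero-value notes in the validation rules later in this document.

```python
# Hypothetical: each coordinate column pair feeds the same (longitude, latitude) slots.
COORDINATE_PAIRS = {
    ("RA", "DEC"): "EQUATORIAL",
    ("AZ", "EL"): "HORIZONTAL",
    ("GLON", "GLAT"): "GALACTIC",
}

def position_slots(row):
    """Return (longitude_text, latitude_text, system) for the first pair found.

    A missing partner column is filled with "0"; a row with no coordinate
    columns at all yields (None, None, None).
    """
    for (lon_key, lat_key), system in COORDINATE_PAIRS.items():
        if lon_key in row or lat_key in row:
            return row.get(lon_key, "0"), row.get(lat_key, "0"), system
    return None, None, None
```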

User-Defined Keywords

Any column not recognized as one defined by GBT's specifications will be treated as a user-defined column. The parser will make note of these columns so that clients may inspect them if they wish. The user-defined values will be saved for each source.
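A minimal sketch of the user-defined column handling follows. The known-column set is taken from the SPHERICAL column keywords listed above, while the function names and the dict-per-source representation are hypothetical.

```python
# Column keywords recognized by the SPHERICAL parser, per the list above.
KNOWN_SPHERICAL_COLUMNS = {
    "NAME", "VEL", "VELOCITY", "RESTFREQ",
    "RA", "DEC", "AZ", "EL", "GLON", "GLAT",
}

def user_defined_columns(columns):
    """Return the heading columns not defined by GBT, in heading order."""
    return [c for c in columns if c not in KNOWN_SPHERICAL_COLUMNS]

def user_values(row, user_columns):
    """Extract one source's user-defined values from a parsed row."""
    return {c: row[c] for c in user_columns if c in row}
```

Keeping the list of user-defined columns separate from each source's values lets clients inspect which extra columns exist without scanning every source.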

Column Keyword Validation

Some of the column headings imply information about the values of other column headings and/or about the values of other keywords. This is a summary of the cross-validation done for some of the keywords. (Any keyword not listed below has no cross-validation with other keywords.)

RA
  • The CelestialCoordinateSystem must be EQUATORIAL. (This means the parser detected a COORDMODE of either J2000 or B1950.)
  • A DEC column must be present. (Note that if it is not, all declinations will be assumed to be zero degrees.)
  • The following columns must not be present: AZ, EL, GLON, GLAT.

DEC
  • The CelestialCoordinateSystem must be EQUATORIAL. (This means the parser detected a COORDMODE of either J2000 or B1950.)
  • An RA column must be present. (Note that if it is not, all right ascensions will be assumed to be zero hours.)
  • The following columns must not be present: AZ, EL, GLON, GLAT.

AZ
  • The CelestialCoordinateSystem must be HORIZONTAL. (This means the parser detected a COORDMODE of AZEL.)
  • An EL column must be present. (Note that if it is not, all elevations will be assumed to be zero degrees.)
  • The following columns must not be present: RA, DEC, GLON, GLAT.

EL
  • The CelestialCoordinateSystem must be HORIZONTAL. (This means the parser detected a COORDMODE of AZEL.)
  • An AZ column must be present. (Note that if it is not, all azimuths will be assumed to be zero degrees.)
  • The following columns must not be present: RA, DEC, GLON, GLAT.

GLON
  • The CelestialCoordinateSystem must be GALACTIC. (This means the parser detected a COORDMODE of GALACTIC.)
  • A GLAT column must be present. (Note that if it is not, all latitudes will be assumed to be zero degrees.)
  • The following columns must not be present: RA, DEC, AZ, EL.

GLAT
  • The CelestialCoordinateSystem must be GALACTIC. (This means the parser detected a COORDMODE of GALACTIC.)
  • A GLON column must be present. (Note that if it is not, all longitudes will be assumed to be zero degrees.)
  • The following columns must not be present: RA, DEC, AZ, EL.

The parser will report violations of any of the above rules but will do its best to continue to read the data and create a catalog.
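The cross-validation rules above can be encoded as a small table of partner columns and required coordinate systems. In this sketch the coordinate system is passed as a plain string, and PAIRS and validate_columns are hypothetical names. The function only collects violation messages and never raises, matching the report-and-continue behavior described above.

```python
# column -> (required partner column, required CelestialCoordinateSystem)
PAIRS = {
    "RA": ("DEC", "EQUATORIAL"), "DEC": ("RA", "EQUATORIAL"),
    "AZ": ("EL", "HORIZONTAL"), "EL": ("AZ", "HORIZONTAL"),
    "GLON": ("GLAT", "GALACTIC"), "GLAT": ("GLON", "GALACTIC"),
}

def validate_columns(columns, system):
    """Return a list of violations; never raises, so parsing can continue."""
    present = sorted(set(columns))  # sorted for a deterministic message order
    errors = []
    for column in columns:
        if column not in PAIRS:
            continue  # no cross-validation for other keywords
        partner, required_system = PAIRS[column]
        if system != required_system:
            errors.append(f"{column} requires {required_system}, found {system}")
        if partner not in present:
            errors.append(f"{column} present without {partner}; {partner} values assumed zero")
        for other in present:
            if other in PAIRS and other not in (column, partner):
                errors.append(f"{column} must not appear with {other}")
    return errors
```

Each rule is checked per column, so a forbidden combination such as RA with AZ is reported from both sides.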