Use Case: ProcessScienceData

The Science Data Pipeline reduces data automatically, taking input data from the Raw Data Archive and putting results into the Science Archive. Pipeline reduction of the science data should not be a bottleneck for array operation.

Science data processing will take place after the end of a Project to perform 'final' imaging and deconvolution. Image deconvolution will be performed only when a Break Point occurs (as defined at Proposal preparation in the Observing Tool), or when the program has reached the point where no further visibility data need to be obtained to complete the image.

The Science data processing will use the calibration data after they are computed by the Calibration Pipeline.

Goal:   Provide final reduction of the data taken for a project, including deconvolution of data cubes where appropriate.

Contact Authors:   C. Wilson, L. Davis, R. Lucas

Role(s)/Actor(s):
Primary: Pipeline Subsystem, Science; Array Observing System.
Secondaries:

Priority:   Major

Performance:   Data processing shall be completed no later than 12 hours after the end of data acquisition for a project.
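
The 12-hour limit can be expressed as a simple timestamp comparison. The following is a minimal sketch only; the function name and the use of Python datetime values are assumptions made for illustration and are not part of the use case.

```python
from datetime import datetime, timedelta

# Limit taken from the Performance requirement above.
PROCESSING_DEADLINE = timedelta(hours=12)


def meets_performance_requirement(acquisition_end, processing_end):
    """Return True if processing finished within 12 hours of the end of data acquisition.

    Both arguments are datetime objects; the function name is hypothetical.
    """
    return processing_end - acquisition_end <= PROCESSING_DEADLINE
```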

Frequency of Use:   Minutes.

Preconditions:

  1. Data are available from the Correlator and Total Power detectors, in the Archive for the on-line pipeline, or in temporary storage for offline pipeline operations.
  2. Calibration Data are available from the Calibration Pipeline via the Archive.
  3. Reduction scripts are available, as defined in the Observing Programme.
  4. The Scheduler informs the Observer (by email) that Science processing has started.

Basic Course:

  1. The Scheduler tells the Science Pipeline (via Pipeline Executor) to begin processing data for a new project when the Project (or session) ends or a breakpoint is reached.
    Alternate course: off-line execution.
  2. Pipeline Control provides facilities for the operator / ALMA staff member to intervene in the pipeline processing queue, start and stop jobs, etc. It also provides a GUI display of the current status of the Pipeline Executor.
  3. Pipeline Executor retrieves raw data, calibration data, and metadata for the current observing session from the archive for processing.
    Alternate course: Pipeline Executor retrieves data between user-defined break points in the current session.
  4. Pipeline Executor checks for and, if found, retrieves calibrated uv data from previous sessions from the archive.
  5. Pipeline Executor asks Pipeline Heuristics Lookup to select an appropriate Pipeline Application.
  6. Pipeline Heuristics Lookup uses information in the raw data, calibration data, and metadata to select the appropriate standard mode for data reduction (the Pipeline Application) and to set some parameters at decision points in the Pipeline Application.
  7. Pipeline Heuristics Lookup returns the appropriate Pipeline Application to Pipeline Executor.
  8. Pipeline Executor manages the processing of the selected Pipeline Application to calibrate and image the data.
    Alternate course: a second data product is produced using options specified by the Observer.
    Exception course: the Operator is notified if processing fails.
  9. Pipeline Executor stores calibrated data, data cubes, quality indicators, reduction scripts and logs to the Archive.
  10. Pipeline Executor reports to Scheduler that science processing for this project is complete.
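
Steps 3 to 10 of the Basic Course can be sketched as follows. This is an illustration under stated assumptions, not a design: the Archive class is an in-memory stand-in, and every name in it (retrieve_session, retrieve_previous_uv, store, the "observing_mode" metadata key, and the two application names) is hypothetical. The calibration and imaging work of step 8 is passed in as a callable so that the exception course can be shown.

```python
from dataclasses import dataclass, field


@dataclass
class Archive:
    """In-memory stand-in for the Archive (hypothetical interface)."""
    sessions: dict = field(default_factory=dict)     # session id -> raw, calibration, metadata
    previous_uv: dict = field(default_factory=dict)  # project id -> calibrated uv data from earlier sessions
    products: list = field(default_factory=list)     # everything stored in step 9

    def retrieve_session(self, session_id):
        return self.sessions[session_id]

    def retrieve_previous_uv(self, project_id):
        # An empty list means no earlier sessions were found.
        return self.previous_uv.get(project_id, [])

    def store(self, product):
        self.products.append(product)


def heuristics_lookup(metadata):
    """Steps 5-7: select a Pipeline Application from the metadata.

    The mode and application names are illustrative only.
    """
    applications = {
        "single_field": "standard_single_field_imaging",
        "mosaic": "standard_mosaic_imaging",
    }
    return applications[metadata["observing_mode"]]


def process_science_data(archive, project_id, session_id, run_application,
                         notify_operator, report_to_scheduler):
    """Run steps 3-10 once; return True on success, False if processing failed."""
    session = archive.retrieve_session(session_id)          # step 3
    earlier_uv = archive.retrieve_previous_uv(project_id)   # step 4
    application = heuristics_lookup(session["metadata"])    # steps 5-7
    try:
        product = run_application(application, session, earlier_uv)  # step 8
    except Exception as exc:
        # Exception course: the Operator is notified if processing fails.
        notify_operator(f"{application} failed for {project_id}: {exc}")
        return False
    archive.store(product)                                   # step 9
    report_to_scheduler(project_id)                          # step 10
    return True
```

In this sketch nothing is stored and the Scheduler is not told the project is complete when step 8 fails; whether partial results should be archived in that case is not specified above.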

Alternate Course:

  1. The Observer initiates the Pipeline offline.
  2. The Pipeline executes script commands sequentially, using previously obtained calibration data.
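
A minimal sketch of the offline course, assuming each script command can be represented as a callable that takes the current data and the previously obtained calibration data and returns the updated data. The function name run_offline and this representation of a command are assumptions made for illustration.

```python
def run_offline(script_commands, calibration, data):
    """Apply each script command in order and return the result with a simple log.

    script_commands: list of callables, each (data, calibration) -> data.
    calibration: previously obtained calibration data, passed unchanged to every command.
    """
    log = []
    for command in script_commands:
        data = command(data, calibration)
        log.append(command.__name__)  # record which command ran, in execution order
    return data, log
```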

Postconditions:

  1. Science data (images) are written into the Science Archive. These may include an Observer-specified product as well as a standard product.
  2. Data reduction scripts and logs are written into the Science Archive.
  3. Optionally: the Pipeline makes quality check results available to the Scheduler.
  4. The Scheduler informs the Observer by email when science data processing is complete.

Issues to be Determined or Resolved:  

  1. How often is this run per project? Just at the end of the project and at breakpoints? Probably also at the end of each observing session in a project. If we run it more than once per project, how long do we store intermediate results? If the intermediate results are truly redundant, and they can be reproduced by the processing script (this can be tricky if all the software is not archived), and it is possible to tell how the intermediate results helped to produce the final results, they could be deleted. It might be simpler to save everything produced by a valid pipeline processing request ...
  2. Does THIS Use Case need to be run at home institutes too, or just the heuristics/application bit? If so, the off-line section needs more work to make it clearer where inputs are coming from.
    LD. This is a bit of a TBD. I would like the pipelining software to be exportable. Although this is not strictly required by ALMA, it may fall out of the design and implementation.
  3. Does the Pipeline Heuristics Lookup need to use information in the RAW data, or just the calibration and metadata, to select the appropriate standard mode for data reduction? This requirement was in Memo 11; we need to decide if it is really necessary. Or can we determine the observing mode from the high level metadata without extracting all the raw data? Ditto for retrieving observing parameters and checking for validity.
  4. Should the Pipeline make quality check results available to the Scheduler, and if so, how? Currently quality checks are sent to the archive only; is feedback to the Scheduler useful given the time lag? This requirement was in Memo 11. An example of where quality checks might need to be fed back would be if a whole session turned out to be bad for some undiagnosed reason. Then the Scheduler would need to know that the scheduling blocks need to be observed again.

Notes:  

  1. This Use Case was modified by C. Wilson to help define Science Pipeline requirements. The relevant SSR Use Case from SSR Memo 11 is 4.5.3 (Process Science Data) by R. Lucas.
  2. The Science Data Pipeline shall be run either at or near the telescope, or at the places where the Archives are kept.
  3. If the Pipeline of a previous Session is still active, the Pipeline of the current Observing Session has priority for allocation of computing resources.
  4. An Observer must be able to look at the Pipeline results of recently observed Programmes without downgrading the Pipeline performance on the currently observed Programme.
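
Note 3 can be illustrated with a small priority queue in which jobs from the most recent observing session are always served first. This is a sketch under assumptions: sessions are taken to carry an increasing integer number, and the class and method names are hypothetical. It does not address Note 4, which concerns read access by the Observer rather than job ordering.

```python
import heapq
import itertools


class PipelineQueue:
    """Orders pipeline jobs so that the most recent observing session runs first."""

    def __init__(self):
        self._heap = []
        self._counter = itertools.count()  # tie-breaker: keeps submission order within a session

    def submit(self, job_name, session_number):
        # heapq is a min-heap, so the session number is negated: later sessions pop first.
        heapq.heappush(self._heap, (-session_number, next(self._counter), job_name))

    def next_job(self):
        """Return the name of the highest-priority job, or None if the queue is empty."""
        if not self._heap:
            return None
        return heapq.heappop(self._heap)[2]
```

With this ordering, a job still queued from a previous Session waits until all jobs of the current Session have been taken.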

    Last modified: 12aug03