Use Case: ProcessScienceData
The Science Data Pipeline reduces data automatically, taking input
data from the Raw Data Archive and
putting results into the Science Archive. Science Data Pipeline reduction
should not be a bottleneck for array operation.
Science data processing will take place after the end of a Project
to perform 'final' imaging and
deconvolution. Image deconvolution will be performed only when a
Break Point occurs (as defined during
Proposal preparation in the Observing Tool), or when the program has reached
the point where no further visibility
data need to be obtained to complete the image.
The Science data processing will use the calibration data
after they are computed by the Calibration Pipeline.
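The deconvolution rule described above can be sketched as a simple predicate. This is only an illustration: the class ProjectState and its field names are hypothetical and do not come from any ALMA software.

```python
from dataclasses import dataclass


@dataclass
class ProjectState:
    """Hypothetical summary of a project's observing state."""
    break_point_reached: bool        # a Break Point from the Observing Tool occurred
    visibility_data_complete: bool   # no further visibility data are needed


def should_deconvolve(state: ProjectState) -> bool:
    """Deconvolve only at a Break Point or once the visibility data are complete."""
    return state.break_point_reached or state.visibility_data_complete
```

A project that has reached neither condition would be left without deconvolution until a later run.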
Goal: Provide final reduction of the data taken for
a project, including deconvolution of data cubes where appropriate.
Contact Authors: C. Wilson, L. Davis, R. Lucas
Role(s)/Actor(s):
Primary: Pipeline Subsystem, Science; Array
Observing System.
Secondaries:
- Scheduling Subsystem - Activates Science Pipeline
- Pipeline Subsystem - retrieves data from archive;
calibrates uv data; combines
with previous data where appropriate; performs imaging and deconvolution;
writes final results to Archive
- Operator, Staff Astronomer - may intervene to start,
stop, or suspend on-line pipeline operation.
- Operator, Staff Astronomer, Observer - may intervene to start,
stop, or suspend off-line pipeline processing.
- Archive - provides raw data; receives calibrated data, including
data cubes.
Priority:
Major
Performance:
Data processing shall be completed no later than 12 hours after the
end of data acquisition for a project.
Frequency of Use:
Minutes.
Preconditions:
- Data are available from the Correlator and Total Power detectors, in
the Archive for the on-line pipeline, or in temporary storage
for offline pipeline operations.
- Calibration Data are available from the Calibration Pipeline
via the Archive.
- Reduction scripts are available, as defined in the Observing Programme.
- The Scheduler informs the Observer (by email) that Science processing
has started.
Basic Course:
- The Scheduler tells the Science Pipeline (via Pipeline Executor) to begin
processing data for a new project
when the Project (or session) ends or a breakpoint
is reached.
- Alternate course: off-line execution.
- Pipeline Control provides facilities for the operator / ALMA staff
member to intervene in the pipeline processing queue, start and stop jobs,
etc. It also provides a GUI display of the current status of the
pipeline executor.
- Pipeline Executor retrieves raw, calibration, and metadata for
the current observing session from the archive for processing.
- Alternate course: Pipeline Executor retrieves
data between user-defined break points in the current session.
- Pipeline Executor checks for and, if found, retrieves calibrated
uv data from previous sessions from the archive.
- Pipeline Executor asks Pipeline Heuristics Lookup to select
an appropriate Pipeline Application.
- Pipeline Heuristics Lookup uses information in the raw, calibration,
and metadata to select the appropriate standard mode for data reduction
(the Pipeline Application)
and to set some parameters at decision points in the Pipeline Application.
- Pipeline Heuristics returns appropriate Pipeline Application to
Pipeline Executor.
- Pipeline Executor manages the processing of the selected Pipeline
Application to calibrate and image the data.
- Alternate course: a second data product is produced using options
specified by the Observer.
- Exception course: the Operator is notified if processing fails.
- Pipeline Executor stores calibrated data, data cubes, quality
indicators, reduction scripts and logs to the Archive.
- Pipeline Executor reports to Scheduler that science processing for
this project is complete.
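The basic course above can be sketched in code as follows. This is a minimal illustration under stated assumptions, not the actual pipeline design: the names SessionData, Archive, heuristics_lookup, standard_imaging and run_science_pipeline, as well as the "observing_mode" metadata key, are all hypothetical, and the Archive is replaced by an in-memory stand-in.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional


@dataclass
class SessionData:
    """Inputs the executor retrieves for one observing session."""
    raw: List[str]
    calibration: List[str]
    metadata: Dict[str, str]
    previous_uv: Optional[List[str]] = None  # calibrated uv data from earlier sessions


# A Pipeline Application is modelled as a callable that returns data products.
PipelineApplication = Callable[[SessionData], Dict[str, object]]


def standard_imaging(data: SessionData) -> Dict[str, object]:
    """Placeholder application: stands in for calibration and imaging."""
    n_inputs = len(data.raw) + len(data.previous_uv or [])
    return {"calibrated_uv": n_inputs, "data_cubes": 1, "quality": "ok"}


def heuristics_lookup(data: SessionData) -> PipelineApplication:
    """Select a standard reduction mode from the metadata (hypothetical key)."""
    applications = {"interferometry": standard_imaging}
    mode = data.metadata.get("observing_mode", "interferometry")
    return applications[mode]


class Archive:
    """In-memory stand-in for the Archive (assumption for this sketch)."""

    def __init__(self, sessions: Dict[str, SessionData]) -> None:
        self.sessions = sessions
        self.products: Dict[str, Dict[str, object]] = {}

    def retrieve(self, session_id: str) -> SessionData:
        return self.sessions[session_id]

    def store(self, session_id: str, products: Dict[str, object]) -> None:
        self.products[session_id] = products


def run_science_pipeline(archive: Archive, session_id: str,
                         notify_operator: Callable[[str], None]) -> bool:
    """Run the basic course once; return True when processing completes."""
    data = archive.retrieve(session_id)      # raw, calibration, and metadata
    application = heuristics_lookup(data)    # select the Pipeline Application
    try:
        products = application(data)         # calibrate and image the data
    except Exception as exc:                 # exception course
        notify_operator(f"Processing failed for {session_id}: {exc}")
        return False
    archive.store(session_id, products)      # write results to the Archive
    return True                              # caller reports completion to the Scheduler
```

In this sketch the Scheduler would call run_science_pipeline when a Project or session ends or a breakpoint is reached, and would treat a True return value as the completion report.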
Alternate Course:
- The Observer initiates the Pipeline offline.
- The Pipeline executes script commands sequentially,
using previously obtained calibration data.
Postconditions:
- Science data (images) are written into the Science Archive. These may
include an Observer-specified product as well as a standard product.
- Data reduction scripts and logs are written into the Science Archive.
- Optionally: the Pipeline makes quality check results available to the
Scheduler.
- The Scheduler informs the Observer by email when science data processing
is complete.
Issues to be Determined or Resolved:
- How often is this run per project? Just at the end of the project and
at breakpoints? Probably also at the end of each observing session in a
project. If we run it more than once per project, how long do we
store intermediate results?
If the intermediate results are truly redundant, and they can be reproduced
by the processing script (this can be tricky if all the software is not
archived), and it is possible to tell how the intermediate results helped
to produce the final results, they could be deleted. It might be simpler to
save everything produced by a valid pipeline processing request ...
- Does THIS Use Case need to be run at home institutes too, or just the
heuristics/application bit? If so, the off-line section needs more work
to make it clearer where the inputs come from.
LD. This is a bit of a TBD.
I would like the pipelining software to be exportable.
Although this is not strictly required by ALMA it may fall out of the
design and implementation.
- Does the Pipeline Heuristics Lookup need to use information in the
RAW data, or just the calibration
and metadata, to select the appropriate standard mode for data reduction?
This requirement was in Memo 11; we need to decide if it is really
necessary. Or can we determine the observing mode from
the high level metadata without extracting all the raw data?
Ditto for retrieving observing parameters and checking for validity.
- Should the Pipeline make quality check results available to the
Scheduler, and if so, how? Currently quality checks are sent to
the archive only; is feedback to the scheduler useful given the time lag?
This requirement was in Memo 11. An example of where quality
checks might need to be fed back would be if a whole session turned
out to be bad for some undiagnosed reason. Then the Scheduler would
need to know that the scheduling blocks need to be observed again.
Notes:
- This Use Case was modified by C. Wilson to help
define Science Pipeline requirements. Relevant
SSR Use Case from SSR Memo 11 is 4.5.3 (Process Science Data)
by R. Lucas.
- The Science Data Pipeline shall be run either at or near the telescope,
or at the places where the Archives are kept.
- If the Pipeline of a previous Session is still active, the Pipeline of
the current Observing Session has priority
for allocation of computing resources.
- An Observer must be able to look at the Pipeline results of recently
observed Programmes without downgrading the Pipeline performance on the
currently observed Programme.
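The note that the current Observing Session has priority over a still-active earlier one could be modelled, in the simplest case, as a priority queue of pipeline jobs. The sketch below is an assumption for illustration only: the class PipelineQueue and the use of a session number as the priority key are hypothetical, and real resource allocation would involve more than job ordering.

```python
import heapq
import itertools
from typing import List, Tuple


class PipelineQueue:
    """Queue in which jobs from more recent observing sessions are served first."""

    def __init__(self) -> None:
        self._heap: List[Tuple[int, int, str]] = []
        self._counter = itertools.count()  # tie-breaker keeps submission order

    def submit(self, job: str, session_number: int) -> None:
        # heapq is a min-heap, so the session number is negated: the latest
        # session gets the smallest key and is popped first.
        heapq.heappush(self._heap, (-session_number, next(self._counter), job))

    def next_job(self) -> str:
        """Return the job belonging to the most recent session."""
        return heapq.heappop(self._heap)[2]
```

With this ordering, a job submitted for session 2 is served before an outstanding job for session 1, even if the session 1 job was submitted earlier.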
Last modified: 12aug03