Collected Comments on Pipeline/Offline Requirements Document V2.0 (and some before 21Jun01 on v1.x). This document is online at http://www.aoc.nrao.edu/~smyers/alma/offline-req/olr-v2.0-comments.txt ------------------------------------------------------------------------------- 1-May-2001: V1.2 ------------------------------------------------------------------------------- From bclark@aoc.nrao.edu Fri Jul 13 10:26:20 2001 Date: Thu, 14 Jun 2001 12:48:49 -0600 (MDT) From: Barry Clark To: bclark@cv3.cv.nrao.edu, bglenden@cv3.cv.nrao.edu, gueth@iram.fr, morita@nro.nao.ac.jp, momose@mito.ipc.ibaraki.ac.jp, lucas@iram.fr, schilke@mpifr-bonn.mpg.de, tatematsu@nro.nao.ac.jp, smyers@cv3.cv.nrao.edu Cc: bclark@zia.aoc.NRAO.EDU Subject: Re: f.y.i. - an integrated version of the Pipeline/Offline doc The Schilke & Guth document seems to have a very different mental image about how things work than I have. My mental image is that on the completion of a scan, something examines whether it was a calibration scan, and if so, invokes one or more of a number of scripts (aka pipelines) which reduce the observation, insert the results into a calibration archive, and optionally alert the sequencer of their existence. The imaging or quick-look pipelines are invoked at more stately intervals, and both go through a (probably identical) stage of extracting data from the calibration archive and constructing the detailed gain tables to use to make a first image. Schilke & Guth seem to want to do the gain table construction on a scan-by-scan basis. This is not an unreasonable way of doing things (though one may want to go back and remake the whole lot after the last polarization and flux calibration observations), but I do think that making the gain tables should be clearly separated from the reduction of the calibrator observations, to make management of the priorities reasonably clean - calibrator observations *must* be reduced immediately, whereas if gain tables don't get made for a while, it's no big deal. If we go this route, we need yet another name for a pipeline, to separate them. The calibration scripts (aka pipelines) I listed in my E-Mail some months ago were: calibrateTsys (loadswitched data) calibrateSidebandRatio calibrateFlux calibrateBandpass calibratePhase calibratePointing calibrateFocus To which I would add something to accumulate polarization calibration information; the complete reduction is not possible until all observations are in. ------------------------------------------------------------------------------- From gueth@iram.fr Fri Jul 13 10:23:05 2001 Date: Sun, 17 Jun 2001 15:10:50 +0200 From: Frederic Gueth To: K. Tatematsu Cc: Steven T. Myers , Barry Clark , Brian Glendenning , Koh-Ichiro MORITA , momose@mito.ipc.ibaraki.ac.jp, Robert Lucas , schilke@mpifr-bonn.mpg.de, tatematsu@nro.nao.ac.jp Subject: Re: another draft v1.4 "K. Tatematsu" wrote: > > Dear all, > > Thanks for your temporary summary work, Steve. > Some more input... > > Cheers, > Ken > > At 21:54 01/06/14 -0600, Steven T. Myers wrote: > >Section 2: Pipeline Data Processing Requirements > > > > 2.2 Single-Dish data > > -------------------- > > > > 2.2-R1 The Calibration Pipeline shall reduce the atmospheric > > calibration, and pass the results to the dynamic Scheduler. > > > > 2.2-R2 For all observations of an astronomical source, the Calibration > > Pipeline shall apply the atmospheric calibration to the data. 
> > > > 2.2-R3 The Calibration Pipeline shall reduce and pass the results to > > the Sequencer: > > 2.2-R4 For the pointing and focus measuremets, the fitting results > should be automatically stored in the telescope > parameter file if the fitting error is less than > the user/observatory specified value. If the error > is not less than the specified value, > the pipline will send a message to the alarm system. This is typically what we meant by writting that the quick-look pipeline shall be able to detect any bad data and give an alarm if necessary. Whether it's a job for the calibration or the quick-look pipeline is another question. In the current document, the idea is that the calibration pipeline is reducing the data, and the quick-look pipeline is taking the results to do plots, images, and alarms. But the alarms shall be detected and signaled as fast as possible. Frederic. ------------------------------------------------------------------------------- From gueth@iram.fr Fri Jul 13 10:24:29 2001 Date: Sun, 17 Jun 2001 15:09:16 +0200 From: Frederic Gueth To: Barry Clark Cc: bclark@cv3.cv.nrao.edu, bglenden@cv3.cv.nrao.edu, morita@nro.nao.ac.jp, momose@mito.ipc.ibaraki.ac.jp, lucas@iram.fr, schilke@mpifr-bonn.mpg.de, tatematsu@nro.nao.ac.jp, smyers@cv3.cv.nrao.edu, bclark@zia.aoc.NRAO.EDU Subject: Re: f.y.i. - an integrated version of the Pipeline/Offline doc Barry Clark wrote: > > The Schilke & Guth document seems to have a very different mental image > about how things work than I have. > > My mental image is that on the completion of a scan, something examines > whether it was a calibration scan, and if so, invokes one or more of a > number of scripts (aka pipelines) which reduce the observation, insert > the results into a calibration archive, and optionally alert the sequencer > of their existence. The imaging or quick-look pipelines are invoked at > more stately intervals, and both go through a (probably identical) stage > of extracting data from the calibration archive and constructing the > detailed gain tables to use to make a first image. Schilke & Guth > seem to want to do the gain table construction on a scan-by-scan basis. > This is not an unreasonable way of doing things (though one may want to > go back and remake the whole lot after the last polarization and flux > calibration observations), but I do think that making the gain tables > should be clearly separated from the reduction of the calibrator observations, > to make management of the priorities reasonably clean - calibrator observations > *must* be reduced immediately, whereas if gain tables don't get made for > a while, it's no big deal. If we go this route, we need yet another name > for a pipeline, to separate them. > > The calibration scripts (aka pipelines) I listed in my E-Mail some months > ago were: > calibrateTsys (loadswitched data) > calibrateSidebandRatio > calibrateFlux > calibrateBandpass > calibratePhase > calibratePointing > calibrateFocus > To which I would add something to accumulate polarization calibration > information; the complete reduction is not possible until all observations > are in. I think that there are three kinds of calibrations that could be handled by a "calibration pipeline": - The instrumental calibration: pointing, focus, delay, baseline, etc. What is required here is a fast feedback to the control software. 
- The calibrations that do not require a time interpolation, as the atmospheric or bandpass calibration: each time such a scan is observed, something has to be derived and then stored, to be applied to all the following observations, until a new calibration of that kind is observed. - The calibrations that require a time interpolation, ie the phase and amplitude calibration: a calibration curve has to be fitted using all available calibrations and then applied to all the source observations that were observed in between. The two first categories can easily be handled by a calibration pipeline. As for the third category, it is not yet clear to me which pipeline should do the job. In the document I sent a few days ago, all three pipelines are doing something in this area, and I agree it is not clear enough. I think that the science pipeline should do a clean job and derive the calibration curves using all data. But the calibration and quick-look pipelines should also do a similar calibration, to get an estimate of the phase rms and to produce quick images. Frederic. >>>SMyers: I will add the 3 categories to the beginning of the pipeline section.<<< ------------------------------------------------------------------------------- 21-June-2001: V2.0 ------------------------------------------------------------------------------- From bclark@aoc.nrao.edu Thu Jun 28 14:21:59 2001 Date: Thu, 21 Jun 2001 21:33:55 -0600 (MDT) From: Barry Clark Reply-To: alma-sw-ssr@cv3.cv.nrao.edu To: alma-sw-ssr@cv3.cv.nrao.edu Subject: Re: [alma-sw-ssr] Draft v2.0 of Pipeline and Offline Requirements One case where AIPS and AIPS++ are sadly deficient is in polarization calibration. We need support for all the possibilities Steve brings up in his use cases, and, as far as I know, they aren't there. We need to specifically call them out to make sure they get there. ------------------------------------------------------------------------------- From tcornwel@cv3.cv.nrao.edu Thu Jun 28 14:22:17 2001 Date: Fri, 22 Jun 2001 08:23:09 -0600 From: Tim Cornwell Reply-To: alma-sw-ssr@cv3.cv.nrao.edu To: alma-sw-ssr@cv3.cv.nrao.edu Cc: Tim Cornwell , Athol Kemball Subject: RE: [alma-sw-ssr] Draft v2.0 of Pipeline and Offline Requirements Barry Clark wrote: > > One case where AIPS and AIPS++ are sadly deficient is in polarization > calibration. We need support for all the possibilities Steve brings up > in his use cases, and, as far as I know, they aren't there. We need to > specifically call them out to make sure they get there. To put it mildly, I'm surprised at this statement. The polarization capabilities in AIPS++ are quite different from those in AIPS, and as far as I can see, support the possibilities outlined in Steve's uses cases. - The formulation used is that of Hamaker-Bregman-Sault. The development of this formalism by AIPS++ is outlined in a series of AIPS++ notes, particularly 182 onwards: http://aips2.nrao.edu/daily/docs/notes/notes/notes.html - A basic description of the overall calibration system is available at the ADASS VI proceedings. http://www.cv.nrao.edu/adass/adassVI/cornwellt.html Another more recent reference is the chapter on synthesis calibration in the AIPS++ document "Getting Results in AIPS+" http://aips2.nrao.edu/daily/docs/gettingresults/gettingresults/gettingresults.html - Polarization processing is not a special case but is designed in from the beginning in data structures and algorithms. 
The data format is defined in: http://aips2.nrao.edu/daily/docs/notes/229/229.html

- The Jones matrix is the fundamental calibration term. The AIPS++ calibrater tool solves for and applies Jones matrices. All appropriate calibration terms are stored in Jones or Mueller matrix form. Interpolations are of these forms. The Jones matrices may be parametrized and solutions derived with respect to these parameters. For the format of calibration tables see: http://aips2.nrao.edu/daily/docs/notes/240/240.html

- The formalism is independent of polarization type (R,L,X,Y,elliptical) and can work with circular, linear, or any mix (though we have not tried this latter possibility).

- Fully correct (non-linear) D-term solutions (time-variable or fixed) are of course available.

- Models of the sky can be either via polarized components or via polarized images. (Parenthetically, the image plane polarization analysis procedures in AIPS++ are excellent: see the Image Analysis chapter in Getting Results.)

- The formalism and implementation explicitly allow for correction of polarized primary beams: for example, the mosaicing software, as implemented now, can correct for the R-L beam squint of the VLA antennas.

- Solution for (polarized) primary beams is accommodated in the formalism and implementation, but we have not yet pursued the difficult algorithmic problem of solving for a parametrized primary beam.

- Complex polarization dirty images (i.e. the XX, XY, YX, YY images at WSRT) can be made.

If members of this committee wish to know more about the capabilities of AIPS++, I'd recommend browsing Getting Results. I'm also willing to answer questions, of course.

Tim Cornwell
AIPS++ Project Manager

-------------------------------------------------------------------------------

From Wim.Brouw@atnf.csiro.au Thu Jun 28 14:23:48 2001
Date: Sun, 24 Jun 2001 12:37:55 +1000
From: Wim Brouw
Reply-To: alma-sw-ssr@cv3.cv.nrao.edu
To: alma-sw-ssr@cv3.cv.nrao.edu
Subject: Re: [alma-sw-ssr] Draft v2.0 of Pipeline and Offline Requirements

I had a quick look at the Pipeline document. I hope that comments from outside the SSR are allowed. The comments are not exhaustive, but just on some points that caught my eye while reading.

At Thu, 21 Jun 2001 17:09:33 -0600 (MDT) "Steven T. Myers" wrote:
...
> We distinguish three different pipelines, the Calibration, the Quick-Look,
> and the Science Pipeline. The Calibration pipeline is intended for
> processing of array calibration data, usually on short turnaround
> time-scales, with feedback to the online system and into the archive.
> The Quick-Look pipeline has the job of providing quasi-realtime
> (~minutes) or short turnaround-time data-quality assessment for
> feedback to the online system and to the observers, and possibly
> output to the archive. The Science pipeline is the primary data path
> from the array to the archive and to the observer, usually operating
> on longer timescales to produce results after breakpoints and after
> completion of projects. [We should put a reference here to other
> documents describing this]

Essential, especially since here the pipelines are described as more or less independent entities; but later on e.g. the Science pipeline uses the calibration pipeline to derive, among others, bandpass calibrations (should they not be part of the ALMA derived calibration data?).

...
> 1.0-R2 All corrections applied shall be recorded so that any step can be
>        reversed and redone if needed.

Recording of corrections is not sufficient: not all corrections can e.g. be applied commutatively. Hence in addition to the corrections, the model used (and its history) should also be recorded. Would it not be much better to use the normally used scheme of never changing the input data, but apply corrections 'on-the-fly' in one form or another (with maybe some intermediate dataset if and when necessary)? This would also solve the big problem of how to cater for the interrelation between, say, data flags and corrections (not) applied.

>>>SMyers: This is an implementation issue. The requirement should be the general availability of "undo" or "redo" without reloading of data or undue loss of previously done steps. This req should be rephrased to reflect this. <<<

...
> 2.0-R1 The Calibration Pipeline shall be activated after each scan has
>        been observed.
>
> 2.0-R2 The Calibration Pipeline may also be re-invoked at any time with
>        updated parameters or improved data. The results should not
>        immediately overwrite old results so comparison is possible
>        before adopting the new calibration. There will need to
>        be a method for validation and acceptance of calibration
>        updates.

Why is there the logical difference between R1 and R2? Or should "The results..." be a separate R3? Also, why are parameters necessary for R2, but not for R1?

>>>SMyers: R2 was my addition. The Calibration pipeline will necessarily be activated when a calibration scan is observed. It may also be activated by staff later on (perhaps with an updated calibration script or tool) to improve the results from a previous run. Some mechanism must be there to have some validation and acceptance of the new results. R2 was perhaps too detailed. <<<

> 2.1 Interferometric data
> ------------------------
>
> 2.1-R1 The Calibration Pipeline shall reduce, and store the following
>        results for further use:
>
>        R1.1 the receiver sideband ratio calibration
>        R1.2 the atmospheric calibration
>
>        The results of the atmospheric calibration shall be passed
>        to or made available for access by the Dynamic Scheduler
>        (in real-time mode).

Is atmospheric calibration WVR? If so, is deriving the atmospheric corrections from WVR not something much more tightly coupled to the WVR and related hardware? I could easily imagine also that the timescale for deriving this data is different from that of the actual output data time scale. That means that the correction cannot be undone, and the only way is to record both corrected and uncorrected data (not a choice, since later in the science pipeline you say: "use corrected data...").

> 2.1-R2 For all observations of an astronomical source, the Calibration
>        Pipeline shall:
>
>        R2.1 apply the atmospheric calibration to the data
>        R2.2 store the phase corrected from the atmospheric effect, if
>             required

Why is only phase mentioned? Is it better to talk about data (meaning complex data) throughout when appropriate?

...
> 2.1-R3 For all observations of a calibrator source, the
>        Calibration Pipeline shall:
>
>        R3.1 compute the phase rms on the scan timescale
>        R3.2 compute the antenna efficiencies, using the averaged
>             amplitudes
>        R3.3 do the previous operations both with and without the
>             atmospheric phase correction, and deduce from the
>             comparison whether the atmospheric phase correction
>             improves the results or not

I read this as being on a per-baseline basis (since 'observations' are outputs of calibrators).
Would it not be much better to do this on a per-telescope basis: can find any wrong atmospheric correction; can try closure phases (and get a really good idea about errors); can use more elaborate models for the field under scrutiny.

>        R3.4 derive amplitude and phase time-dependent variations by
>             fitting smoothed curves (e.g. polynomials, splines)
>             using all observations of calibrators since the beginning
>             of the session

Are you averaging over data or over amplitude/phase separately; and why?

...
> 2.1-R4 The Calibration Pipeline shall reduce the following
>        observations:
>
>        R4.1 pointing scans (results to be passed to the Sequencer)
>        R4.2 focus measurements (results to be passed to the Sequencer)
>        R4.3 delay calibration (results to be passed to the Sequencer)
>        R4.4 bandpass calibration
>        R4.5 baseline calibration
>        R4.6 holography measurement

Very dangerous to leave this not open-ended (there is e.g. not a single receiver calibration in the list, which could easily have to be done, especially at higher frequencies; or polarization -- which always has to be done). Deciding now already on which goes to the sequencer and which not is a dangerous business for a project of >10 years. Would it not be much better to design a general calibration data interface that can be used for now and later? (More than one of course, depending on the type of correction, but fewer than the total number of corrections.) Also, no mention at all is made of any re-use of data for various calibration schemes (expensive photons), or, even more important in my opinion, of coupled (or iterative) solutions (how to get polarization leakage without gain calibration simultaneously; maybe focus and pointing are correlated; can bandpass be looked at without looking at gain/phase and delay errors?). Look at packages like Newstar, Miriad and the resulting AIPS++ one, which can cater for all this (basically by starting from an appropriate telescope/platform model -- the measurement equation).

...
> 2.2-R5 The calibration pipeline shall derive the
>        half-power beam size, the main-beam
>        efficiency, and the Moon (fss) efficiency from the calibration
>        scans towards planets and the Moon, and store
>        the successful results in the telescope parameter file.
>
>        Another derived parameter is the total forward efficiency
>        obtained from skydip measurements.

and another... and another...: too detailed (cannot be complete at this stage)

> 3.0 Quick Look pipeline
> -----------------------
>
> 3.0-R1 The Quick Look pipeline shall be activated after the Calibration
>        Pipeline has been completed.
>
> 3.0-R2 A Monitoring Tool shall be available, plotting and archiving in a
>        log file various results of the Calibration Pipeline:
>
>        R2.1 the results of the last pointing or focus scan
>        R2.2 the phase rms computed over the last scan and computed over the
>             current session
>        R2.3 the corresponding seeing
>        R2.4 the atmospheric opacity
>        ...
>
>        This tool shall include a variety of options, to control the plot
>        parameters, to plot the variation of these results with time, to
>        allow the operator to monitor one antenna or baseline in
>        particular, etc.

It seems a bad idea to build your own tool. There are many commercial packages available (to mind comes the HP industrial monitoring package) that can do all you want here. Other telescopes must also have packages that are diverse enough to enable this.

>>>SMyers: That is not TBD here. All that matters is that such a tool is available. <<<

...
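As an illustration of the smoothed-curve fit in 2.1-R3.4 (and of the averaging question above), here is a minimal Python sketch of one possible choice: fit amplitude and unwrapped phase separately, since a spline through wrapped phases would chase the 2*pi jumps, while vector-averaging the complex gains would bias the amplitudes low whenever the phase scatter is large. The function and variable names are illustrative only and are not part of the draft.

    import numpy as np
    from scipy.interpolate import UnivariateSpline

    def fit_gain_curve(t_cal, gain_cal, t_src, smooth=None):
        """Fit smooth curves to the calibrator gains (one antenna or baseline,
        one band) and evaluate them at the source integration times.
        t_cal must be increasing; amplitude and phase are fitted separately,
        with the phase unwrapped first."""
        amp = np.abs(gain_cal)
        phase = np.unwrap(np.angle(gain_cal))
        amp_curve = UnivariateSpline(t_cal, amp, s=smooth)
        phase_curve = UnivariateSpline(t_cal, phase, s=smooth)
        # interpolated complex gain to apply at each source integration time
        return amp_curve(t_src) * np.exp(1j * phase_curve(t_src))

Whether the fit is done per baseline or per antenna (as suggested above) only changes where gain_cal comes from; the interpolation step itself is the same.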
> 3.0-R3 A Monitoring Tool shall be available to plot the current properties > of the array, such as: > > R3.1 the current instantaneous uv coverage This ms? s? scan? observation? > R3.2 the corresponding weight distribution ? per UV cell? how define UV cell? > R3.3 the corresponding dirty beam > R3.4 the previous quantities, integrated since the beginning of the > session > R3.5 the thermal noise rms reached since the beginning of the session For what? Baseline? antenna? set of baselines? > ... > > 3.0-R4 Single-Dish data: the current spectra observed on the astronomical > target shall be corrected from the emission at a reference position > or frequency (depending on the observing mode), and displayed with > various options: > > R4.1 time integration > R4.2 antenna summation > R4.3 baseline fit, excluding a pre-defined window, or a window > defined by the Operator or AoD Why not auto windowing? especially with robust fitting techniques this should be possible in 99% of cases > R4.4 spectra on a pseudo-grid corresponding to position on a raster > (a "stamp" or "profile" plot) > > 3.0-R5 Interferometric data: the visibilities observed on a target source > shall be calibrated, using the results of the Calibration Pipeline: > > R5.1 apply the current bandpass calibration How, if that is calculated in science pipeine later? > R5.2 apply the current amplitude and phase correction I would make this 'corrections' (and 'complex gain corrections') > R5.3 apply the flux conversion factor based on standard antenna > efficiencies > > 3.0-R6 Interferometric data: the current spectra observed on the > astronomical target shall be displayed (amplitude and phase) with > various options: > > R6.1 time integration Over complex data or ampl/phase separately (and why) > R6.2 choice of the baseline(s) with 2000 baselines need them probably ordered per baseline length and binned > to be able to say something sensible; probably even a kind of percentile > coloring or so to highlight problem points > R6.3 baselines summation over baseline or time (how averaged) > R6.4 intensity (amp or phase) as function of baseline and time > (for a frequency), or time and frequency ( for a baseline ) > > 3.0-R7 Interferometric data: the Quick Look Pipeline shall compute the > Fourier Transform of the visibilities, using the fastest algorithm, > and display the resulting image. Alternatively, the actual Fourier > Transform of each new visibility point can be computed and added to > the current image. This shall be done for: > The last is better option in general on-line. > R7.1 the continuum data > R7.2 the line-averaged spectra, over a pre-defined velocity range, > or possibly a velocity range defined by the Operator/AoD Again, I would use auto line detection windows to average (even if this means > say a delay of half a scan or so) > > 4.0 Science Pipeline > -------------------- > > 4.0-R1 The Science Pipeline shall be activated after completion of a > session. > > 4.0-R2 The Science Pipeline shall find in the Archive all data observed I though scoence pipeline produced archive? >>>SMyers: ALMA archives raw (and/or online-corrected) data to the pipeline. The Science pipeline both inputs from and outputs to the archive. <<< > during the session. It shall use the atmospheric-calibrated data > (amplitude and phase). 
What if the observer selected raw data? (This option is mentioned in other documents; I do not agree with it, but it is stated.)

> 4.1 Interferometric data
> ------------------------
>
> 4.1-R1 The Science Pipeline shall use the calibrator to derive:

This only looks at a single type of observing mode by mentioning 'the calibrator shall ...'. There are other observing modes that could be used (a simple one is some monitoring observation that uses the ALMA-determined calibration parameters straight off; also, somewhere else the calibration object is described as being able to determine if and when a calibration shall be done). I think R1 is not correct here.

>        R1.1 the bandpass calibration
>        R1.2 the best phase and amplitude solution
>
> 4.1-R2 The Science Pipeline shall calibrate the source observations by
>        applying:

... by applying either the best set of corrections available, or a user-selectable set (and drop the next 3)

>        R2.1 the bandpass calibration
>        R2.2 the phase calibration
>        R2.3 the amplitude calibration

...
> 4.1-R6 Special cases shall be supported, including:
>
>        R6.1 mosaic observations
>        R6.2 on-the-fly mosaics
>        R6.3 self calibration projects
>        R6.4 combination of single-dish + ALMA data (+ACA)
>
>        Comment: Careful cross calibration of the flux scales between
>        ALMA interferometric data and single dish data ( and ACA )
>        is required for high fidelity imaging. This will require
>        careful coordination with the calibration pipeline, especially
>        as ACA observations may be taken at very different times than
>        the main array data.

It is more than just cross-calibration to get to high dynamic range. I would think that coupled self-calibration could be required (or ...).

> 4.1-R7 Subtraction of continuum level from spectral data is
>        required. This can be done in both Fourier and image
>        domain. In the case of uv-plane subtraction, flexible
>        setting of the frequency channel ranges for the calculation
>        of the continuum level should be available.

Why flexible in the UV domain (or why only in the UV domain)?

...
> 5.0 Interface with the Archive --- TO BE DETAILED
> ------------------------------
>
> 5.0-R1 The images produced by the Science Pipeline shall be archived,
>        together with the
>
>        R1.1 the script that was used to produce the image

A script is not sufficient; you have to know the version of the components used in the script (a script is only the 'glue' between 'objects' (or so)).

>        R1.2 the log file of the software

I suppose this is the output log of this run, not the revision log?

>>>SMyers: Yes, that was my intention here. Should rephrase.<<<

>
> 5.0-R2 cf 7.0-R3 general SSR document
>
> 5.0-R3 Also to be archived:
>
>        R3.1 data quality control:
>
>             R3.1.1 estimate of the noise
>             R3.1.2 seeing
>             R3.1.3 image fidelity based on model?
>
>        R3.2 observation quality control:
>             R3.2.1 baseline quality
>             R3.2.2 calibration quality
>
>        R3.3 telescope state: (possibly in monitor file, but accessible)
>             R3.3.1 telescope pointing
>             R3.3.2 subreflector focus
>             R3.3.3 monitor point (e.g. temperatures) data
>
> Appendix: Barry Clark's list of input parameters needed for each procedure
> ---------------------------------------------------------------------------
>
> Where should we really put these? I guess an appendix to this section is
> fine.

I do not know the purpose of this list. If it is an indication of what kind of parameter data is at least needed for certain operations, I can understand that.
I would in general prefer to see a more encapsulated description, with a number of parameter objects (e.g. a DeconvolutionParameterObject, a CalibrationParameterObject, ...). In that way:
- easily re-used in various 'scripts'
- easily moved as messages between methods (and updated)
- easy to cater for coupling between parameters, and calculation of many based on one input
- non-varying interface possible

>>>SMyers: Question still stands - should we keep this appendix here? My inclination is to leave it out or move to an appendix at the end of the entire document.<<<

...
> 1.0 General Requirements and Interaction with other ALMA elements
> 1.1 Goals of the Offline Package
> 1.1-R1 An ALMA Offline Data Reduction Package (or "the package")
>        is primarily intended to enable end-users of ALMA (e.g.
>        observers or archive users) to produce scientifically
>        viable results that involve ALMA data products. The secondary
>        use is to enable ALMA staff to assess the state of the
>        array and derive calibration parameters for the system.
> 1.1-R2 The package should be able to function (be installed) at
>        the user's home institution, in addition to operating at

The 'in addition ...' seems superfluous (it also gives the wrong impression of priority to end users).

>>>SMyers: 'and' is better<<<

>        ALMA regional centers (both locally and remotely). It should
>        be portable to a reasonable number of supported platforms,
>        including laptops without network connections.
> 1.1-R3 The performance of the package should be quantifiable and
>        commensurate with the data processing requirements of
>        ALMA output at a given time. This should be benchmarked
>        (e.g. "AIPSmarks") and reproduce accurately results for
>        a fiducial set of reduction tasks.
> 1.1-R4 The offline data reduction package should not suck.

suck???

> 1.2 Relation to the Pipeline
> 1.2-R1 All modules available in the pipeline must be available also
>        as an offline analysis option. Note that not all offline
>        analysis tools will be in the pipeline package.

Should be stronger. As time progresses, most of the offline stuff has to be done in the pipeline as well (at least if some degree of on-line reduction and assessment is to be done): the requirements for front-line science, after creaming off the first results, will be high sensitivity and dynamic range.

>>>SMyers: Without suggested text I have no idea of what Wim is looking for.<<<

>>>SMyers: Upon further reflection, I don't think it will be fruitful for us to over-specify at this time what will go into the archive via the Science Pipeline. It is likely that the be-all and end-all of images won't be within our capability, and it may be better to allow "Archival" programs by users (like HST) to do real science on the archive and not do that ourselves. I am inclined to leave this as it stands.<<<

> 1.2-R2 One of the important differences between pipeline and
>        offline reduction path is that offline one should have
>        extensive capabilities to merge and compare data with different
>        resolution, coordinate system, data grid, and so on.

It seems to me wrong to exclude that from the outset for the science pipeline ("which has available all archived data ..."), e.g. single dish data.

>>>SMyers: It is not the intention here to restrict the pipeline. Rephrase as 'may include extensive' etc., with R2 starting with the "Note that..." from R1.<<<

...
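To make the parameter-object suggestion above concrete, here is a minimal sketch (hypothetical names and formulas, not part of the draft) of a parameter bundle whose coupled members are derived from a single set of inputs and which can be passed between scripts unchanged:

    from dataclasses import dataclass
    from typing import Optional

    @dataclass
    class DeconvolutionParameters:
        """One bundle of deconvolution inputs that scripts can pass around,
        log, or archive alongside the resulting image."""
        niter: int = 1000
        gain: float = 0.1
        threshold_mjy: float = 0.5
        cellsize_arcsec: Optional[float] = None   # coupled: derived below
        imsize: Optional[int] = None              # unless set explicitly

        def fill_defaults(self, max_baseline_m, freq_ghz, primary_beam_arcsec):
            """Derive the unset, coupled parameters from a few observation values."""
            if self.cellsize_arcsec is None:
                lam_m = 0.2998 / freq_ghz                     # wavelength in metres
                beam_arcsec = lam_m / max_baseline_m * 206265.0
                self.cellsize_arcsec = beam_arcsec / 3.0      # ~3 pixels per beam
            if self.imsize is None:
                self.imsize = int(2.0 * primary_beam_arcsec / self.cellsize_arcsec)
            return self

    # e.g. pars = DeconvolutionParameters(niter=5000).fill_defaults(3000.0, 230.0, 27.0)

Such an object gives the non-varying interface asked for above: new parameters can be added with defaults without breaking existing scripts.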
> 2.4 Interface programming, parameter passing and feedback > 2.4-R1 Must have basic programming facilities such as: > > R1.1 variable assigment and evaluation > R1.2 conditional statements > R1.3 control loops > R1.4 string manipulation > R1.5 user-defined functions and procedures > R1.6 standard mathematical operations > Important aspects missed I think (why go in so much detail): vector handling; complex numbers; DO standard calls, persistent objects, ... >>>SMyers: Im not sure what the latter two are and I doubt users care. I think we need examples here to show the intent. However, if we cannot define something like a minimal set, only a representative list, is this still useful?<<< > 2.4-R2 Commands executed should be logged, with provision to > re-execute the session. > 2.4-R3 Input parameter checking upon parsing with reporting of > incorrect, suspicious or dangerous choices should be > done before execution where possible. > 2.4-R4 Parameters should be passable between applications in as > transparent a manner as possible. However, global parameters > should not be the default, unless chosen specifically by the > user-programmer. A long-term system will have a large multitude of parameters (including things like the distance to the SUN etc). If these parameters are what is called 'global' here, I disagree. Many parameters will not change for years on end, and parameters will be added constantly. For the average user these parameters are of no interest (if he/she would know what they are anyway). They are basically 'hidden' parameters in the sense of the next paragraph. These parameters should have a global value (default) value if you do not want to drive any observer crazy. >>>SMyers: No, thats not what I meant. I meant that if you define some variable such as, heaven forbid, APARM(1), it wont persist across functions by default! Specific "parameters" like your system-wide ones would be specifically designated as global (with some protection?). I guess I mean "variables". <<< ... > 3.2 Data import and export > 3.2-R1 The FITS/UVFITS data format and/or other commonly supported > standards must be supported for both input and output > without loss of functionality or information, though > need not be the native format for both the package and archive. By the time ALMA is operational, XDF, FITSML, etc will, I think, be the standard data exchange types, and preferable to non-described formats like UVFITS. You should be able to read 'old' formats (why not mention HDL etc as well? , but preferably produce new formats. >>>SMyers: I have no idea what these are. I guess I had no real idea that UVFITS was anything other than a flavor of FITS. I will mention only FITS. Someone else who knows more about this should craft the requirement. What flavor is IDI for example? I will add something about the project specifying which formats will be supported.<<< > 3.2-R2 Access to the archive must be supported, including for data > from the currently active observing session. Security and > integrity of the archive must be ensured during these > operations. > 3.2-R3 Disk and offline data storage (eg. DAT, DDS, DLT) must be > supported. Again dangerous to mention types (no CDrom and DVDs are mebntioned!). Why not drop this one R3, and make it: - Internet and offline import/export media for the major systems used in the partner countries m ust be supported. >>>SMyers: Again examples are useful, and I doubt that CDrom or DVD will be of sufficient storage volume to be useful. 
The project must decide on supported media.<<<

> 3.2-R4 The ability to ignore flagged data on export should be
>        included.

??? Ignore the flags, or skip the data (a bad idea I think)?

>>>SMyers: I mean "drop flagged data" instead of propagating flags. In general not a good idea but often useful when careful.<<<

> 3.3 I/O speed and efficiency
> 3.3-R1 I/O of data must not be a bottleneck for processing, especially
>        for pipeline use. This is especially true if the native format
>        of the package is not used and filling/conversion is necessary.
>        The definition of what constitutes a "bottleneck" and what
>        I/O throughput rate is acceptable must be defined at each stage
>        of ALMA operations (eg. interim science, full stand-alone ALMA,
>        ALMA + ACA) and in each mode (eg. quick-look pipeline, offline
>        use). For offline use, the intention is that users not be
>        faced with I/O operations that are way out of line with the
>        fastest equivalent times that could reasonably be achieved
>        with software development.

What is meant here?

>>>SMyers: We need to ensure that the speed of the package is up to some standard. I don't know any other way to word it. Maybe we should establish benchmarks for the package...<<<

> 3.4 ALMA interferometer data
> 3.4-R1 Correlation products accumulated at multiple bit depths
>        (16-bit, 32-bit) must be supported transparently
> 3.4-R2 On-line gain correction data must be carried along with
>        data
> 3.4-R3 Calibration tables and editing information must be associated
>        with the data and preserved on output
> 3.5 ALMA single dish and phased-array data

+ unphased array(?) - how are you going to phase?

>>>SMyers: this is done online (eg. for VLBI)<<<

Should it not be better to state:
- Data taken in any of the available ALMA hardware modes should be supported in the most appropriate manner (e.g., and give one example).
- Data from non-ALMA telescopes (single dish or array) should be usable if provided in a standard data exchange format.

>>>SMyers: I think it is better to delineate the modes we know about explicitly where possible. However a blanket requirement should be added as 3.1-R1. The second regarding foreign data is dealt with as 3.7-R1.<<<

And, in my view even more important, since we want to make sure that ALMA will be used to advance astronomy in general, and not as a tool for some mm-wave astronomy specialists:
- ALMA shall produce its observed data and processed data (images, spectra) in a world-wide accepted standard data exchange format, which can be accessed by general display and viewing packages (e.g. IRAF, AIPS++, ...).

>>>SMyers: this sounds more like a requirement on ALMA (the pipeline?) than on the offline package. I had thought about adding 3.1-R2 The package shall produce its observed data and processed data (images, spectra) in a world-wide accepted standard data exchange format, which can be accessed by general display and viewing packages (e.g. IRAF, AIPS++, ...). but have decided 3.2-R1 is sufficient for the offline package.<<<

...
> 4.0 Calibration and Editing
...
> 4.1-R3 Data display and editing should be effected through generic
>        tools applicable to both single-dish and interferometer modes.
>        These should, as far as possible, present similar interfaces
>        to the user and have the same look-and-feel.

There are generic tools available in packages and tools, which could easily be re-used.
In my view this re-use aspect (with similar look-and-feel across the whole astronomical observational spectrum) is in the longer term much more important than a similar look and feel within a very small user community. If you want Difmap style, use Difmap; etc.

>>>SMyers: That sounds nice, but how does that apply to the offline package? And is this really important? Note that the only sure way to ensure uniformity across the spectrum is to have a single supplier or consortium, and I don't think the users would benefit from Microsoft-ish dominance either. <<<

...
> 4.2-R6 Determination of, correction for, and examination of closure
>        errors should be straightforward to carry out.

This belongs in the calibration pipeline.

...
> 5.1-R4 Astrometric accuracy must be preserved over phase-calibration
>        distances of a few degrees.

Statements like these should be quantified exactly or not have any numbers.

>>>SMyers: Nonsense. The intention is clear (or at least can be made clear) and would not benefit from saying "10 degrees". Perhaps "at least 5 degrees"? though that doesn't do it right either. The need is to have it deal with all reasonable switching distances.<<<

> 5.1-R5 Images made on different equinoxes (e.g. B1950 and J2000)
>        or different coordinate (RA,DEC and l,b) systems or
>        different projections (tangent, sinusoidal, ...) can be merged
>        and compared appropriately.
> 5.1-R6 Data cubes using different velocity definitions (optical or radio
>        definition for Doppler velocity) must be merged appropriately.

I think there is a mix-up between the labelling and the contents of axes. Correlators will produce data as a function of frequency only; why change that? The way data is labelled does not change the data. Why would you (for a single map) make a new datacube with equal spacing in some other coordinate?

>>>SMyers: actually the correlator will do so as a function of lag (at least for the baseline correlator). I'm not sure what is meant here (need a line pundit) but I think it has to do with translation of (possibly foreign) data.<<<

> 5.2 Interferometer imaging
> 5.2-R1 High-fidelity imaging of the entire primary beam in all
>        Stokes parameters is the primary goal - therefore,
>        incorporation of the polarized primary beam response of the
>        array is required.
> 5.2-R2 Imaging must deal seamlessly with mosaiced data, with proper
>        gridding in the uv-plane and compensation for primary beam
>        effects and pointing in such a manner as to mitigate the
>        effects of non-coplanar baselines and sky curvature. A
>        variety of options for gridding and beam correction should
>        be available at user request.
> 5.2-R3 There must be seamless integration of data from multiple
>        epochs and configurations
> 5.2-R4 There must be the ability to include short-spacing data
>        taken in single-dish mode (both ALMA and non-ALMA data)
> 5.2-R5 Subtraction of continuum level from spectral data is required.
>        This can be done in both the Fourier and image domain.
>        In the case of uv-plane subtraction, flexible setting of the
>        frequency channel ranges for the calculation of the continuum
>        level (graphically as well as CLI) should be available.

There must be the possibility to create 3D images for rotating objects (see e.g. Miriad).

>>>SMyers: I assume you mean like a planet? Add as 5.2-R6.<<<

...
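Regarding 5.1-R6 above and the question of labelling versus contents: the radio and optical Doppler conventions give different velocity spacings from the same frequency grid. A minimal illustration (the rest frequency is that of CO(1-0); the function names are illustrative only):

    C_KMS = 299792.458   # speed of light, km/s

    def v_radio(freq, rest_freq):
        """Radio convention: v = c * (f0 - f) / f0  (uniform steps in frequency)."""
        return C_KMS * (rest_freq - freq) / rest_freq

    def v_optical(freq, rest_freq):
        """Optical convention: v = c * (f0 - f) / f  (uniform steps in wavelength)."""
        return C_KMS * (rest_freq - freq) / freq

    # a channel 100 MHz below the CO(1-0) rest frequency of 115.2712 GHz
    f, f0 = 115.1712, 115.2712
    print(v_radio(f, f0), v_optical(f, f0))   # ~260.1 vs ~260.3 km/s

A regularly gridded frequency axis is regularly spaced in radio velocity but not in optical velocity, so merging cubes made with different definitions (presumably what 5.1-R6 is after) involves regridding the data, not just relabelling the axis.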
------------------------------------------------------------------------------- From twillis@drao.nrc.ca Thu Jun 28 14:36:05 2001 Date: Sat, 23 Jun 2001 21:32:28 -0700 (PDT) From: Tony Willis Reply-To: alma-sw-ssr@cv3.cv.nrao.edu To: alma-sw-ssr@cv3.cv.nrao.edu Subject: Re: [alma-sw-ssr] Draft v2.0 of Pipeline and Offline Requirements (fwd) > > 1.0-R2 All corrections applied shall be recorded so that any step can be > > reversed and redone if needed. > > > Recording of correctioposn not sufficient: not all corrections can e.g. be > applied commutative. Hence in addition to corrections also model used (and > history thereoff) should be recorded. Would it not be much better to use the > normally used scheme of never changing the input data, but apply corrections > 'on-the-fly' in one form or another (with maybe some intermediate dataset if > and when necessary). This would also solve the big problem on how to cater > for the interrelation between say data flags and corrections (not) applied. > I agree completely with Wim's comments here. Radio astronomy data reduction is not the same thing as word processing. Having to handle reversals and undo's etc could easily double the size of your system. If you don't like your result, just rerun the job with modifications to those input parameters that you think will lead to improvement. >>>SMyers: This is an implementation issue. It will probably be done (at least if aips++ or simiar is used) by successive application of tables. However, it would be bad if you had to save a copy of the entire dataset at a given state to get back to that state, that is if the tables arent sufficient. It this we need to require... Try to reword req 1.0-R2 to that effect.<<< > > 2.1-R5 User-understandable and non-destructive error handling at > > all levels is highly desirable. > > 2.1-R6 Multiple levels of "undo" should be supported for all tasks. Ditto here. >>>SMyers: it is the interpretation of the implementation of "undo" that is the problem. Perhaps "undo" conjures too specific a picture, reword...<<< Tony -- Tony Willis Internet : Tony.Willis@hia.nrc.ca Snailnet : Dominion Radio Astrophysical Observatory P.O. Box 248, Penticton, BC, Canada V2A 6K3 BC Tel net: (250) 493-2277 Faxnet : (250) 493-7767 voicemailnet: (250) 490-4343 Localnet : ext 343 ------------------------------------------------------------------------------- From tcornwel@cv3.cv.nrao.edu Thu Jun 28 14:36:53 2001 Date: Tue, 26 Jun 2001 11:09:33 -0600 From: Tim Cornwell Reply-To: alma-sw-ssr@cv3.cv.nrao.edu To: alma-sw-ssr@cv3.cv.nrao.edu Subject: RE: [alma-sw-ssr] Draft v2.0 of Pipeline and Offline Requirements I have some comments on this draft based on my relevant experience in a number of roles: - Someone involved in research on various types of calibration and imaging including mosaicing - As AIPS++ Project Manager since 1995 - As someone responsible for NRAO's end-to-end processing needs. 0. A point concerning scope. AIPS++ is about 150 FTE-years. AIPS is probably about the same. The ESO Data flow system is about 300 FTE-years (I believe). I would guess from some communications that for the items described in this requirements document, the ALMA computing division has between 40 and 60 FTE-years (depending on how one counts various things). I would counsel that you spend that effort wisely. I think the current draft overspends by a large factor. >>>SMyers: Not our problem, except in assigning priorities (many of the over-specified areas should be better in the next draft). 
If the package designers think that specific parts designated high-priority will be too expensive, then they should back that up with numbers and propose the relaxation of specific requirements to the project.<<< 1. A general comment is that data reduction splits into strategy and tactics. The tactics come from the basic physics but the strategy comes from experience. I think the document is mostly fine on tactics but is a little too specific about some strategies. The items on the calibration pipeline seem to me to fit in this category. For example, 2.1-R3 is a strategy that may or may not work in all situations. >>>SMyers: True, but its not clear we can write a pipeline document without giving at least some specific (example) operations. To be discussed in Berkeley. By the way, it think this general issue of specific vs. general is important to get sorted out early on (I would have preferred to do this before writing even this much).<<< 2. It's hard to know how to process data for a ground-breaking telescope like ALMA. I think one should be modest in setting forth too-definitive statements of how the processing should proceed. In this context, I think the tool-based approach using in AIPS++ is vital, and I would advocate including a statement aimed at this point. >>>SMyers: "Tool-based approach" is meaningless (except maybe to everyone but me). I think the requirements cover the building blocks of the aips++ approach, but if there is a specific statement you advocate inclusion of, we would consider it.<<< 3. I haven't followed your discussions in detail so I'm not at all sure what General Consideration B means. In what way is there a fundamental distinction? I could not see how this consideration affected the rest of the document. It's also a very dangerous point since in many operations, one obviously wants no distinction. >>>SMyers: Thats the point of B. There was a proposal to break the requirements down more by single-dish vs. interferometer than it is. This is an explanation of why. Note that most of these "general considerations" will be removed in the final doc. I just wanted somewhere to put some discussion of why the doc looks the way it does.<<< 4. There are some prescriptive implementation details that should be removed (e.g. 3.0-R7 "using the fastest algorithm", also the Appendix of Barry Clark's input parameters). >>>SMyers: At least in this stage of the document it is useful to have examples of implementations. We will have to delineate prescriptive and descriptive items if we wish to keep them in. As for 3.0-R7, Im not sure what the authors wanted here.<<< 5. I am surprised that the document has relatively few requirements that are operational in nature. For example: >>>SMyers: good idea. Some operational issues are located elsewhere, make a new OL-1.3 for this.<<< - Be installation-flexible: can be installed on non-specialized hardware by end user >>>SMyers: OL-1.3-R1 <<< - Processing script must be re-executable with only a small number of changes >>>SMyers: I dont know what this means. A script, once executable, should always be executable. Unless you mean under later versions of the software. I'll pretend this is what you mean as OL-1.3-R2 <<< - Process standard recurring observations and analyze according to standard recipes >>>SMyers: Do we really need to say that? 
At some level those are things we are specifying as "examples" of the operations in this document and for which we get criticized as being too naive!<<< - Provide real-time feedback via standard compact displays and plots >>>SMyers: add to GUI as OL-2.2-R1<<< - Be operable automatically or manually >>>SMyers: already under interface as OL-2.1-R1<<< - Allow preemption, termination, resubmission, etc. >>>SMyers: with proper error handling & recovery, OL-1.3-R3<<< 6. I found some of the discussion hard to understand. An example is 3.3-R1: Everything but the first sentence is unnecessary and detracts from the simplicity of the requirement. >>>SMyers: 3.3-R1 contains too much discussion, granted. We need to settle on a simple requirement text. Throw me a bone here...<<< 7. A major point that applies to all my remaining comments is that it's easy to write simple sounding requirements that either double, triple, etc the software costs or prevent any estimation at all. Wim and Tony pointed out that adding undo is one example. I'd also add a substantial number of others: "1-R1: The pipelines shall be able to process all data coming from the array." For all arrays that I know of, one can think of observations that "break the bank" of available computing. This must be true of ALMA as well. Do you really want to limit the array in this way or specify the pipeline so aggressively? "4.2-R1: The data taken on the astronomical source shall be reduced, depending on the observing mode. All possible modes shall be supported: R1.1 etc" I think only the enumerated modes should be supported. Only known things can be guaranteed to be supported. >>>SMyers: True. Reword. But we cannot enumerate all modes in this document - the project will need to maintain a list and part of the negotiation for the package will be to agree to a mechanism to deal with this evolving list.<<< "1.1-R4 The offline data reduction package should not suck" Harder than you would think. I think the software costs for this are unknown. >>>SMyers: The costs will likely suck also.<<< "2.3-R4 All functionality of the {CLI,GUI} must be supported in {GUI,CLI} mode" (Note some numbering problems here). This is much, much harder than you would think, and is a waste of resources, especially in the era when UIs are evolving so quickly. >>>SMyers: In principle, there should be underlying functionality that is accessible, with varying degrees of simplicity (eg. point-and-click to delete a point vs. specify a visibiltity in the CLI). The relation between the GUI and glish in aips++ is an example. I think this is important to maintain as much as possible (and has implications for pipelining).<<< "3.2-R1 The FITS/UVFITS data format....without loss of functionality or information" UVFITS will lose information. A dump of a data format to FITS binary tables is probably what is needed. >>>SMyers: UVFITS is gone.<<< "4.1-R1 The package must be able to reliably handle all of the proposed and future ALMA calibration modes" This (future modes) is of course impossible to guarantee, and bad practice to specify(!) >>>SMyers: reword.<<< "4.2-R9 Determination of polarization...." Why would you want linearized solutions, except to save time? If so, say that it's allowed to save time. >>>SMyers: if its allowed, its allowed. 
By the way, this is a "standard recipe" such as you wanted us to write in as a requirement earlier.<<<

"4.4-R3 The complex polarization response of the telescope beams must be calibratable (though this is mainly an imaging step)"

Some research is needed here: how would one model the response? This has considerable impact on the processing, especially if the responses differ significantly from antenna to antenna. This also goes to requirement 5.2-R1.

>>>SMyers: indeed. I don't know a better way to word these more speculative requirements (originally there was going to be a special flag for these). Suggestions welcome.<<<

"7.1-R4 The output of the display should be possible in many different formats..."

No, I think you choose one and let other software (e.g. ImageMagick) do any conversion. That's what we do in AIPS++: we write xpm and recommend that people use a converter, of which there are plenty.

>>>SMyers: bad idea. That's why we should specify this here.<<<

"8.1-R3 The speed of the simulator must be commensurate....."

While one may require this, it may not be doable. Simulation is hard and can be very computationally expensive. In some cases, the simulator may have to run in the pipeline (using parallel code).

>>>SMyers: As stated, there may have to be different "simulators" depending on the problem complexity and timescale - there will have to be at least a quick simulator for the obstool for example.<<<

That's it.

Tim

-------------------------------------------------------------------------------

From lucas@iram.fr Thu Jun 28 14:37:20 2001
Date: Tue, 26 Jun 2001 20:12:01 +0200
From: Robert Lucas
Reply-To: alma-sw-ssr@cv3.cv.nrao.edu
To: alma-sw-ssr@cv3.cv.nrao.edu
Subject: Re: [alma-sw-ssr] Draft v2.0 of Pipeline and Offline Requirements

Some more comments:

> 4.2 Single dish data
> --------------------
>
> 4.2-R1 The data taken on the astronomical source shall be reduced,
>        depending on the observing mode. All possible modes
>        shall be supported:

All supported observing modes shall be supported for data reduction...

>>>SMyers: reworded, see above.<<<

...
> 1.2 Relation to the Pipeline
> 1.2-R1 All modules available in the pipeline must be available also
>        as an offline analysis option. Note that not all offline
>        analysis tools will be in the pipeline package.

That's very important. For instance, atmospheric models are progressing, and many calibration devices will be available, some of which will be used by the pipeline; one may wish to reprocess the atmospheric calibration with improved or re-evaluated atmospheric data. We have done this several times at Plateau de Bure. So the atmospheric calibration procedures (like a full atmospheric model) should be available in the off-line package.

>>>SMyers: agreed.<<<

...
> 2.3 Command Line Interface (CLI)
> 2.3-R1 The CLI must be usable remotely over low-speed modem lines
>        or network connections, with ASCII terminal emulation.
> 2.3-R2 The interface must have the facility to read in command files
>        for batch processing of a sequence of CLI commands.
> 2.3-R3 The CLI should have command-line recall and editing
> 2.3-R4 All functionality of the GUI must also be available in CLI
>        mode.

Do we not require something like a minimum degree of user-friendliness? Please do not forget the biochemist! I think that, based on the dataset and its status, the user should be presented with a list of possible operations to be done on the data, with comments on the results that they are supposed to give.
>>>SMyers: Is this above what is required in 2.5 Documentation and Help?<<< ... > 3.2 Data import and export > 3.2-R1 The FITS/UVFITS data format and/or other commonly supported > standards must be supported for both input and output > without loss of functionality or information, though > need not be the native format for both the package and archive. UVFITS is I guess deprecated. to not put it at the same level as general FITS. >>>SMyers: gone. Mea culpa.<<< ... > 3.5 ALMA single dish and phased-array data > 3.5-R1 Data taken with nodding secondary must be supported, as > a function of nodding phase I've seen secondaries vibrating, nutating and wobbling but not yet nodding. (note: one of the senses is: to incline or sway from the vertical as though ready to fall....) >>>SMyers: Nutation - Etymology: Latin nutation-, nutatio, from nutare to nod, rock.<<< to be continued ... -- Robert LUCAS, Institut de Radioastronomie Millimetrique 300 rue de la Piscine, F-38406 St Martin d'Heres Cedex (FRANCE) Tel +33 (0)4 76 82 49 42 Fax +33 (0)4 76 51 59 38 E-mail: mailto:lucas@iram.fr http://iram.fr/~lucas/ ------------------------------------------------------------------------------- From smyers@cv3.cv.nrao.edu Thu Jun 28 14:40:36 2001 Date: Tue, 26 Jun 2001 12:11:01 -0600 (MDT) From: Steven T. Myers Reply-To: alma-sw-ssr@cv3.cv.nrao.edu To: ALMA Science Software Working Group Subject: [alma-sw-ssr] comments on v2.0 Pipeline and Offline draft Its great to see the comments. Keep them coming! It would be a great help if commenters suggest new or replacement requirement text, rather than just comments. Some of the overly wordy "requirements" that some of you have commented on are the result of getting a long comment and not knowing what to do with it, and so I more or less included the comment as the requirement. If it isnt obvious from the comment what to do (such as "remove this requirement" or "delete the word XXXX" or "replace XXXX with YYYY") then include your revision of the requirement or the text of the new requirement. Thanks, -Steve ------------------------------------------------------------------------------- From twillis@drao.nrc.ca Thu Jun 28 14:41:10 2001 Date: Tue, 26 Jun 2001 11:52:49 -0700 (PDT) From: Tony Willis Reply-To: alma-sw-ssr@cv3.cv.nrao.edu To: alma-sw-ssr@cv3.cv.nrao.edu Subject: [alma-sw-ssr] Draft v2.0 of Pipeline and Offline Requirements (fwd) > 4.1-R6 Special cases shall be supported, including: > > R6.1 mosaic observations > R6.2 on-the-fly mosaics > R6.3 self calibration projects > R6.4 combination of single-dish + ALMA data (+ACA) > Why are these called Special cases? I would have thought they should be Standard cases. >>>SMyers: "designated modes"?<<< Tony ------------------------------------------------------------------------------- From twillis@drao.nrc.ca Thu Jun 28 14:41:37 2001 Date: Tue, 26 Jun 2001 12:19:22 -0700 (PDT) From: Tony Willis Reply-To: alma-sw-ssr@cv3.cv.nrao.edu To: alma-sw-ssr@cv3.cv.nrao.edu Subject: RE: [alma-sw-ssr] Draft v2.0 of Pipeline and Offline Requirements (fwd) Tim wrote: > "8.1-R3 The speed of the simulator must be commensurate.....". > > While one may require this, it may not be doable. Simulation is hard and > can be very computationally expensive. In some cases, the simulator may have to run > in the pipeline (using parallel code). > > That's it. 
> > Tim > >From the document: >> 8.0 Simulation >> 8.1 General simulation requirements >> 8.1-R1 There must be simulation capability for interferometer and >> single dish observation with ALMA in all modes, for planning >> (with the ObserveTool) and comparison of data with models >> (for editing and correction). These should include error >> generation for thermal noise, pointing, primary beam, >> atmosphere, antenna surface errors, etc. >> 8.1-R2 The output of the simulator must be compatible with the >> rest of the offline package, and with the ALMA pipeline. >> It should be available in all ALMA data format(s). >> 8.1-R3 The speed of the simulator must be commensurate with the >> desired feedback time. For instance, if used with the >> real-time-system to assess quality the simulator must >> respond in minutes, if used for proposer feedback for >> ObsTool application it should feedback also on minute >> timescales for most simple experiments, while for complicated >> engineering simulations it may be allowed to take >> correspondingly longer. >> 8.1-R4 The simulator should be available early in the software >> production cycle in order to use it to test other components >> of the package. One of the goals of the Canadian proposal is to indeed build a simulator capable of emulating the data rate from the actual ALMA telescope. Initially (2001 - 2002 etc) - yes, it would have to run on some massively parallel architecture, but luckily, radio interferometers have lots of 'embarassingly parallel' components. So I believe it can be done, and should be done early in the software cycle as proposed above. But no, it will not run on someone's laptop as proposed for ALMA software in 2.1-R7. >>>SMyers: Some aspects of simulation will have to run quickly by users (eg. in the obstool). A monolithic simulator is a bad idea, IMO.<<< I believe a simulator is critical to the success of ALMA. The VLA was luckily saved from being an expensive boondoggle by the development (after the fact) of self-calibration by Tim and others. A 'real telescope' simulator will allow the investigation and solution of ALMA imaging problem long before the telescope is turned on. (I will give aips++ a plug by saying that it has excellent tools for easily developing massively parallel applications although the larger astronomical community has little awareness of the potential use of aips++ in this area.) Tony ------------------------------------------------------------------------------- From twillis@drao.nrc.ca Thu Jun 28 14:42:16 2001 Date: Tue, 26 Jun 2001 16:59:03 -0700 (PDT) From: Tony Willis Reply-To: alma-sw-ssr@cv3.cv.nrao.edu To: alma-sw-ssr@cv3.cv.nrao.edu Subject: [alma-sw-ssr] Draft v2.0 of Pipeline and Offline Requirements (fwd) Some more thoughts: > 5.0 Interface with the Archive --- TO BE DETAILED > ------------------------------ > > 5.0-R1 The images produced by the Science Pipeline shall be archived, > together with the > > R1.1 the script that was used to produce the image > R1.2 the log file of the software > > 5.0-R2 cf 7.0-R3 general SSR document > > 5.0-R3 Also to be archived: > > R3.1 data quality control: > > R3.1.1 estimate of the noise > R3.1.2 seeing > R3.1.3 image fidelity based on model? > > R3.2 observation quality control: > R3.2.1 baseline quality > R3.2.2 calibration quality > > R3.3 telescope state: (possibly in monitor file, but accessible) > R3.3.1 telescope pointing > R3.3.2 subreflector focus > R3.3.3 monitor point (e.g. temperatures) data Do the raw observed data end up in the archive? 
I assume so. Or is that requirement given in another document? > 1.0 General Requirements and Interaction with other ALMA elements > 1.1 Goals of the Offline Package > 1.1-R1 An ALMA Offline Data Reduction Package (or "the package") > is primarily intended to enable end-users of ALMA (e.g. > observers or archive users) to produce scientifically > viable results that involve ALMA data products. The secondary > use is to enable ALMA staff to assess the state of the > array and derive calibration parameters for the system. Surely this secondary use is more a real-time or near real-time requirement? > 1.1-R2 The package should be able to function (be installed) at > the users home institution, in addition to operating at > ALMA regional centers (both locally and remotely). It should > be portable to a reasonable number of supported platforms, > including laptops without network connections. The 3 -> 60 Mb / sec data rate from ALMA is comparable to the data rate assumed by Tim Cornwell for the EVLA (EVLA memo 24). He calculates that you will still need at least a $20,000 to $100,000 (2000 dollars) computer system in 2009 to handle that amount of data. So you will need a very expensive laptop. I do not think that having a requirement that the ALMA offline system run on laptops is realistic. > 2.1-R6 Multiple levels of "undo" should be supported for all tasks. > 2.1-R7 The interface and package should function without a network > connection (e.g. a laptop on an airplane). Ditto here. Tony ------------------------------------------------------------------------------- From smyers@cv3.cv.nrao.edu Thu Jun 28 14:42:38 2001 Date: Tue, 26 Jun 2001 18:48:20 -0600 (MDT) From: Steven T. Myers Reply-To: alma-sw-ssr@cv3.cv.nrao.edu To: alma-sw-ssr@cv3.cv.nrao.edu Subject: Re: [alma-sw-ssr] Draft v2.0 of Pipeline and Offline Requirements (fwd) On Tue, 26 Jun 2001, Tony Willis wrote: > Some more thoughts: > > > 5.0 Interface with the Archive --- TO BE DETAILED > > ------------------------------ > Do the raw observed data end up in the archive? I assume so. Or is that > requirement given in another document? Thats in the first Requirements document (what should we refer to it as in this document by the way?). The default is that raw data and WVR corrected data are both archived. > > 1.0 General Requirements and Interaction with other ALMA elements > > 1.1 Goals of the Offline Package > > 1.1-R1 An ALMA Offline Data Reduction Package (or "the package") > > is primarily intended to enable end-users of ALMA (e.g. > > observers or archive users) to produce scientifically > > viable results that involve ALMA data products. The secondary > > use is to enable ALMA staff to assess the state of the > > array and derive calibration parameters for the system. > > Surely this secondary use is more a real-time or near real-time requirement? Hard to say. I think the staff (and members of this group!) will be using the package manually to look at test data the day after observation or even later, for example. > > > 1.1-R2 The package should be able to function (be installed) at > > the users home institution, in addition to operating at > > ALMA regional centers (both locally and remotely). It should > > be portable to a reasonable number of supported platforms, > > including laptops without network connections. > > The 3 -> 60 Mb / sec data rate from ALMA is comparable to the data > rate assumed by Tim Cornwell for the EVLA (EVLA memo 24). 
He calculates > that you will still need at least a $20,000 to $100,000 (2000 dollars) > computer system in 2009 to handle that amount of data. So you will need > a very expensive laptop. I do not think that having a requirement that > the ALMA offline system run on laptops is realistic. The users should be able to reduce their data wherever they are. I don't see how the peak and sustained data rates enter into this --- thats for the Pipeline primarily, and for the input into the Science archive (Im assuming thats the Pipeline also). A user should be able to reduce a 12-hour dataset (some spectral line mode) on a desktop or laptop system in 2007. > > > 2.1-R6 Multiple levels of "undo" should be supported for all tasks. > > 2.1-R7 The interface and package should function without a network > > connection (e.g. a laptop on an airplane). > > Ditto here. > and ditto here too. Unless my assumption that the Offline requirements are primarily for end users is way off... -Steve ------------------------------------------------------------------------------- From tcornwel@cv3.cv.nrao.edu Thu Jun 28 14:43:09 2001 Date: Tue, 26 Jun 2001 20:49:16 -0600 From: Tim Cornwell Reply-To: alma-sw-ssr@cv3.cv.nrao.edu To: alma-sw-ssr@cv3.cv.nrao.edu Subject: RE: [alma-sw-ssr] Draft v2.0 of Pipeline and Offline Requirements (fwd) Tony Willis wrote: > The 3 -> 60 Mb / sec data rate from ALMA is comparable to the data > rate assumed by Tim Cornwell for the EVLA (EVLA memo 24). He calculates > that you will still need at least a $20,000 to $100,000 (2000 dollars) > computer system in 2009 to handle that amount of data. So you will need > a very expensive laptop. I do not think that having a requirement that > the ALMA offline system run on laptops is realistic. What I calculated was the average rate for a few cases. There is a spectrum of possible observational scenarios, and undoubtedly some of scenarios will be reducible on a laptop/PDA/wristwatch and it will make sense to do so. So I do think that the requirement is realistic. I just cannot see that there would be any doubt that any package could run on a laptop. It's already trivial for even the biggest packages so why worry? >>>SMyers: I worry about all things software :-( <<< Tim ------------------------------------------------------------------------------- From lucas@iram.fr Thu Jun 28 14:43:34 2001 Date: Wed, 27 Jun 2001 10:24:12 +0200 From: Robert Lucas Reply-To: alma-sw-ssr@cv3.cv.nrao.edu To: alma-sw-ssr@cv3.cv.nrao.edu Subject: Re: [alma-sw-ssr] Draft v2.0 of Pipeline and Offline Requirements (fwd) Tim Cornwell wrote: > > Tony Willis wrote: > > > The 3 -> 60 Mb / sec data rate from ALMA is comparable to the data > > rate assumed by Tim Cornwell for the EVLA (EVLA memo 24). He calculates > > that you will still need at least a $20,000 to $100,000 (2000 dollars) > > computer system in 2009 to handle that amount of data. So you will need > > a very expensive laptop. I do not think that having a requirement that > > the ALMA offline system run on laptops is realistic. > > What I calculated was the average rate for a few cases. There is a spectrum > of possible observational scenarios, and undoubtedly some of scenarios will be > reducible on a laptop/PDA/wristwatch and it will make sense to do so. So > I do think that the requirement is realistic. I just cannot see that > there would be any doubt that any package could run on a laptop. It's > already trivial for even the biggest packages so why worry? 
> > Tim I think that it's still reasonable to require that if data reduction of a good fraction of projects is feasible off-line with the cpu and memory available on a laptop, then it should not be restricted by other issues (expensive(>0?) licences, complicated installation procedures ...). >>>SMyers: Make part of Tim's suggested operational issues OL-1.3<<< -- Robert LUCAS, Institut de Radioastronomie Millimetrique 300 rue de la Piscine, F-38406 St Martin d'Heres Cedex (FRANCE) Tel +33 (0)4 76 82 49 42 Fax +33 (0)4 76 51 59 38 E-mail: mailto:lucas@iram.fr http://iram.fr/~lucas/ ------------------------------------------------------------------------------- From lucas@iram.fr Thu Jun 28 14:43:59 2001 Date: Wed, 27 Jun 2001 10:27:41 +0200 From: Robert Lucas Reply-To: alma-sw-ssr@cv3.cv.nrao.edu To: alma-sw-ssr@cv3.cv.nrao.edu Subject: Re: [alma-sw-ssr] Draft v2.0 of Pipeline and Offline Requirements(fwd) "Steven T. Myers" wrote: > > Do the raw observed data end up in the archive? I assume so. Or is that > > requirement given in another document? > > Thats in the first Requirements document (what should we refer to it as > in this document by the way?). The default is that raw data and WVR > corrected data are both archived. ALMA-SW_MEMO 11 at http://www.alma.nrao.edu/development/computing/docs/joint/0011/ssranduc.pdf Robert -- Robert LUCAS, Institut de Radioastronomie Millimetrique 300 rue de la Piscine, F-38406 St Martin d'Heres Cedex (FRANCE) Tel +33 (0)4 76 82 49 42 Fax +33 (0)4 76 51 59 38 E-mail: mailto:lucas@iram.fr http://iram.fr/~lucas/ ------------------------------------------------------------------------------- From guillote@iram.fr Thu Jun 28 14:44:19 2001 Date: Wed, 27 Jun 2001 10:31:31 +0200 From: Stephane Guilloteau Reply-To: alma-sw-ssr@cv3.cv.nrao.edu To: alma-sw-ssr@cv3.cv.nrao.edu Subject: Re: [alma-sw-ssr] Draft v2.0 of Pipeline and Offline Requirements (fwd) > >I think that it's still reasonable to require that if data reduction of >a good fraction of projects is feasible off-line with the cpu and memory >available on a laptop, then it should not be restricted by other issues >(expensive(>0?) licences, complicated installation procedures ...). > One of the key difference between the laptop and a "normal" computer is the screen size. This requirement has more implication on the user interface than on the data reduction engines. Stephane ------------------------------------------------------------------------------- From lucas@iram.fr Thu Jun 28 14:44:36 2001 Date: Wed, 27 Jun 2001 13:00:12 +0200 From: Robert Lucas Reply-To: alma-sw-ssr@cv3.cv.nrao.edu To: alma-sw-ssr@cv3.cv.nrao.edu Subject: Re: [alma-sw-ssr] Draft v2.0 of Pipeline and Offline Requirements Hi: - Tim, please remember this is a draft far away from the final thing; we are just starting to discuss this in the whole SSR group! I tried to reply to some of Tim's comments in order to clarify a few points. Tim Cornwell wrote: > 0. A point concerning scope. AIPS++ is about 150 FTE-years. AIPS is > probably about the same. The ESO Data flow system is about 300 FTE-years > (I believe). I would guess from some communications that for the items > described in this requirements document, the ALMA computing division > has between 40 and 60 FTE-years (depending on how one counts various > things). I would counsel that you spend that effort wisely. I think the > current draft overspends by a large factor. 
Remember that these requirements will be used as input to a re-use analysis, and that the ALMA FTE's should be used only to help fill in the remaining gaps, and build a pipeline. > 1. A general comment is that data reduction splits into strategy > and tactics. The tactics come from the basic physics but the > strategy comes from experience. I think the document is mostly > fine on tactics but is a little too specific about some strategies. > The items on the calibration pipeline seem to me to fit in this > category. For example, 2.1-R3 is a strategy that may or may not > work in all situations. The main motivation here is to feed back the results to the dynamic scheduling and data acquisition processes. We believe e.g. that if these tests do not work, the data cannot be calibrated, and we switch to another less demanding activity. This is a first guess strategy based on experience with existing mm-wave arrays. > 2. It's hard to know how to process data for a ground-breaking > telescope like ALMA. I think one should be modest in setting > forth too-definitive statements of how the processing should > proceed. In this context, I think the tool-based approach using > in AIPS++ is vital, and I would advocate including a statement > aimed at this point. Clearly the wise attitude is not to spend all forces in the first version, but keep a good part of them for when we have real high frequency data to play with! That's in the planning of the sw group I think. > 3. I haven't followed your discussions in detail so I'm not at all > sure what General Consideration B means. In what way is there a > fundamental distinction? I could not see how this consideration > affected the rest of the document. It's also a very dangerous point > since in many operations, one obviously wants no distinction. The difference is in the on source data acquisition naturally (total power or interferometry) but the calibration may use data taken in either single-dish or interferometry, which may require to share calibration data between single-dish and interferometry software. > 4. There are some prescriptive implementation details that should > be removed (e.g. 3.0-R7 "using the fastest algorithm", also > the Appendix of Barry Clark's input parameters). For the quick look speed matters of course (as the name says). > 5. I am surprised that the document has relatively few requirements > that are operational in nature. For example: > > - Be installation-flexible: can be installed on non-specialized > hardware by end user > - Processing script must be re-executable with only a small > number of changes > - Process standard recurring observations and analyze according > to standard recipes > - Provide real-time feedback via standard compact displays > and plots > - Be operable automatically or manually > - Allow preemption, termination, resubmission, etc. Good comment, but please remember this is only a draft on which we are working! > 6. I found some of the discussion hard to understand. An example > is 3.3-R1: Everything but the first sentence is unnecessary and > detracts from the simplicity of the requirement. It might be wise to separate the actual requirements from our motivations in writing them (which is useful for a live document). > 7. A major point that applies to all my remaining comments > is that it's easy to write simple sounding requirements that > either double, triple, etc the software costs or prevent any > estimation at all. Wim and Tony pointed out that adding undo > is one example. 
I'd also add a substantial number of others: > > "1-R1: The pipelines shall be able to process all data coming from > the array." > For all arrays that I know of, one can think of observations that > "break the bank" of available computing. This must be true of > ALMA as well. Do you really want to limit the array in this way > or specify the pipeline so aggressively? Of course it depends how you define `process'. This is sort of restricted by 1.0-R7 in our general requirements document. > "4.2-R1: The data taken on the astronomical source shall be reduced, > depending on the observing mode. All possible modes shall be > supported: > R1.1 etc" > > I think only the enumerated modes should be supported. Only known > things can be guaranteed to be supported. I agree, rephrasing needed. > "1.1-R4 The offline data reduction package should not suck" > > Harder than you would think. I think the software costs for this > are unknown. I thought the author had inserted that one as an simple e-mail generator. >>>SMyers: it worked!<<< Robert -- Robert LUCAS, Institut de Radioastronomie Millimetrique 300 rue de la Piscine, F-38406 St Martin d'Heres Cedex (FRANCE) Tel +33 (0)4 76 82 49 42 Fax +33 (0)4 76 51 59 38 E-mail: mailto:lucas@iram.fr http://iram.fr/~lucas/ ------------------------------------------------------------------------------- From momose@mito.ipc.ibaraki.ac.jp Thu Jun 28 14:44:59 2001 Date: Wed, 27 Jun 2001 20:46:38 +0900 From: Munetake MOMOSE Reply-To: alma-sw-ssr@cv3.cv.nrao.edu To: alma-sw-ssr@cv3.cv.nrao.edu Subject: Re: [alma-sw-ssr] Draft v2.0 of Pipeline and Offline Requirements Hi Folks, Several comments just before the telecon... Sincerely, M. Momose ----------------- 1. Calibration with Pipelines I think it is still unclear the difference in "calibration procedure" between Calibration/Quick-look Pipeline and Science Pipeline. Personally, I agree with the following Frediric's comments (June 17th): >I think that there are three kinds of calibrations that could >be handled by a "calibration pipeline": >- The instrumental calibration: pointing, focus, delay, baseline, > etc. What is required here is a fast feedback to the control > software. >- The calibrations that do not require a time interpolation, as > the atmospheric or bandpass calibration: each time such a > scan is observed, something has to be derived and then stored, > to be applied to all the following observations, until a new > calibration of that kind is observed. >- The calibrations that require a time interpolation, ie the > phase and amplitude calibration: a calibration curve has to be > fitted using all available calibrations and then applied to all > the source observations that were observed in between. > > The two first categories can easily be handled by a calibration > pipeline. As for the third category, it is not yet clear to me which > pipeline should do the job. In the document I sent a few days ago, all > three pipelines are doing something in this area, and I agree it is not > clear enough. I think that the science pipeline should do a clean > job and derive the calibration curves using all data. But the > calibration and quick-look pipelines should also do a similar > calibration, to get an estimate of the phase rms and to produce > quick images. 
If the above is the case, it may be sufficient to support only a few simple modes in the Calibration & Quick-look Pipelines for just quick calibration / real-time monitoring (e.g., baseline-based solutions with a linearly interpolated calibration curve), while the archival data generated by the Science Pipeline are reduced in some optimum mode selected among various options. 2. about Simulator (Section 3-8) My opinion is that a simulator that generates a probable resultant map for some model brightness distribution will be quite beneficial to the end users. However, one that simulates the whole thing (complete instrumental behavior as well as environmental conditions) will be so complicated that most observers cannot handle it, though it might be useful in checking technical issues. We should therefore discuss the optimum specs of the simulator for end users. (Discussion about a simulator for system check or maintenance is beyond the scope of this group, I guess.) >>>SMyers: I have delineated levels of simulation in the latest (12-Jul) version.<<< 3. Offline Visualization (Section 3-7) I agree that the Offline package should be able to deal with more than two image files of some standard format to make composite / multi-layered maps. But importing purely-graphic files (such as JPEG or postscript format: see 7.2-R1) to produce composite maps should NOT be required of the offline package, because these files do not have any header information such as reference positions, observing frequencies, and so on. This is a general feature of graphics software (e.g., Photoshop, Canvas ...), but not of an astronomical reduction package. I therefore propose to revise 7.2-R1 as follows: User should be able to produce overlays of different data sets of standard formats. It should be possible to place these data sets in layers which can be switched on and off separately. The different images should be editable, and it should be possible to declare certain colors transparent. It must be possible to shift, rotate and scale the images at will. >>>SMyers: Good text, I have used this. It is a great help when replacement text is provided!<<< ------------------------------------------------------------------------------- From twillis@drao.nrc.ca Thu Jun 28 14:45:22 2001 Date: Wed, 27 Jun 2001 06:01:04 -0700 (PDT) From: Tony Willis Reply-To: alma-sw-ssr@cv3.cv.nrao.edu To: alma-sw-ssr@cv3.cv.nrao.edu Subject: Re: [alma-sw-ssr] Draft v2.0 of Pipeline and Offline Requirements (fwd) > > > > The 3 -> 60 Mb / sec data rate from ALMA is comparable to the data > > rate assumed by Tim Cornwell for the EVLA (EVLA memo 24). He calculates > > that you will still need at least a $20,000 to $100,000 (2000 dollars) > > computer system in 2009 to handle that amount of data. So you will need > > a very expensive laptop. I do not think that having a requirement that > > the ALMA offline system run on laptops is realistic. > > The users should be able to reduce their data wherever they are. I don't > see how the peak and sustained data rates enter into this --- thats for > the Pipeline primarily, and for the input into the Science archive (Im > assuming thats the Pipeline also). A user should be able to reduce a > 12-hour dataset (some spectral line mode) on a desktop or laptop system > in 2007.
> Perhaps I misrepresented myself here - I agree with Tim's comments that the software package should run on a lap top - indeed I run the ACSIS software system on my laptop - its great for software development (and for system testing with a tiny 128 channel spectral line system from a single receiver!). So you might be able to reduce a small snapshot on a 2007 laptop. However a 12-hour dataset at even the rather modest data rate of 4 Mb per second sums to 173 Gb after 12 hours. I suspect that if you are attempting to process this amount of data in a laptop while waiting for your airplane at an airport you may need quite a large collection of batteries! Anyway why be explicit about computing devices at this stage? For all we know, in 2007 - 2009 we may be using some kind of wireless screens with instant connect to some supercomputer. Tony ------------------------------------------------------------------------------- From guillote@iram.fr Thu Jun 28 14:45:40 2001 Date: Wed, 27 Jun 2001 15:15:41 +0200 From: Stephane Guilloteau Reply-To: alma-sw-ssr@cv3.cv.nrao.edu To: alma-sw-ssr@cv3.cv.nrao.edu Subject: Re: [alma-sw-ssr] Draft v2.0 of Pipeline and Offline Requirements (fwd) Excerpt from Tony Willis > Anyway why be explicit about >computing devices at this stage? For all we know, in 2007 - 2009 we >may be using some kind of wireless screens with instant connect to >some supercomputer. > That comes exactly back to my previous message: the key-point about laptop is the screen size, and the requirement may rather be written e.g. "Should be able to (conveniently) run the data processing user interface from a laptop" >>>SMyers: added to 2.1-R7<<< Stephane ------------------------------------------------------------------------------- From schilke@mpifr-bonn.mpg.de Thu Jun 28 14:46:00 2001 Date: Wed, 27 Jun 2001 15:43:07 +0200 From: Peter Schilke Reply-To: alma-sw-ssr@cv3.cv.nrao.edu To: alma-sw-ssr@cv3.cv.nrao.edu Subject: Re: [alma-sw-ssr] Draft v2.0 of Pipeline and Offline Requirements I am triggered to this mail by Momose-san's rejection of the requirement of possibility of importing "foreign" formats such as jpeg or such, because it is difficult. I don't want to harp on this particular issue in itself too much, but I think at present we shouldn't restrict ourselves too much by considerations of feasibility, we should define what we think we need. Reality checks will come in with assigning priorities and, ultimately, by considering the resources available. To stick with this example, I find it annoying that I have to jump constantly between packages to annotate or make overlays with jpeg, so there is this requirement. It will get the priority "desirable" which translates to "won't happen in your lifetime" in most cases - unless it can be done cheaply - and it might be, since we are talking about reusing existing software. If it's not in the requirements, it won't ever happen because nobody would know we want it. A similar argument could be (and has been made) regarding the "must run on laptop" requirement. So I'd be in favor of not exercising a priori censorship too excessively - of course it shouldn't get to the point where the important issues get lost in desiderata. >>>SMyers: I am in favor of outputting standard formats (jpeg, gif, ps) directly, but am less in favor of importing these. Actually FITS is probably our best format for import, and is relatively easy to convert other stuff to this format, of course with loss of header info. 
I am tempted, as in 3.7-R2 (12-Jul) to restrict import to standard formats like FITS.<<< Peter ------------------------------------------------------------------------------- From gueth@iram.fr Thu Jun 28 14:46:26 2001 Date: Wed, 27 Jun 2001 15:28:49 +0200 From: Frederic Gueth Reply-To: alma-sw-ssr@cv3.cv.nrao.edu To: alma-sw-ssr@cv3.cv.nrao.edu Subject: Re: [alma-sw-ssr] Draft v2.0 of Pipeline and Offline Requirements Munetake MOMOSE wrote: > > Hi Folks, > > Several comments just before the telecon... > > Sincerely, > M. Momose > > ----------------- > > 1. Calibration with Pipelines > > I think it is still unclear the difference in "calibration procedure" > between Calibration/Quick-look Pipeline and Science Pipeline. Personally, > I agree with the following Frediric's comments (June 17th): > > >I think that there are three kinds of calibrations that could > >be handled by a "calibration pipeline": > >- The instrumental calibration: pointing, focus, delay, baseline, > > etc. What is required here is a fast feedback to the control > > software. > >- The calibrations that do not require a time interpolation, as > > the atmospheric or bandpass calibration: each time such a > > scan is observed, something has to be derived and then stored, > > to be applied to all the following observations, until a new > > calibration of that kind is observed. > >- The calibrations that require a time interpolation, ie the > > phase and amplitude calibration: a calibration curve has to be > > fitted using all available calibrations and then applied to all > > the source observations that were observed in between. > > > > The two first categories can easily be handled by a calibration > > pipeline. As for the third category, it is not yet clear to me which > > pipeline should do the job. In the document I sent a few days ago, all > > three pipelines are doing something in this area, and I agree it is not > > clear enough. I think that the science pipeline should do a clean > > job and derive the calibration curves using all data. But the > > calibration and quick-look pipelines should also do a similar > > calibration, to get an estimate of the phase rms and to produce > > quick images. I agree with Munetake (and with my previous email...), that the precise definition of the calibration pipeline is unclear. "Calibration" is a quite general concept which includes operations of a very different nature. I would like to mention two related problems: 1) The "telescope calibration" (pointing, etc) put constrains of a different nature than the other pipeline elements: until the results are available, ALMA is blocked and cannot observe! This implies an extremely fast answer, which is not necessarily the case for the other calibrations (the computation time can be longer, providing it is not a bottle-neck in the data flow). Thus, instrument calibration should have the highest priority (which is not clearly stated in the present document), but it can even call for a separate "telescope calibration" pipeline. 2) Some calibrations can be computed immediately after the corresponding scan has been observed (eg bandpass). The result can be stored and used to calibrate following observations. In that sense, it can easily be handled by the calibration pipeline described in the current document. BUT some calibrations can only be derived at the very end of the observations: this is typically the time-dependance phase and amplitude curves. 
So this is a job for the pipeline running at the end of the session, namely the science pipeline. Maybe we should distinguish between the 'calibration' and the 'imaging' part of the science pipeline? My point is not to split the 'pipeline' in an increasing number of entities but rather to identify some well-defined parts with clear inputs and outputs. Frederic. >>>SMyers: Incorporated into header for PL-2.0 <<< ------------------------------------------------------------------------------- From tcornwel@cv3.cv.nrao.edu Thu Jun 28 14:46:43 2001 Date: Wed, 27 Jun 2001 08:09:56 -0600 From: Tim Cornwell Reply-To: alma-sw-ssr@cv3.cv.nrao.edu To: alma-sw-ssr@cv3.cv.nrao.edu Subject: RE: [alma-sw-ssr] Draft v2.0 of Pipeline and Offline Requirements (fwd) > > That comes exactly back to my previous message: the key-point about > laptop > is the screen size, and the requirement may rather be written e.g. > "Should be able to (conveniently) run the data processing user interface > from a laptop" My laptop, Dell Inspiron, has the best screen of any of my computers (1600x1400) and it'll only get better. I think there must be some other concern here about user interfaces that should be expressed directly. Regards, Tim ------------------------------------------------------------------------------- From tcornwel@cv3.cv.nrao.edu Thu Jun 28 14:47:05 2001 Date: Wed, 27 Jun 2001 08:38:23 -0600 From: Tim Cornwell Reply-To: alma-sw-ssr@cv3.cv.nrao.edu To: alma-sw-ssr@cv3.cv.nrao.edu Subject: RE: [alma-sw-ssr] Draft v2.0 of Pipeline and Offline Requirements > Hi: > > - Tim, please remember this is a draft far away from the final thing; we > are just starting to discuss this in the whole SSR group! OK. I apologise for perhaps being too strident. It seems like I've been discussing subjects like these for years :) Tim ------------------------------------------------------------------------------- From twillis@drao.nrc.ca Thu Jun 28 14:47:30 2001 Date: Wed, 27 Jun 2001 07:49:46 -0700 (PDT) From: Tony Willis Reply-To: alma-sw-ssr@cv3.cv.nrao.edu To: alma-sw-ssr@cv3.cv.nrao.edu Subject: Re: [alma-sw-ssr] Draft v2.0 of Pipeline and Offline Requirements (fwd) > My point is not to split the 'pipeline' in an increasing number of > entities > but rather to identify some well-defined parts with clear inputs and > outputs. > > Frederic. > There are ways to have multiple pipelines coexisting that are quite easy to implement. In fact, by taking this approach the overall complexity of pipeline "logic" might be reduced quite a bit. Tony ------------------------------------------------------------------------------- From twillis@drao.nrc.ca Thu Jun 28 14:47:57 2001 Date: Wed, 27 Jun 2001 07:52:54 -0700 (PDT) From: Tony Willis Reply-To: alma-sw-ssr@cv3.cv.nrao.edu To: alma-sw-ssr@cv3.cv.nrao.edu Subject: [alma-sw-ssr] overall comment re pipelines document To me it reads to a certain extent as a great big wish list. I guess the next step would be to start setting the requests in a priority order. 
>>>SMyers: Indeed, this will be the primary task at the Berkeley meeting.<<< Tony ------------------------------------------------------------------------------- From guillote@iram.fr Thu Jun 28 14:48:24 2001 Date: Wed, 27 Jun 2001 17:19:35 +0200 From: Stephane Guilloteau Reply-To: alma-sw-ssr@cv3.cv.nrao.edu To: alma-sw-ssr@cv3.cv.nrao.edu Subject: Re: [alma-sw-ssr] Draft v2.0 of Pipeline and Offline Requirements (fwd) -----Original Message----- From: Tim Cornwell To: alma-sw-ssr@nrao.edu Date: Wednesday, June 27, 2001 4:10 PM Subject: RE: [alma-sw-ssr] Draft v2.0 of Pipeline and Offline Requirements (fwd) >> >> That comes exactly back to my previous message: the key-point about >> laptop >> is the screen size, and the requirement may rather be written e.g. >> "Should be able to (conveniently) run the data processing user interface >> from a laptop" > >My laptop, Dell Inspiron, has the best screen of any of my computers (1600x1400) >and it'll only get better. I think there must be some other concern here about >user interfaces that should be expressed directly. > >Regards, > >Tim > Sorry, it's my eyes which don't follow the 1600x1400 over a 14-15 inch screen. And I doubt they'll ever become better... Stephane ------------------------------------------------------------------------------- From tcornwel@cv3.cv.nrao.edu Thu Jun 28 14:48:49 2001 Date: Wed, 27 Jun 2001 10:36:28 -0600 From: Tim Cornwell Reply-To: alma-sw-ssr@cv3.cv.nrao.edu To: alma-sw-ssr@cv3.cv.nrao.edu Subject: RE: [alma-sw-ssr] Draft v2.0 of Pipeline and Offline Requirements Robert wrote: > > I tried to reply to some of Tim's comments in order to clarify a few > points. > > > Tim Cornwell wrote: > > 0. A point concerning scope. AIPS++ is about 150 FTE-years. AIPS is > > probably about the same. The ESO Data flow system is about 300 FTE-years > > (I believe). I would guess from some communications that for the items > > described in this requirements document, the ALMA computing division > > has between 40 and 60 FTE-years (depending on how one counts various > > things). I would counsel that you spend that effort wisely. I think the > > current draft overspends by a large factor. > > Remember that these requirements will be used as input to a re-use > analysis, and that the ALMA FTE's should be used only to help fill in > the remaining gaps, and build a pipeline. I took that into account. Even if you build on top of another package, the current draft is too expensive. I know that the process of requirements/reuse analysis/costing will reveal this but I think a reality check now is possible and useful. >>>SMyers: It is ironic that most of the requirements objected to are taken nearly verbatim from the aips++ requirements memo from 1992. I think this demonstrates what impact the unprioritized wish-list of that document had. Was a requirements/reuse analysis/costing analysis done for the 1992 requirements to justify the choices aips++ made?<<< > > > 1. A general comment is that data reduction splits into strategy > > and tactics. The tactics come from the basic physics but the > > strategy comes from experience. I think the document is mostly > > fine on tactics but is a little too specific about some strategies. > > The items on the calibration pipeline seem to me to fit in this > > category. For example, 2.1-R3 is a strategy that may or may not > > work in all situations. > > The main motivation here is to feed back the results to the dynamic > scheduling and data acquisition processes. We believe e.g. 
that if these > tests do not work, the data cannot be calibrated, and we switch to > another less demanding activity. This is a first guess strategy based on > experience with existing mm-wave arrays. That's not the only point. The design of C++ library + high level scripting allows one to put detailed and well-known tactics into the library (via e.g. a measurement model for a telescope), and defer strategies for implementation in the scripting language. There are packages and systems that don't have this property and therefore it is worth specifying. > > > 4. There are some prescriptive implementation details that should > > be removed (e.g. 3.0-R7 "using the fastest algorithm", also > > the Appendix of Barry Clark's input parameters). > > For the quick look speed matters of course (as the name says). The fastest algorithm may require excessive disk space or have low precision or only powers of two or whatever.... The point is that calling out "fastest" as being the most important factor is not necessary. Tim ------------------------------------------------------------------------------- From bclark@aoc.nrao.edu Thu Jun 28 14:49:13 2001 Date: Wed, 27 Jun 2001 11:12:28 -0600 (MDT) From: Barry Clark Reply-To: alma-sw-ssr@cv3.cv.nrao.edu To: alma-sw-ssr@cv3.cv.nrao.edu Subject: Re: [alma-sw-ssr] Draft v2.0 of Pipeline and Offline Requirements > > The subject of the Section 2 on Pipeline Requirements is referred to as the > "Pipeline". This may be implemented as disparate > tools or programs, or as separate packages provided by different groups, > or as a single package, as long as it fulfills the requirements. > > The subject of Section 3 on Offline Data Reduction Requirements is referred to > as the "Package" or "Offline Package". This may be implemented as disparate > tools or programs, or as separate packages provided by different groups but > integrated into a single suite, or a single package. > Perhaps instead: The "Package" or "Offline Package" is a set of tools or programs, believed adequate for ALMA reductions, and used by ALMA staff for reductions upon which the behavior of the system will be judged. It may consist of packages provided by different groups, with transitions provided to integrate them into a single suite. The requirements will state that the Package will be available for installation on the observer's own computer systems. The requirements on the Package are set forth in Section 3. A "Pipeline" is a set of operations, implemented by the underlying Package, which takes a concise description of the way these operations are to be performed and accesses ALMA data, either from the ALMA archive or from local files, and produces a desired data product. (For purposes of software requirements, the alternate definitions as a machine or set of machines, or as the supervisory process that invokes these operations are less useful.) There are several Pipelines essential to the efficient operation of ALMA. >>>SMyers: I have resisted the assumption that the pipeline is built using the offline Package. Although likely (eg. aips++) I think this is unduly restrictive.<<< \bullet The Calibration Pipeline operates in quasi real time, looks at only calibrator observations, and produces one or more of the following data products (depending on the type of observation and type of calibrator), and places the results in a location where they can be accessed both by the real-time system and by other reduction proceedures: 1) an antenna pointing offset for all antennas. 2). 
Tsys for all antennas as a function of time. 3.) Sideband ratios for all antennas. 4.) Antenna based flux calibration (TSYSJY) from a flux calibrator. 5.) Antenna based bandpass calibration. 6.) Antenna based polarization leakage terms (with the usual indeterminate offsets from a single observation). 7.) Antenna based IF phase differences (from a strongly polarized calibrator). 8.) Antenna based phase calibration (with noise and atmospheric rms). \bullet The Science Pipeline will process most science data. It's data product is an image cube. This product will in many cases be adequate to achieve the observer's science goals. It may access ALMA data from several observing sessions and even from observations not the observer's own. It is intended to produce the best image possible without the intervention of an expert observer. The Science Pipeline will include a data calibration phase, that may, in fact, run somewhat asynchronously with the image making phase; this should not be confused with the Calibration Pipeline above. \bullet The Quick Look Pipeline will process data from only one observing session, and will comprise a subset of the operations of the Science Pipeline. It will be sufficiently limited in its processing to produce results in a time short compared to the length of a typical observing session. It's data products (images) will usually be available while the session is still in progress. The requirements for these pipelines are set forth in Section 2. ------------------------------------------------------------------------------- From gueth@iram.fr Thu Jun 28 14:49:55 2001 Date: Thu, 28 Jun 2001 13:40:43 +0200 From: Frederic Gueth Reply-To: alma-sw-ssr@cv3.cv.nrao.edu To: alma-sw-ssr@cv3.cv.nrao.edu Subject: [alma-sw-ssr] Definition pipelines Barry Clark wrote: > > > > > The subject of the Section 2 on Pipeline Requirements is referred to as the > > "Pipeline". This may be implemented as disparate > > tools or programs, or as separate packages provided by different groups, > > or as a single package, as long as it fulfills the requirements. > > > > The subject of Section 3 on Offline Data Reduction Requirements is referred to > > as the "Package" or "Offline Package". This may be implemented as disparate > > tools or programs, or as separate packages provided by different groups but > > integrated into a single suite, or a single package. > > > > Perhaps instead: > > The "Package" or "Offline Package" is a set of tools or programs, believed > adequate for ALMA reductions, and used by ALMA staff for reductions upon > which the behavior of the system will be judged. It may consist of packages > provided by different groups, with transitions provided to integrate them > into a single suite. The requirements will state that the Package will be > available for installation on the observer's own computer systems. The > requirements on the Package are set forth in Section 3. > > A "Pipeline" is a set of operations, implemented by the underlying Package, > which takes a concise description of the way these operations are to be > performed and accesses ALMA data, either from the ALMA archive or from > local files, and produces a desired data product. (For purposes of software > requirements, the alternate definitions as a machine or set of machines, > or as the supervisory process that invokes these operations are less useful.) > There are several Pipelines essential to the efficient operation of ALMA. 
> > \bullet The Calibration Pipeline operates in quasi real time, looks at only > calibrator observations, and produces one or more of the following data In the current draft, the calibration pipeline does not ignore the observations of the astronomical source: it applies (in the sense: store the appropriate quantity in the relevant header) the atmospheric calibration to all incoming observations. > products (depending on the type of observation and type of calibrator), and > places the results in a location where they can be accessed both by the > real-time system and by other reduction proceedures: 1) an antenna pointing > offset for all antennas. 2). Tsys for all antennas as a function of time. > 3.) Sideband ratios for all antennas. 4.) Antenna based flux calibration > (TSYSJY) from a flux calibrator. 5.) Antenna based bandpass calibration. > 6.) Antenna based polarization leakage terms (with the usual indeterminate > offsets from a single observation). 7.) Antenna based IF phase differences > (from a strongly polarized calibrator). 8.) Antenna based phase calibration > (with noise and atmospheric rms). The list should be left open in such a general description. For instance, the focus offset or the antenna positions derived from a baseline measurement are missing. > > \bullet The Science Pipeline will process most science data. It's data product > is an image cube. This product will in many cases be adequate to achieve the > observer's science goals. It may access ALMA data from several observing > sessions and even from observations not the observer's own. It is intended > to produce the best image possible without the intervention of an expert > observer. The Science Pipeline will include a data calibration phase, that > may, in fact, run somewhat asynchronously with the image making phase; this > should not be confused with the Calibration Pipeline above. To avoid confusion, I would suggest to change the "calibration pipeline" name to "real-time calibration pipeline". It can also be run off-line, but it *has* to be available in real-time. > > \bullet The Quick Look Pipeline will process data from only one observing > session, and will comprise a subset of the operations of the Science Pipeline. > It will be sufficiently limited in its processing to produce results in a > time short compared to the length of a typical observing session. It's > data products (images) will usually be available while the session is still > in progress. > The functions of each pipeline is summarized in the following list, in which "blocks" of operations are identified: Real-time calibration pipeline ------------------------------ - Data acquisition part - store in all incoming observation the current calibration paramaters (Tsys, bandpass, ...) - Telescope calibration - reduce array calibrations (pointing, focus, delay, baselines,...) - results are made available to the Sequencer - Astronomical calibration - reduce astronomical calibrations (atmopheric calibration, phase rms, flux scale, bandpass, ...) 
- results are made available to the Dynamic Scheduler Quick-look pipeline ------------------- - Monitoring tools - display the current properties of the array and/or observation - need results of real-time calibration pipeline - Calibration pipeline - from temperature-calibrated visibilities to uv tables (simplified calibration) - need results of real-time calibration pipeline - Imaging pipeline - from uv tables to images (simplified version) - need results of previous calibration pipeline - Display tools - display current observations, to allow the operator/AoD easy checks of the data quality - need results of previous calibration and/or imaging pipelines Science pipeline ---------------- - Calibration pipeline - from temperature-calibrated visibilities to uv tables - Imaging pipeline - from uv tables to deconvolved images Frederic. ------------------------------------------------------------------------------- From jschwarz@eso.org Thu Jun 28 14:50:24 2001 Date: Thu, 28 Jun 2001 14:57:30 +0200 From: Joseph Schwarz Reply-To: alma-sw-ssr@cv3.cv.nrao.edu To: alma-sw-ssr@cv3.cv.nrao.edu Subject: Re: [alma-sw-ssr] Definition pipelines Frederic Gueth wrote: > Real-time calibration pipeline > ------------------------------ > > - Data acquisition part - store in all incoming observation the current > calibration paramaters (Tsys, bandpass, ...) > > - Telescope calibration - reduce array calibrations (pointing, focus, > delay, baselines,...) > - results are made available to the Sequencer > What is likely to be the limiting factor, time to acquire the calibration data, or time to reduce it? Presumably baseline calibrations won't change during the execution of a Scheduling Block (unless the observing process is supposed to compensate for earthquakes in real time). Delay calibrations (according to the Use Cases, section 4.8.5 of the main requirements doc) are performed "at least once per receiver tuning" or "at least once per observing session" or (Lucas & Muders, private communication) "after reconnections of cables/fibres and after antenna moves". So while it's true that ALMA can't observe without these results, which certainly need to be known by the observing process (Sequencer?), there might be more time to produce them than the phrase "real-time" implies. As for pointing and focus, the Use Cases specify a "Pointing Session", which I understand results in a pointing model, but also a "Pointing Calibration", whose purpose is to update the parameters of that pointing model. The Pointing Session is an array- (or observatory-) level calibration which is done "after moving one or more antennas and/or at regular time intervals (weekly ?)", while the Pointing Calibration gets done fairly often. When we were generating the Use Cases, I had understood that there was no hard requirement on how quickly the results from the "Pointing Calibration" were needed: that an observing procedure could continue to execute even if the updates to the pointing and focus parameters weren't available for some time. How long this "some time" could be was never specified. It would be helpful for the analysis if this could be made a little clearer. > > - Astronomical calibration - reduce astronomical calibrations > (atmopheric calibration, phase rms, > flux scale, bandpass, ...) 
> - results are made available to the Dynamic > Scheduler > From prior discussions and from the Use Cases, I had understood that phase rms results would be made available to the observing process (not just to the Scheduler), so that an executing Scheduling Block could adjust cycle and dwell times on target and phase calibrator based on the results. Similarly, an SB might want to terminate once a certain noise level had been reached. Might not the time constraints be tighter than those on the telescope calibrations? ------------------------------------------------------------------------------- From lucas@iram.fr Thu Jun 28 14:50:47 2001 Date: Thu, 28 Jun 2001 15:48:29 +0200 From: Robert Lucas Reply-To: alma-sw-ssr@cv3.cv.nrao.edu To: alma-sw-ssr@cv3.cv.nrao.edu Subject: Re: [alma-sw-ssr] Definition pipelines Joseph Schwarz wrote: > > Frederic Gueth wrote: > > > Real-time calibration pipeline > > ------------------------------ > > > > - Data acquisition part - store in all incoming observation the > > current > > calibration paramaters (Tsys, bandpass, ...) > > > > - Telescope calibration - reduce array calibrations (pointing, > > focus, delay, > > baselines,...) > > - results are made available to the Sequencer > > > > What is likely to be the limiting factor, time to acquire the calibration > data, or time to reduce it? Presumably baseline calibrations won't change > during the execution of a Scheduling Block (unless the observing process is > supposed to compensate for earthquakes in real time). Delay calibrations > (according to the Use Cases, section 4.8.5 of the main requirements doc) are > performed "at least once per receiver tuning" or "at least once per observing > session" or (Lucas & Muders, private communication) "after reconnections of > cables/fibres and after antenna moves". So while it's true that ALMA can't > observe without these results, which certainly need to be known by the > observing process (Sequencer?), there might be more time to produce them than > the phrase "real-time" implies. The time to reduce the delay calibration is small compared to the time to acquire the data. But the feedback is real time, that is you have to apply them right away, particularly with the delay calibration (if you would proceed and apply the new delay offsets after some time, you would get a data set that is non-homogeneous). > As for pointing and focus, the Use Cases specify a "Pointing Session", which > I understand results in a pointing model, but also a "Pointing Calibration", > whose purpose is to update the parameters of that pointing model. The > Pointing Session is an array- (or observatory-) level calibration which is > done "after moving one or more antennas and/or at regular time intervals > (weekly ?)", while the Pointing Calibration gets done fairly often. When we > were generating the Use Cases, I had understood that there was no hard > requirement on how quickly the results from the "Pointing Calibration" were > needed: that an observing procedure could continue to execute even if the > updates to the pointing and focus parameters weren't available for some > time. How long this "some time" could be was never specified. It would be > helpful for the analysis if this could be made a little clearer. In the pointing calibration Use Case the pointing offsets are applied in a loop, the way Steve has written it (BC steps 2-5, remember that this had to be included in that specific ObservePointingCalibration sequence diagram). So at the end the offsets are already applied! 
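(For illustration only, a minimal sketch of this "offsets applied in a loop" pattern -- Python, with hypothetical sequencer and pipeline interfaces; it is not taken from the Use Case text:)

    def pointing_calibration_loop(sequencer, pipeline, max_error_arcsec=2.0):
        """Observe a pointing calibration, wait for the pipeline fit, and
        apply the offsets before science observing resumes."""
        for attempt in range(3):                      # bounded number of retries
            scan = sequencer.observe_pointing_scan()  # e.g. a five-point or cross scan
            fit = pipeline.reduce_pointing(scan)      # offsets plus formal errors
            if fit.error_arcsec <= max_error_arcsec:
                sequencer.apply_pointing_offsets(fit.az_offset, fit.el_offset)
                return fit                            # offsets already applied on exit
        return None   # caller decides whether to retry or continue with old offsets

The point is simply that the observing sequence blocks until the fit is applied or given up on, which is why the reduction has to be fast.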
Focus is the same though we never wrote the relevant Use Case. > From prior discussions and from the Use Cases, I had understood that phase > rms results would be made available to the observing process (not just to > the Scheduler), so that an executing Scheduling Block could adjust cycle and > dwell times on target and phase calibrator based on the results. Similarly, > an SB might want to terminate once a certain noise level had been > reached. Might not the time constraints be tighter than those on the > telescope calibrations? You're right. The time constraint is, however, not tighter, since the results are used to modify loop parameters; therefore a delay of the order of one or a few loop cycles is tolerable. Regards Robert ------------------------------------------------------------------------------- From gueth@iram.fr Thu Jun 28 14:51:10 2001 Date: Thu, 28 Jun 2001 15:19:48 +0200 From: Frederic Gueth Reply-To: alma-sw-ssr@cv3.cv.nrao.edu To: alma-sw-ssr@cv3.cv.nrao.edu Subject: Re: [alma-sw-ssr] Definition pipelines Joseph Schwarz wrote: > > Frederic Gueth wrote: > > > Real-time calibration pipeline ... ... > the updates to the pointing and focus parameters weren't available for some > time. How long this "some time" could be was never specified. It would be > helpful for the analysis if this could be made a little clearer. > I was referring to the pointing calibration done very regularly -- each hour or so. Of course, ALMA can continue to observe without having the result of this pointing measurement, but then you have every chance of pointing at a slightly wrong position. So a much wiser approach, used with all existing antennas or interferometers, is to wait for the results of the pointing calibration before continuing the observations. The same is true for focus measurements. The SSR document (req. 6.1-R1) mentions a max delay of 0.5 sec to have the calibration results passed to the observing system. > > - Astronomical calibration - reduce astronomical calibrations ... ... > From prior discussions and from the Use Cases, I had understood that phase > rms results would be made available to the observing process (not just to > the Scheduler), so that an executing Scheduling Block could adjust cycle and > dwell times on target and phase calibrator based on the results. Similarly, > an SB might want to terminate once a certain noise level had been > reached. Might not the time constraints be tighter than those on the > telescope calibrations? Yes, the results of the astronomical calibrations have to be made available to the observing process. But I think that the reduction time constraints for the telescope calibration are tighter. For the phase rms, it's a matter of deciding what has to be observed; if the pointing or focus are wrong, the data are affected by errors that you cannot correct (bad pointing, bad focus). Frederic. ------------------------------------------------------------------------------- From jschwarz@eso.org Thu Jun 28 14:51:26 2001 Date: Thu, 28 Jun 2001 15:47:52 +0200 From: Joseph Schwarz Reply-To: alma-sw-ssr@cv3.cv.nrao.edu To: alma-sw-ssr@cv3.cv.nrao.edu Subject: Re: [alma-sw-ssr] Definition pipelines > In the pointing calibration Use Case the pointing offsets are applied in > a loop, the way Steve has written it (BC steps 2-5, remember that this > had to be included in that specific ObservePointingCalibration sequence > diagram). So at the end the offsets are already applied! Focus is the > same though we never wrote the relevant Use Case.
> BC step 3 says "While observations continue, the pipeline separately reduces the [pointing calibration] data sets..."

How long can "observations continue" without getting the results?

-------------------------------------------------------------------------------
From bglenden@cv3.cv.nrao.edu Wed Jul 4 08:11:03 2001
Date: Tue, 3 Jul 2001 15:01:54 -0600
From: Brian Glendenning
Reply-To: alma-sw-ssr@cv3.cv.nrao.edu
To: alma-sw-ssr@cv3.cv.nrao.edu
Subject: Re: [alma-sw-ssr] Draft v2.0 of Pipeline and Offline Requirements

Some late comments.

> A. Two fundamentally new aspects of ALMA are the integrated archive and
>    the pipeline, therefore the impact of requirements on these two areas
>    should be considered. In particular the Pipeline will be the most
>    critical aspect of ALMA given that we envision both an effective
>    dynamically scheduled observatory with prompt user feedback mechanism
>    and a scientifically viable archive.

I think this statement is a bit overdone. I think a decent dynamic scheduler
would be possible just by taking into account environmental factors.
Similarly, I think even a raw data archive would be viable, although of
course we want to be able to attract non-traditional observers.

>>>SMyers: This seems to backtrack on the grand vision we had. I don't think
we want to back off to just a raw data archive now! The scheduler is not part
of this document, but I would like to have provisions in case it wants
feedback from the data in the archive (or the pipeline).<<<

> 1.0-R1 The Pipelines shall be able to process all data coming from the
>        array. It must not constitute a bottleneck in the data flow,
>        meaning that several occurrences of the same pipeline shall be
>        able to run in parallel if necessary.

I would add something like: Some projects will require unusually high data
rates or processing requirements. These will require processing outside of
the ALMA system and will be flagged appropriately so they are not processed
by the ALMA pipeline.

>>>SMyers: good. added as PL-1.0-R1.1 <<<

> 1.0-R2 All corrections applied shall be recorded so that any step can be
>        reversed and redone if needed.

I agree with previous comments that this is harder to say than do.

> 2.0-R1 The Calibration Pipeline shall be activated after each scan has
>        been observed.

Mightn't you want to do this more often for some observations, e.g. for an
OVRO-style pointing scan where you want to do a calculation on each point of
the triangle (if I remember correctly)? Similarly, you might want to do
something after each raster line during holography. Maybe "observation"
rather than scan?

>>>SMyers: It is likely that these sub-scan calculations will be handled by
the online system as part of the procedures themselves (like on the VLA). It
would be nice to say that the smallest entity that activates the pipeline is
the scan (or whatever).<<<

> 2.0-R2 The Calibration Pipeline may also be re-invoked at any time with
>        updated parameters or improved data. The results should not
>        immediately overwrite old results so comparison is possible
>        before adopting the new calibration. There will need to
>        be a method for validation and acceptance of calibration
>        updates.

In general, do we want to keep old calibrations "forever" and merely "mark"
the current set?

>>>SMyers: probably, though this is an implementation issue (e.g. flags for
validity or deprecation).<<<

> R2.1 apply the atmospheric calibration to the data

Does this mean WVR? If so it is probably applied by the online system before
the calibration pipeline.
>>>SMyers: good question. Where do we see WVR being applied? Are there
post-WVR atmospheric corrections?<<<

> R3.1 compute the phase rms on the scan timescale

scan -> observation?

>>>SMyers: we should iron out the nomenclature here.<<<

> 2.2-R4 For the pointing and focus measurements, the fitting results
>        should be automatically stored in the telescope
>        parameter file if the fitting error is less than
>        the user/

It would seem dangerous to allow a user-specified threshold to determine what
is accepted as the current system values for things like pointing and focus.
(Do users even want to know about these things?)

>>>SMyers: replaced "user/observatory" with "system".<<<

> 4.0-R1 The Science Pipeline shall be activated after completion of a
>        session.

I don't think this is right. It activates after a breakpoint if the user has
requested feedback, after all observations for a source have completed, or
when the program completes. We don't want to have to needlessly repeat the
nonlinear parts.

>>>SMyers: have replaced session with "breakpoint", with a breakpoint assumed
at the end of the session.<<<

> 4.1-R3 The Science Pipeline shall check and correct the flux scale by
>        using observations of sources of known fluxes. Any effect due to
>        the source being resolved shall be taken into account.

It seems like the second part of this is really an offline requirement.

>>>SMyers: Many of the best calibrators are resolved (planets, HII regions,
even 3C48/3C286 on the VLA!) and system-maintained models can be (are) used
to deal with these cases.<<<

> 4.1-R4 The Science Pipeline shall compute images for each frequency
>        channel, as well as for the continuum emission:

Does the user have an option to not image, e.g., "edge" channels (to keep
within data rate parameters, for example)?

>>>SMyers: add "(non-blanked, possibly user-specified)".<<<

> 4.1-R5 The images shall be deconvolved using the most appropriate
>        algorithm. In case of a complex image, it should be possible to
>        have several algorithms running in parallel, the best
>        (according to criteria TBD) image being eventually selected.

This will lead to an inhomogeneous archive, and determining "best" by some
automated procedure may not be easy. We have to decide if we're producing a
"reference" image or trying to produce "the best" image.

>>>SMyers: we had this (inconclusive) discussion at the last Berkeley
meeting.<<<

> ? maybe Total power from detectors

If in fact it is not saved with the correlation data, do we normally throw it
away, considering it only a debugging tool?

>>>SMyers: I routinely discard this from VLA data in filling. But we should
make no assumptions here.<<<

> Should these have some prefix to indicate that they are for Offline, like
> "O-1.0" etc.?

Yes (or embed the section number).

>>>SMyers: I have adopted PL-xxx and OL-xxx to be easier...<<<

> 1.1-R3 The performance of the package should be quantifiable and
>        commensurate with the data processing requirements of
>        ALMA output at a given time. This should be benchmarked
>        (e.g. "AIPSmarks") and reproduce accurately results for
>        a fiducial set of reduction tasks.

We could be more explicit here, i.e. take a few fiducial problems and say
that the performance should be greater than some value. I think it's also
important to say that the package must be able to cope with data sizes much
larger than main memory (however it chooses to do it).

>>>SMyers: Should we craft these numbers at the meeting or leave them TBD? Is
there an official mechanism for deciding these TBD quantifications?
These are likely to have significant impact on the packages (e.g. Tim's
costing) and I'd hate to have to make them up on the fly!<<<

> 2.1-R3 Multitasking for all interfaces should be available where
>        appropriate.

A bit vague; maybe: It must be possible to run one or more long-running
calculations in the "background". While background tasks are running, normal
interactive activities must still be possible.

>>>SMyers: add OL-2.1-R3.1 <<<

This brings up the subject of locking: The package must support locking data
files so that there is no possibility of one process corrupting a file that
is also being written to by another process. The model should be: "one
writer, multiple readers."

>>>SMyers: add as OL-3.1-R14 <<<

> 2.1-R6 Multiple levels of "undo" should be supported for all tasks.

Again, hard. Some operations can be undone readily, others can't (e.g., if
you want to be able to undo a deconvolution you probably have to keep a copy
of the original image!).

>>>SMyers: see previous discussions and current (12-Jul-01) text.<<<

> 2.3-R4 All functionality of the CLI must also be available in GUI
>        mode.

Not realistic IMO (unless a CLI type-in window counts!).

>>>SMyers: most substantive operations should be scriptable (which is a
CLI).<<<

> 2.3-R5 A graphical data-flow oriented (IDL style) tool assembler
>        would be desirable, perhaps as an advanced GUI for later
>        development.

These are cute in principle, but they don't seem to be used much in practice.

>>>SMyers: they are expensive in practice. Almost certainly so for us :-( <<<

> 2.3-R3 The CLI should have command-line recall and editing

Name completion? Minimum match?

>>>SMyers: add<<<

> 2.3-R4 All functionality of the GUI must also be available in CLI
>        mode.

This direction I believe!

> 2.4-R1 Must have basic programming facilities such as:

IMO, in a scientific command language whole-array arithmetic/processing is at
least desirable.

>>>SMyers: added as per Wim's comments.<<<

> 2.5-R2 There should be a variety of help levels and documentation
>        [...]
>        These should be in printer-friendly formats.

Does this mean no native HTML?

>>>SMyers: add as desirable.<<<

> 3.1-R8 Comprehensive and understandable processing history information
>        for the data must be maintained and be exportable

What does exportable mean? Just that it's written into COMMENT cards in a
FITS file, or something more complicated?

>>>SMyers: I would prefer the option of something more readable, but a FITS
table would be adequate.<<<

> 3.1-R10 When sorting or indexing is desirable for performance
>         enhancement, this should be carried out in a manner
>         transparent to the user.

I personally prefer to manually "purge" rather than having semi-intelligent
garbage collecting turn on at some random point (usually just when I want to
do something else).

>>>SMyers: I was thinking more along the lines of time-baseline vs.
baseline-time indexing for speed-up of gridding etc.<<<

> 3.3-R1 I/O of data must not be a bottleneck for processing, especially
>        for pipeline use. This is especially true if the native format
>        of the package is not used and filling/conversion is necessary.

I think this is really a pipeline requirement. (Of course there are low-FLOPS
operations in the offline package where I/O will be the bottleneck!) Again,
rather than subjective statements like this, I think some objective
tests/times would be better.

>>>SMyers: Some text to that effect added 12-Jul-01.<<<

> 3.6 Images and other Data Products

Not having to transpose cubes is nice.

>>>SMyers: how to word this as a req?
I'm not sure what you mean here.<<<

> 3.7-R2 Imaging data in standard formats from astronomical instruments
>        at different wavelengths should be importable, with the
>        ability to combine these with ALMA data where appropriate.
>        This should be through a set of widely used formats.

Be more explicit about what you mean by combine. I assume you mean that for
each pixel output = f(input1, input2, ...), where f consists of the usual
mathematical and logical functions, taking into account blanking.

>>>SMyers: like AIPS COMB? add "(coadd)".<<<

Blanking support should also be in the requirements: To prevent bad pixel
values from propagating through calculations, blanking must be supported.
Usually, any calculation that produces a pixel from a set of input pixels, at
least one of which is blanked, will result in a blanked output pixel. It is
desirable that blanks not be destructive (the original pixel value is
retained), and that it be possible to turn on and off different blanking
("mask") levels.

>>>SMyers: add as OL-3.6-R3 with implications also in OL-5.1-R7 and
OL-6.5-R3.<<<

> 4.1-R1 The package must be able to reliably handle all of the proposed
>        and future ALMA calibration modes, including but not limited
>        to temperature controlled loads, semi-transparent vanes,
>        apex calibration systems, WVR data, noise injection,
>        fast-switching calibration transfer, planetary observations.

Several or all of these are more requirements for the online
system/calibration pipeline.

>>>SMyers: but have importance here also.<<<

> 4.1-R7 Data editing, calibration, and display of calibration

Besides interactive editing, what about automatic editing?

>>>SMyers: add as 4.1-R9 <<<

> 4.2-R4 Redundancy (e.g. same or crossing baselines) should be used

Do we have enough redundant baselines to make this relevant?

>>>SMyers: there should be a number of u,v crossings at least.<<<

> 4.4-R1 Individual data points must be associated with pointing
>        center information, and one must have the ability to
>        deal with complex scanning strategies.

What does this mean? Just that it can be gridded, or something else?

>>>SMyers: just that it can deal with weird spiral patterns or other scanning
patterns.<<<

> 6.5-R2 The ability to collapse or integrate over sub-dimensions
>        of data cubes in order to form "moments" is required.

Add: Interactive and automatic facilities (windowing, S/N-based blanking,
...) to avoid degrading the S/N in the moment calculations must be provided.

>>>SMyers: add as OL-6.5-R3.1 generalized beyond moments. <<<

> 7.2-R3 Both contour plots with variously colored and styled lines
>        and false color maps should be possible; it should also be
>        possible to produce RGB overlays (i.e. one layer gets
>        assigned intensity scales of red, another one of green,
>        and one of blue).

While useful, Hue/Intensity/Saturation is probably the more interesting color
"space" to be able to do this in.

>>>SMyers: add.<<<

Cheers, Brian

-------------------------------------------------------------------------------
From bglenden@cv3.cv.nrao.edu Wed Jul 4 08:11:24 2001
Date: Tue, 3 Jul 2001 15:18:36 -0600
From: Brian Glendenning
Reply-To: alma-sw-ssr@cv3.cv.nrao.edu
To: alma-sw-ssr@cv3.cv.nrao.edu
Subject: Re: [alma-sw-ssr] Draft v2.0 of Pipeline and Offline Requirements

> > 1.0-R2 All corrections applied shall be recorded so that any step can
> >        be reversed and redone if needed.

> I agree with previous comments that this is harder to say than do.

Oops - I got a sign wrong. This is *easier* to say than do.
Cheers, Brian

-------------------------------------------------------------------------------
From bclark@aoc.nrao.edu Thu Jul 12 13:58:19 2001
Date: Mon, 9 Jul 2001 12:06:04 -0600 (MDT)
From: Barry Clark
Reply-To: alma-sw-ssr@cv3.cv.nrao.edu
To: alma-sw-ssr@cv3.cv.nrao.edu
Subject: Re: [alma-sw-ssr] Definition pipelines

Frederic Gueth wrote:
>
> In the current draft, the calibration pipeline does not ignore the
> observations of the astronomical source: it applies (in the sense:
> store the appropriate quantity in the relevant header) the atmospheric
> calibration to all incoming observations.
>

It is a matter of definition, but I would regard the processing of the
unknown sources as part of the Calibration phase of the Science and/or Quick
Look pipelines. (This is in aid of setting machine priorities - this has to
be done on the Quick Look timescale, not on the same timescale as the other
items mentioned.)

> > products (depending on the type of observation and type of calibrator),
> > and places the results in a location where they can be accessed both by
> > the real-time system and by other reduction procedures: 1) an antenna
> > pointing offset for all antennas. 2) Tsys for all antennas as a function
> > of time. 3) Sideband ratios for all antennas. 4) Antenna based flux
> > calibration (TSYSJY) from a flux calibrator. 5) Antenna based bandpass
> > calibration. 6) Antenna based polarization leakage terms (with the usual
> > indeterminate offsets from a single observation). 7) Antenna based IF
> > phase differences (from a strongly polarized calibrator). 8) Antenna
> > based phase calibration (with noise and atmospheric rms).
>
> The list should be left open in such a general description. For instance,
> the focus offset or the antenna positions derived from a baseline
> measurement are missing.
>

I don't think these two functions are properly part of a pipeline. They will
need a little human supervision, because of the very serious consequences if
they are a bit wrong. But we need to specify that the Package will have
tools to do these, which I do not think we have there.
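For illustration, a minimal sketch (with hypothetical names, not an actual
Package interface) of the core calculation inside such an antenna-position
tool: a position error db shows up as a residual phase of roughly
(2*pi/lambda) db.s_hat toward a calibrator in direction s_hat, so phases
measured toward calibrators in several directions give a small linear
least-squares problem. A real tool would add weighting, data editing and the
human inspection mentioned above.

    import numpy as np

    def solve_position_offset(directions, phases, wavelength):
        """Least-squares antenna position offset from residual phases.

        directions : (N, 3) unit vectors toward N calibrators
        phases     : (N,) residual phases for one antenna, in radians
        wavelength : observing wavelength, in the desired length unit
        """
        directions = np.asarray(directions, dtype=float)
        phases = np.asarray(phases, dtype=float)
        # One row per calibrator: direction cosines plus a constant column
        # for the per-antenna instrumental phase offset.
        design = np.hstack([directions, np.ones((len(phases), 1))])
        coeffs, *_ = np.linalg.lstsq(design, phases, rcond=None)
        # phases ~ (2*pi/wavelength) * (dx, dy, dz) . s_hat + const
        return coeffs[:3] * wavelength / (2.0 * np.pi)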