Hello, 

We are sorry for the late reply to Mark's detailed points on the DCF
proposal.  Hopefully this is better late than never.

Frank, Lindsey, and Doug


> Date: Thu, 09 Aug 2001 18:21:05 +1000
> From: Mark Calabretta <Mark.Calabretta@atnf.csiro.au>
> 
> Notes on NOAO DCF proposal documents
> ------------------------------------
> 
>  1) DCF extension of CTYPEj is a good generalization of the currently proposed
>     distortion corrections.
> 
> 
>  2) Notation
>     --------
> 
>     The DCF proposal documents should use our carefully defined index
>     notation!  E.g. PVj_ms, not PVl_ns.
> 
>     I will use our notation.

Obviously the notation can be adjusted to make the paper(s) consistent.  A
point here, and why a different notation was used, is the question about
the ranges of the indices.  In the early paper I draft the the index j was
limited to 0-99 and m was limited to 0-999.  If Paper I is changed to allow
using the maximum number of digits as appropriate then there would be no
need for a distinction.  It seems from some of the comments below that this
is already acceptable.

>  3) Location of the distortion correction in the algorithm chain
>     ------------------------------------------------------------
> 
>     Should the distortion correction come before or after the CD matrix?
> 
>     There are good arguments either way:
> 
>     A thread on fitswcs 2000/Jan: "Detector distortion correction
>     representations in FITS" argues that it should come before, since it's
>     related to the physical media.  E.g. distortions in CCD arrays are
>     expected to be pretty-well static for each chip, so correct for
>     distortions first, then rotation, skewness and scale next.
> 
>     On the other hand, higher-order distortion on a Schmidt plate
>     (Pat Wallace's BENDXY) are fixed with respect to the focal plane and if
>     the plate is rotated or translated it makes sense to apply these
>     corrections via the CD matrix before the distortion function.
> 
>     Perhaps there should be a provision for DCFs which come before or after
>     the CD matrix!  E.g. PLP before and PLX after.  Similarly SPP/SPX for
>     splines and TBP/TBX for table-based corrections.
> 
>     A DCF before the CD matrix would require offset parameters (analogous to
>     CRPIXi) as well as scaling.
> 
>     I will assume that DCFs have offsets as well as scaling parameters.

The crux of this question revolves about how raster detectors rotate
relative to the sky.  A desirable property of a WCS is that a single
parameter or the CD matrix parameters can be used to account for this
rotation leaving the rest of the WCS, including the higher order distortion
description, unchanged.  From the data acquisition point of view one would
like to be able to characterize the WCS, apart from the coordinate
reference value, with the detection system in one orientation.  Then, when
the instrumentation allows rotations, take the telescope pointing and
instrument rotation and make minimal changes to the standard WCS to record
the observation.  This decomposition would help enable initial WCS to be
added to ground-based data acquisition much as is done with the NOAO
Mosaic.

The complication is in how the instrument rotates and where the distortion
terms lie.  In space often the whole spacecraft rotates and so the rotation
of the detector is equivalent to rotation of the sky.  In this case the
distortions would come before a final transformation to the idealized WCS
through the CD matrix.  However, in ground-based optical equitorial
telescopes rotations generally occur between the telescope and the camera.
If the majority of the distortions are in the telescope (corrector) and
fixed with respect to the sky then one needs to rotate the pixel matrix
first and then apply a fixed distortion correction.

Alt-Az telescopes (now the norm for optical telescopes) and Naismyth
mounted instruments requiring continuous rotation during the exposure is
something we haven't really thought about.

Of course for the most general case where some distortions are fixed with
the sky and some are fixed with the detector/camera, one rotation term and
one distortion function is not enough.  One could therefore argue that
there be a chain of distortion transformations.

What this boils down to is a trade-off between the complexity of the WCS
formalism and the ability to allow much of the distortion to be independent
of the instrument orientation for a specific exposure.  Our opinion is that
we should stick with one quite general distortion transformation.  However,
it could be quite useful to allow the CD matrix to be applied either before
or after the distortion transformation.  This could be done as suggested
above.

An alternative suggestion, which would require a change to paper I, is to
allow two CD-type transformations.  In other words, the steps would be, 1)
apply a linear transformation, 2) apply a distortion transformation, 3)
apply a linear transformation, 4) apply an idealized transformation to the
final world coordinates.  This is not something we advocate but we raise
the idea in case someone sees merit and can make a strong case.

The counterpoint is that the distortion transformation can include terms
corresponding to the rotation.  The connection to the above is that a
rotation of the instrument might require all the coefficients to be
modified, in a possibly non-trivial way, from a fidicuial position angle
solution.


>  4) Form of CTYPEj keyword
>     ----------------------
> 
>     The form of the DCF keyword must be spelt out explicitly.  Extend CTYPEj
>     from "4-3" form to "4-3-3".  Algorithm code and DCF code to be right-
>     padded with blanks or "-" as appropriate, e.g.
> 
>        "RA---UV     "   or   "RA---UV--XY "
>         4444-333-333          4444-333-333
> 
>     Distortions applied to simple linear axes would have "8-3" form, e.g.
>     "WAVELEN--PLY".

Yes this is fine.  Note that this does not provide for the instance feature
given in the original DCF proposal.  However, if the assignment of axes for
the independent variables, what we will call an axes map, is made by
separate keywords (as suggested by Mark), then the need for instance
discrimination disappears.  We support the idea of such axes map keywords
and so don't argue for the instance suggestion.

>  5) Image transposition
>     -------------------
> 
>     DCF formalism should allow for transposition of axes.  Currently, if I
>     have
> 
>        CTYPE1 =  'RA---AZP'
>        CTYPE2 =  'DEC--AZP'
>        PV2_1  =     202.64
> 
>     then I can transpose the data array provided that I swap keywords:
> 
>        CTYPE1 =  'DEC--AZP'
>        CTYPE2 =  'RA---AZP'
>        PV1_1  =     202.64
> 
>     This is not the case with the DVj_ms as defined because there is an
>     implicit ordering on the axis number.
> 

This should be a consideration in defining the parameter indexing.  We would
look forward to another representation that would be easier to transpose;
though, this concern is not a show-stopper.  The axes map keywords would
provide such an improvement.

>  6) Subimaging
>     ----------
> 
>     Suppose we have spatial and spectral distortion described by PLY for all
>     axes in a cube:
> 
>        CTYPE1 = 'RA---TAN-PLY'
>        CTYPE2 = 'DEC--TAN-PLY'
>        CTYPE3 = 'WAVELEN--PLY'
> 
>     Suppose, however, that the spatial distortion was independent of
>     wavelength (as in Example 4.4 of the DCF document) and I wanted to extract
>     a subimage consisting of a single image plane and put it in a separate
>     FITS file.
> 
>     In the present formalism this would be a messy job because the numbering
>     of the DVj_ms for the spatial axes is based on the existence of that
>     third, spectral, axis.

We don't see why this is a problem.  The ability to provide WCS of different
dimensionality than the pixel array would simply mean that the WCS would
be unchanged except for adjusting CRPIX values.  Taking out a slice
not along a plane is equivalent to the image transposition question where
the removed dimension is transposed to the highest eliminated axis.

>  7) Axis coupling
>     -------------
> 
>     In the general case, distortion coupling between axes is not as simple as
>     the DCF document suggests.
> 
>     For example, consider a data cube with two spatial axes and one spectral
>     axis and suppose that distortion in the spectral axis was a function of
>     position (e.g. a Fabry-Perot interferometer).  Suppose you want to use a
>     PLY to correct for wavelength distortion.  This forces you to attach PLY
>     to the spatial axes as well:
> 
>        CTYPE1 = 'RA---TAN-PLY'
>        CTYPE2 = 'DEC--TAN-PLY'
>        CTYPE3 = 'WAVELEN--PLY'
> 
>     This is so even if there is no spatial distortion, in which case the
>     polynomial for the spatial axes would have zero coefficients.  This is
>     inefficient.
> 
>     However, it is quite possible that there could be a spatial distortion
>     which you might want to describe with, say, a spline function.  In that
>     case you would also have to attach SPL to the wavelength axis:
> 
>        CTYPE1 = 'RA---TAN-SPL'
>        CTYPE2 = 'DEC--TAN-SPL'
>        CTYPE3 = 'WAVELEN--SPL'
> 
>     Then you would have to define a 3D spline fit, an ugly job.
> 
>     So you're left with a compromise, you can't use PLY for spectral together
>     with SPL for spatial.
> 
>     This problem arises because of the dual function of the DCF codes: firstly
>     to define an algorithm, and secondly to associate axes.
> 
> 
>  8) Explicit axis coupling
>     ----------------------
> 
>     A simple solution to the problems raised by transposition, subimaging and
>     unwanted axis coupling would be to associate axes explicitly.
> 
>     For example, if you wanted
> 
>        CTYPE1 = 'RA---TAN-SPL'
>        CTYPE2 = 'DEC--TAN-SPL'
>        CTYPE3 = 'WAVELEN--PLY'
> 
>     but with wavelength dependent on the spatial axes, then you could simply
>     say so via the DVj_ms parameters.
> 
>     For each axis, DVj_0s would say how many other axes the intermediate world
>     coordinate for axis j was dependent on (i.e. N); the next N parameters
>     would say which axes and in which order; then N offsets; then N
>     normalization values; then N coefficients for computing r; then N+1 values
>     of the highest power, then the required number of coefficients.
> 
>     All DCFs would assign the same meaning to the first 1+N+N+N values of
>     DVj_ms.
> 
>     This eliminates the need for DCF qualifiers.
> 
>     The transposition problem is solved by transposing the values of DVj_1s to
>     DVj_Ns appropriately (for each j).
> 
>     The subimaging problem above is solved trivially; since the DVj_ms for the
>     spatial axes would only refer to each other, the header cards relating to
>     the spectral axis could be eliminated (if so desired).

Whether many, even all, coefficients are zero, is "efficient" is a
philosophical opinion provided the obvious definition that missing
coefficients are zero.  Also how the indexing is done and whether there are
large gaps in the indices is also a matter of taste.  The only serious
consideration is whether the limits on the indices due to the 8 character
keywords cause constraints on realistic situations.

The dual use of the distortion identifier to indicate the function type and
which axes contribute the independent variables was suggested purposely to
have all the WCS identifications appear in just the CTYPE keywords.  This
allows seeing at a glance which axes are used in which functions from just
the CTYPE keyowrds.  This is the same philosopy that says RA and TAN are
both in CTYPE rather than separate keywords such as CTYPE1=RA, CPROJ1=TAN.

However, as noted earlier, we support the alternative of providing an axes
map via keywords.  In this case it would be possible to have the
combination of SPL and PLY on different axes and PLY could be used on
two axes which are not themselves coupled.

>  9) The PVj_ms, PSj_ms, DVj_ms and DSj_ms keywords
>     ----------------------------------------------
> 
>     DVj_ms provides for 10,000 parameters (0 - 9999) for single digit axis
>     number with no multiple WCS letter.
> 
>     However, note that Paper I currently specifies that s is blank for the
>     primary axis description.  This would restrict the DVj_ms to 0 - 999.

We support the added flexibility of allowing the primary WCS with no letter
to extend to 9999.

>     Double digit axis number is unlikely, ok, but multiple WCS code is not so
>     unlikely.  Restricts DVj_ms to 1,000 values.  This may be insufficient as
>     shown below.
> 
>     For BINTABLEs containing image arrays or for pixel lists, 10,000 DVj_ms
>     values can be obtained but only for single digit axis numbers and single
>     digit column numbers with no multiple WCS code (Paper I, Appendix C).
> 
>     Is it proposed to extend the number of PVj_ms keywords (currently 0-99) to
>     match the DVj_ms (0-9999)?
>
>     Does the number of PSj_ms match that of the DSj_ms (10,000)?

It would make sense to maintain the same form for both but that is TBD.

>     What is the character length of DSj_ms and PSj_ms?

We would expect that it would be the full card if needed.

>     Software will have to be prepared to parse PSj_ms whether it's used or
>     not.
> 
>     Each axis may have up to 10,000 possible PSj_ms and DSj_ms values and
>     software systems without dynamic memory allocation will have to reserve
>     sufficient space for them ahead of time.  If the PSj_ms and DSj_ms have a
>     maximum character length of, say, 25 then 500kb of memory needs to be
>     reserved - for each axis!
> 
>     To this must be added 80kb for the DVj_ms (plus 80kb for PVj_ms?), per
>     axis (assuming the parameters will be maintained as double precision
>     variables).
> 
>     Thus software without dynamic memory allocation which wants to process up
>     to four axes will have to reserve 2.64Mb in array space.  Multiply this by
>     N for the number of WCS it expects to have to process simultaneously.  For
>     IFS images composed of spectra from many slitlets, N could be in the
>     hundreds.
> 
>     Clearly the number of PSj_ms and DSj_ms should be restricted!
> 
>     No example of the need for PSj_ms has been given.  At least one persuasive
>     example of PSj_ms usage should be provided. 


PS/DS keyword usage would most certainly occur with the TBD equivalent to
the PRC description where you would need to point to an extension.  In
general it would occur whenever a pointer to another file is needed
including a PRC image or a table containing WCS information.  Another
possiblity is for attaching descriptive information to a WCS transformation
in an encapsulated fashion; i.e. just a COMMENT card would not do.

History information can be encapulated in the PSj_ms parameters. An example
for celestial coordinate WCSs' might be the name of the catalog from which a
plate solution was derived. The idea is that if a particular WCS is
overwritten these history cards would be overwritten as well unlike
COMMENT cards which are general.

It is far-fetched to believe there will be cases with 10000 PVs, DVs, PSs,
and DSs.  It has always been the case that there are no limits to the number
of cards in a FITS header but that has not meant we would want part of
the standard to say the number of cards is limited to some value just so
readers don't have to potentially deal with large amounts of information.
There should be no limit other than the maximum index possible.


> 10) The NAXISi=1 question
>     ---------------------
> 
>     NAXISi=1 is used in both examples in Wells, Greisen & Harten (1981), the
>     first example FITS headers published!  Moreover, the first example has
>     a degenerate axis sandwiched between two non-degenerate ones.  NAXISi=1 is
>     also shown in Greisen & Harten (1981).  
> 
>     Usage of NAXISi=1 is deeply entrenched in radioastronomy, more so than
>     that of the CDj_i matrix in optical astronomy for which we've already made
>     detrimental changes to the spec.
> 
>     It would be desirable for extensive past usage of NAXISi=1 in
>     radioastronomy (and probably elsewhere) to be supported in future by all
>     areas of astronomy.
> 
>     NAXISi=1 is only used for transport purposes and can be translated to
>     whatever internal representation is required.
> 
>     NAXIS comes first in header.  FITS readers know up-front how much memory
>     to allocate for WCS parameters.
> 
>     New proposal assumes degenerate axes come last.  Subimaging may violate
>     this.
> 
>     Only a question of the aesthetics of the syntax.  Don't invent new
>     conventions which add nothing to the existing ones.
> 
>     Simple solution: continue to use NAXISi=1.  FITS interpreters read NAXIS
>     value and can reduce it by one for each NAXISi=1 encountered to deduce
>     "NDIM" and "WCSDIM".
> 
>     Paper I needs to describe the use of NAXISi=1.

This has been discussed separately and a consensus proposal was accepted
at ADASS.

> 11) Errors arising
>     --------------
> 
>     Our intention was that a FITS interpreter could choose to ignore the PRC
>     if it chose to accept the small error arising.
> 
>     The DATAMAX, DATAMIN cards recorded the maximum and minimum value of the
>     distortion in the image, and MAXCORi recorded the maximum for each axis. 
> 
>     This should be propagated to the DCF formalism (but maybe DATAMAX and
>     DATAMIN are not the best names).

We assume you mean MAXCORj to be consistent.  The DCF concept was proposed
in the same vein that an interpreter could ignore the transformation.  The
proposal to include information about the errors is still reasonable in
this context.  We don't like DATAMAX/DATAMIN as this has some other
intuitive interpretations.  Instead we would suggest MAXCORj with either
j=0 or no index to refer to the whole WCS.

> 12) Encapsulation
>     -------------
> 
>     Encapsulation refers to the separability of the FITS header parser and the
>     WCS interpreter; the parser should not need to know any more than the
>     generalities, i.e. CDj_i, CRPIXi, CTYPEj, CRVALj, PVj_ms, DVj_ms, DSj_ms.
> 
>     This certainly is a desirable design goal.  It means that global
>     parameters need to be eliminated as far as possible.  Presently this means
>     LONPOLEs, LATPOLEs, RESTFRQs, RESTWAVs, and FPRADIUS.  Arguably it also
>     includes RADESYSs and EQUINOXs which qualify the first half of the CTYPEj
>     card rather than the second.
> 
>     In order for encapsulation to be possible we need a convention on default
>     values.  The proposed convention, that the default always be zero, is
>     difficult to make workable in general and will lead to artificial
>     constructs.  For example, the default for LONPOLEs depends on other
>     parameters, sometimes it is zero and other times it is 180 deg.  Similarly
>     for LATPOLEs.
> 
>     One solution would be that on reading the NAXIS keyword the FITS parser
>     should allocate an array to hold the PVj_ms and initialize each element
>     with a non-signalling NaN value (any "magic" value will do).  As the
>     parser reads each PVj_ms card the corresponding array element is
>     initialized; when the parser has finished reading the header, defaulting
>     PVj_ms will leave NaNs in the array.  Later, when the WCS interpreter
>     software first encounters one of these NaN values it would substitute the
>     WCS-specific default value (e.g. this would be done by the wcsset
>     routines).  This means that non-zero defaults can be defined and that FITS
>     header parsers don't need to know any of the details of the WCS.
> 
>     Should LONPOLEs and LATPOLEs be considered an exception?  Originally they
>     were made into separate header cards because they complement the CRVALj in
>     defining the three Euler angles for a spherical coordinate transformation.
>     However, I think they should probably be included in the PVj_ms, so that
>     for all celestial projection types, PVj_0s would contain phi_p (i.e.
>     LONPOLEs) and PVj'_0s would contain delta_p (i.e. LATPOLEs), where j and
>     j' indicate the longitude and latitude axis respectively.
> 
>     With this machinery, I would go further and record (phi_0,theta_0), the
>     reference point of the projection, in PVj_1s and PVj'_1s.  This would have
>     some very useful applications as discussed elsewhere.


We strongly favor the idea of ecapsulating the celestial coordinate projection
parameters although not doing it would not be a show stopper.  We do agree
that they should be able to take non-zero default values.  The present
default value scheme seems fine.  Maybe the PVj_ms parameters should be
permitted non-zero defaults, whereas the DVj_ms should have 0 valued defaults.
Is there any reason not to make this distinction?

We don't really think that quantities RADESYSs and EQUINOX and equinox need
to be PVj_ms parameters as these are are not used to compute coordinate
values, although they are required to assign meaning, and to do celestial
coordinate transformations ... Again we think the current defaults are
reasonable.

> 
>     RESTFRQs and RESTWAVs are so basic that you'd want human readers to be
>     able to see them clearly in the header (in fact, RESTFREQ is in the SDFITS
>     spec).  Should they be consigned exclusively to the PVj_ms, or perhaps be
>     duplicated therein?
> 
>     We have a quandry, it is evident that we can't satisfy all of the
>     following simultaneously:
> 
>        1) encapsulation,
>        2) no duplication,
>        3) human readability.

The RESTFRQs and RESTWAVs are already in common use ... However FPRADIUS
should be PVj_ms parameter.

> 
> 13) Inversion of the distortion functions
>     -------------------------------------
> 
>     There is no mention of the problem of inverting the distortion functions. 
>     Needs to be discussed.

This has generally not been discussed because the philosophy was that the WCS
is defined as a description of the coordinates and not a run-time or
operational feature.  The implication is then that inversion is up to
the user.  We see nothing wrong with this and the discussion would simply
say this.  Are you suggesting that a more specific description of inversion
is needed?


> 14) Specification of distortion functions
>     -------------------------------------
> 
>     Exact details of PLY, PRC, and spline must be spelt out in Paper IV.  (PLY
>     is done, ok.)

PLY may be done but others have to have a chance to comment on any
rearrangement of cofficient indexing, etc.  Maybe you mean that the general
structure of a polynomial is agreed upon.


> 15) Regarding the formulation of PLY in the DCF paper
>     -------------------------------------------------
> 
>     The equations are hard to follow in their current form and really need to
>     be rewritten in proper mathematical form (i.e. in LaTeX).
> 
>     Normalization constants are defined repeatedly for each x_j for each axis.
>     Axis j=1 has normalization values for the x_j of all axes, not just x_1,
>     Axis j=2 has normalization values for the x_j of all axes, not just x_2,
>     etc.  I suspect this was unnecessary, it would have been sufficient for
>     axis j=1 just to contain the normalization value for x_1.
> 
>     However, with explicit axis coupling it certainly will be necessary.
> 
>     Likewise, in Eq. 2 when computing r.

If the normalizations are the same for x_j in each of the axes distortion
functions then it is true the values would be repeated.  However, it gives
the added generality that each axis function may need different
normalization.  So for axis 1 the intermediate pixel coordinate x_1 might
not need to be normalized (no normalization coefficient) but for the
function of axis 2 the intermediate pixel coordinate x_1 might be
normalized by some large factor.  Similarly for the terms in eq. 2.

>     Introduction of D = N + 1 only serves to confuse matters.  Eliminate it.

This is a matter of clarity to allow reference to the fact that the
dimensionality of the function is one more than the number of axes because
of use of the added cartisian distance variable.  You can eliminate it but
then any mention of the number of terms related to the number of variables
would always have to be noted as N+1.  We think it clarifies matters
significantly rather than confuse them as you say.  We don't object to
writing things without it but if (N+1) crops up everywhere then clearly
defining a variable for it would make it easier to read equations.

>     The number of coefficients of the power terms is
> 
>          5*N + 2 + (DVj_(k+1)s + 1) * (DVj_(k+2)s + 1) * (DVj_(k+3)s + 1) * ...
> 
>     where k = 5N + 1.  E.g. to encode the TAN+poly of Paper II with powers up
>     to 7 would take
> 
>          5*2 + 2 + (8 * 8 * 8)
> 
>     i.e. 524 per axis, times 2 axes which is 1048.  TAN+poly has 2 * 40 = 80.

Where does the 5 come from?  In our original description the factor is 3.
In that description the maximum number of coefficients (i.e. the maximum
keyword index) as

           1 + 3*N + prod(Pi+1,N+1)

where the Pi are the maximum powers for each of the N+1 independent variables.
In your example where N=2 and P=7 you have

           1 + 3*2 + (8 * 8 * 8)

The first term is not really significant and basically the number of
terms grows as (P+1)^(N+1).  So in your example the 8*8*8=512 is your
basic point.


>     With 10,000 coefficients and 2 axes, the maximum power possible is 21 in
>     each variable.  For 3 axes the maximum power drops to 9 which is starting
>     to become uncomfortable.  For 4 axes it is 6.  (In fact, this is a
>     compelling argument for explicit axis coupling.)
> 
>     With 1,000 coefficients and 2 axes, the maximum power possible is 9 in
>     each variable, but for 3 axes it is 5 which is probably unworkable.  For
>     4 axes the maximum power is only 3.

What you say is true.  The point for everyone to remember, however, is
that we are talking about the keyword indices and, in particular, what
the maximum index may be.   Because by defining zero valued parameters
to be absent this is the MOST compact polynomial description in terms
of the number of keywords.

>     The formula for computing i,j,k,... from n should be given.  However, it
>     should be possible to evaluate Eq. 3 without it.
> 
>     Eq.3 should be expressed in the form of a Horner polynomial.  These are
>     faster to compute and reduce rounding error, i.e.
> 
>        [(((... + a3)*x + a2)*x + a1)*x] * y^j * r^k
> 
>     in which case it would make sense to specify the coefficients of the 
>     higher power terms in x first.
> 
>     There is allowance for only one r variable in the polynomial, e.g. in a
>     spectral cube where you need a 2D r term in the spatial plane you can't
>     also have a 3D r term.  (Explicit axis coupling probably minimizes the
>     impact of this.)

> 16) Computational efficiency
>     ------------------------
> 
>     The 80 term TAN+poly formula of paper II requires 1024 parameters in the
>     DCF -TAN-PLY formulation but most of these will be zero.
> 
>     What is the relative computational efficiency?  This is most likely to be
>     apparent in inverting the distortion function.
> 
>     Using the program appended, I determined empirically that the floating
>     point unit on a Sun workstation adds and multiplies by a random number as
>     quickly as it adds 0.0 or multiplies by 0.0 or 1.0.  (Bizarrely, it even
>     seemed a bit quicker at adding non-zero numbers!)
> 
>     This means that the zero coefficients of the polynomial slow down the
>     calculation in proportion to their number.  Thus the 1024 coefficient form
>     of TAN+poly will be slower than the 80 coefficient form by about a factor
>     of x13.
> 
>     Do we need a special purpose, stripped-down version of -PLY to replicate
>     TAN+poly with reasonable efficiency?

This is just a question of intelligent programming.  Clearly one can
write code to recognize and optimize various conditions.  For instance,
one can convert the input coefficient index encoding from the form where
every coefficient is used including zero coefficients (the simple version
you coded) to a form where the powers used are identified and indexed.
This latter form is what you get at with your proposal (17) for a
different indexing scheme.  My point is that the way the coefficients
are indexed and encoded as FITS parameters need not be driven by the
most efficient way to compute.

> 17) A better PLY
>     ------------
> 
>     Given that the proposed PLY representation suffers from the drawbacks
>     described, a better representation would be to encode the powers of the
>     (x,y,...) in the DVj_ms themselves.

The only real drawback is the maximum index values and, as noted, it is more
efficient in number of keywords than the proposal below.

>     As before, for each axis, DVj_0s would say how many other axes the
>     intermediate world coordinate for axis j was dependent on (i.e. N); the
>     next N parameters would say which axes and in which order; then N offsets;
>     then N normalization values; then N coefficients for computing r.
> 
>     Following those basic DVj_ms, each term in the polynomial would be encoded
>     as an (N+2)tuple, the first being the coefficient and the next N+1 the
>     power for each (x,y,z,...r).
> 
>     The number of parameters required to encode an M-term polynomial would be
> 
>        1 + 4*N + M * (N+2)
> 
>     For 2 axes and 1,000 coefficients you could have 247 terms.  For 3 axes it
>     would be 198 terms, for 4 axes 165 terms, for 5 axes 141 terms which ought
>     to be enough.
> 
>     TAN+poly would take no more than 2 * 169 = 338 parameters; in practical
>     cases it would probably take much less - terms with zero coefficients need
>     not be encoded, or computed.
> 
>     Another nice thing about encoding the powers of PLY in the DVj_ms is that
>     there is no limit on the highest power, and you could have fractional and
>     negative powers.  Only the number of terms is limited.
> 
>     Clearly this idea could be extended; with a (3N+2)tuple, the first
>     parameter would be the polynomial coefficient, the next N the power of
>     each (x,y,..., z) as they are multipled together, the next 2N the
>     coefficients and powers for each (x,y,..., z) as they are added together,
>     and then the power of that sum, e.g. for N=2 a term would be
>     A * x^B * y^C * (D*x^E + F*y^H)^I.  Since this would allow the elimination
>     of the r variable, the number of parameters required to encode an M-term
>     polynomial would be
> 
>        1 + 3*N + M * (3N+2)
> 
>     which, for TAN+ply, is 2 * 327 = 654 (maximum).
> 
>     The repetitious computation of r for TAN+poly in this extended scheme
>     suggests an even more compact and efficient but still very general
>     encoding.  The first 3N+1 coefficients would be as before, the next
>     parameter would define a number, K, of "auxiliary" variables (i.e. r) and
>     the next K (2N+1)tuples would define them, i.e. a 2D Euclidean r would be
>     (1,2,1,2,0.5).  Of course, K=0 would be allowed.  Then the subsequent
>     (N+K+1)tuples would define the polynomial terms.  This scheme uses
> 
>        1 + 3*N + 1 + K * (2N+1) + M*(N+K+1)
> 
>     coefficients.  TAN+poly (N=2, K=1, M=40) would take 2 * 173 = 346 which
>     is only slightly more than the first form, but with greatly increased
>     flexibility and without compromising computational efficiency (given that
>     the auxiliary variables would be cached).

Yes this formulation has advantages that we like.  To be clear about the
offsets and normalization for regularizing the intermediate coordinates,
we propose to change the original equation (1) in the DCF proposal to

    (1) n_m = (1 + scale) * x_l + offset

Note that this follows the common FITS linear transformation of scale/offset
rather than using a divisive normalization.

While the above description handles well the maximum keyword indexing
problem when there are many zero valued coefficients, it has N*M more
keywords (for specifying the powers) than the other polynomial formulation
requires.  Maybe there should be two polynomial DCFs?

In your example with P=7 and N=2, if all the coefficients are non-zero
then the number of coefficients and maximum keyword index is

    1 + 3*2 + 1 + 1 + 5 + 512*(2+1+1) ~ 2048

while the original description would have ~512 keywords with corresponding
maximum index.

As an alternative, we suggest specifying the powers (not necessarily
integers) and coefficient with string parameters.  Example 2 below
illustrates this syntax.  This has the advantage of both polynomial formats; a
minimal number of coefficient keywords and a compact indexing scheme.

EXAMPLE OF ALTERNATE INDEXING SCHEME AND USE OF AXIS MAP KEYWORDS

The examples below are based on the Fabry-Perot data cube discussed in
the original DCF proposal
(see http://iraf.noao.edu/projects/fitswcs/dcf.html#4.4).  Figure 1 shows
the original proposal keywords for comparison with figure 2 that uses
axis map keywords, a scale/offset normalization of the independent
variables, and coefficients which include the powers so that the
indexing may be sequential.

Figure 1: Original formulation for a Fabry-Perot data cube.

NAXIS   =                     3 / Number of image raster axes      
NAXIS1  =                   512 / Number of pixels                 
NAXIS2  =                   512 / Number of pixels                 
NAXIS3  =                   512 / Number of pixels                 

WCSAXES =                     3 / WCS dimensionality
CTYPE1  = 'RA---TAN-PLY'        / RA---TAN with distortion         
CTYPE2  = 'DEC--TAN-PLY'        / DEC--TAN with distortion         
CTYPE3  = 'WAVE-WAV-PLY'        / WAVE-WAV with interference filter
CRVAL1  =       201.94541667302 / RA reference (deg)               
CRVAL2  =              47.45444 / DEC reference (deg)              
CRVAL3  =                 5560. / Wavelength (Angstrom)            
CUNIT3  = 'Angstrom'            / Units                            
CD1_1   =         -2.1277777E-4 / RA axis scale (deg/pixel)        
CD2_2   =          2.1277777E-4 / DEC axis scale (deg/pixel)       
CD3_3   =                   -1. / Wavelength scale (Angstrom)      
CRPIX1  =                257.75 / Reference pixel                  
CRPIX2  =                258.93 / Reference pixel                  
CRPIX3  =                  256. / Reference pixel                  

DV1_0   =                     1 / Up to 1st order in x             
DV1_3   =                     1 / Up to 1st order in r             
DV1_7   =                     1 / Include x in r                   
DV1_8   =                     1 / Include y in r                   
DV1_13  =                   -3. / xr coefficient                   

DV2_1   =                     1 / Up to 1st order in y             
DV2_3   =                     1 / Up to 1st order in r             
DV2_7   =                     1 / Include x in r                   
DV2_8   =                     1 / Include y in r                   
DV2_13  =                   -3. / yr coefficient                   

DV3_2   =                     1 / Up to 1st order in z             
DV3_3   =                     4 / Up to 4th order in r             
DV3_6   =                 5559. / Normalization (CRVAL-1)          
DV3_7   =                  1.38 / (1/C)                            
DV3_8   =                  1.38 / (1/C)                            
DV3_14  =                -2780. / r^2 coefficient (-0.5*CRVAL)     
DV3_15  =                -2780. / zr^2 coefficient (-0.5*CRVAL)    
DV1_18  =                 2085. / r^4 coefficient (3/8*CRVAL)      
DV1_19  =                 2085. / zr^4 coefficient (3/8*CRVAL)     


Figure 2: Revised formulation for Fabry-Perot data cube with string coefficients


NAXIS   =                     3 / Number of image raster axes      
NAXIS1  =                   512 / Number of pixels                 
NAXIS2  =                   512 / Number of pixels                 
NAXIS3  =                   512 / Number of pixels                 

WCSAXES =                     3 / WCS dimensionality
CTYPE1  = 'RA---TAN-PLY'        / RA---TAN with distortion         
CTYPE2  = 'DEC--TAN-PLY'        / DEC--TAN with distortion         
CTYPE3  = 'WAVE-WAV-PLY'        / WAVE-WAV with interference filter
CRVAL1  =       201.94541667302 / RA reference (deg)               
CRVAL2  =              47.45444 / DEC reference (deg)              
CRVAL3  =                 5560. / Wavelength (Angstrom)            
CUNIT3  = 'Angstrom'            / Units                            
CD1_1   =         -2.1277777E-4 / RA axis scale (deg/pixel)        
CD2_2   =          2.1277777E-4 / DEC axis scale (deg/pixel)       
CD3_3   =                   -1. / Wavelength scale (Angstrom)      
CRPIX1  =                257.75 / Reference pixel                  
CRPIX2  =                258.93 / Reference pixel                  
CRPIX3  =                  256. / Reference pixel                  

DV1_0   =                     2 / Number of independent axes variables
DV1_1   =                     1 / First independent variable
DV1_2   =                     2 / Second independent variable
DV1_3   =                    0. / Offset for first variable             ***
DV1_4   =                    0. / Offset for second variable            ***
DV1_5   =                    0. / Scale for first variable              ***
DV1_6   =                    0. / Scale for second variable             ***
DV1_7   =                    1. / Scale for first variable in r
DV1_8   =                    1. / Scale for second variable in r
DS1_1   = '1 0 1 -3.'           / Coefficient (xr)

DV2_0   =                     2 / Number of independent axes variables
DV2_1   =                     1 / First independent variable
DV2_2   =                     2 / Second independent variable
DV2_3   =                    0. / Offset for first variable             ***
DV2_4   =                    0. / Offset for second variable            ***
DV2_5   =                    0. / Scale for first variable              ***
DV2_6   =                    0. / Scale for second variable             ***
DV2_7   =                    1. / Scale for first variable in r
DV2_8   =                    1. / Scale for second variable in r
DS2_1   = '0 1 1 -3.'           / Coefficient (yr)

DV3_0   =                     3 / Number of independent axes variables
DV3_1   =                     1 / First independent variable
DV3_2   =                     2 / Second independent variable
DV3_3   =                     3 / Third independent variable
DV3_4   =                    0. / Offset for first variable             ***
DV3_5   =                    0. / Offset for second variable            ***
DV3_6   =                    0. / Offset for third variable             ***
DV3_7   =                    0. / Scale for first variable              ***
DV3_8   =                    0. / Scale for second variable             ***
DV3_9   =         -0.9998333333 / Scale for third variable (1/CRVAL-1)
DV3_10  =                 0.725 / Scale for first variable in r (C)
DV3_11  =                 0.725 / Scale for second variable in r (C)
DV3_12  =                    0. / Scale for third variable in r         ***
DS3_1   = '0 0 0 2 -2780.'      / r^2 coefficient (-0.5*CRVAL)     
DS3_2   = '0 0 1 2 -2780.'      / zr^2 coefficient (-0.5*CRVAL)    
DS3_3   = '0 0 0 4 2085.'       / r^4 coefficient (3/8*CRVAL)      
DS3_4   = '0 0 1 4 2085.'       / zr^4 coefficient (3/8*CRVAL)     


*** These are the default value so the keyword may be eliminated.

Note that instead of original equation (1) we use the revised transformation
for normalizing the coordinates:

    (1)  n_m = (1 + scale) * x_j + offset

Therefore the scale terms are roughly the reciprocals of the original example.