timj / aandc-fits

Paper on the FITS format for Astronomy and Computing journal

TeX 100.00%

aandc-fits's People

Contributors

Stargazers

Watchers

Forkers

migueldvb aalexov olebole mdboom astrofrog astronomeralex stargaser mwcraig embray brianthomas

aandc-fits's Issues

2.4.4. More detail/rewording needed for compression problem

Hi, section 2.4.4 is interesting, but I am worried that we don't have enough support here for why compression in FITS is unsuitable. The FITS community largely feels that this is one of the details that FITS gets 'right'. From my reading of this section, what I think is being advocated is not that tiled (Rice) compression is bad, but rather that it is only implemented in a convention, and other parsers/readers won't understand it. Correct?

If that's the case, then the underlying problem is that our readers lack an API which is 'extensible' via a shared plugin mechanism (to point out one possible solution). Do I have that right? Assigning to you, Slava, since I think you created this section (if not, and you have no opinions, please assign back to me). -brian
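To make the "shared plugin mechanism" idea concrete, here is a minimal sketch of a decoder registry keyed by compression-algorithm name. Everything in it (`register_decoder`, `decode_tile`, the registry itself) is hypothetical and not part of any existing FITS library; `GZIP_1` is used only because the standard library can decode it, and no real Rice decoder is implemented.

```python
import gzip
from typing import Callable, Dict

# Hypothetical registry: algorithm name -> function that decodes one tile.
_DECODERS: Dict[str, Callable[[bytes], bytes]] = {}


def register_decoder(name: str):
    """Return a decorator that registers a decoder under `name`."""
    def wrap(func: Callable[[bytes], bytes]) -> Callable[[bytes], bytes]:
        _DECODERS[name] = func
        return func
    return wrap


@register_decoder("GZIP_1")
def _decode_gzip(payload: bytes) -> bytes:
    # Stand-in decoder; a Rice decoder would be registered the same way.
    return gzip.decompress(payload)


def decode_tile(algorithm: str, payload: bytes) -> bytes:
    """Decode a tile, failing loudly if no plugin knows the algorithm."""
    try:
        decoder = _DECODERS[algorithm]
    except KeyError:
        raise LookupError(f"no decoder registered for {algorithm!r}") from None
    return decoder(payload)
```

With a registry like this, a reader that meets an unknown algorithm can report exactly which plugin is missing instead of silently misreading the data.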

Reorder "alternative encodings" and "constrained metadata representation"

The relative placement (and implied relative importance) of Sec 2.1.3, "No support for alternative encodings", and Sec 2.3.1, "Constrained metadata representation", looks inverted to me. At the end of 2.1.3 it is stated that expressing FITS keywords more broadly is less critical than 7-bit ASCII. Most people are much, much more bothered by the 8-character limit on keyword names than by lack of Unicode.

In terms of relative importance, I propose 2.1.3 be subsumed into 2.3.1.

At minimum, the end of 2.1.3 should say that the important issue of keyword naming is addressed later.
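As an illustration of how the two constraints discussed above interact, the sketch below checks a candidate keyword name against the standard FITS rules (at most 8 characters, drawn from upper-case A-Z, digits, hyphen and underscore). The function name `is_standard_keyword` is made up for this example.

```python
import re

# Standard FITS keyword names: 1 to 8 characters drawn from
# upper-case A-Z, digits, hyphen and underscore.
_KEYWORD_RE = re.compile(r"[A-Z0-9_-]{1,8}")


def is_standard_keyword(name: str) -> bool:
    """Return True if `name` is usable as a standard FITS keyword name."""
    return _KEYWORD_RE.fullmatch(name) is not None


for candidate in ("EXPTIME", "EXPOSURE_TIME", "TEMPÉRAT", "exptime"):
    print(candidate, is_standard_keyword(candidate))
```

A descriptive name such as `EXPOSURE_TIME` fails on length alone, before the character set even comes into play, which is the point about relative importance made above.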

Send round timeline for submission

The timeline for submission will be sent round by @brianthomas.

References broken after text edit

Commit 75aed13 broke the references when compiling the paper using pdflatex.

Tweak provenance section

@GBruceB has volunteered to have a look at the provenance section.

Add Herschel Observation Context example to Section 2.1

Section 2.1 seems mistitled at "Poor support for information interchange". Isn't what is discussed the hierarchies or relationships between data elements?

Herschel provides a nice example of implementing hierarchies. They are done as FITS files. For Herschel, the Observation Context is the container for Calibration, Pointing, Raw Data, Processed Data, etc. and would make a nice diagram to illustrate the point.

(I brought this up for the ADASS paper but had trouble finding a reference. However if Wikipedia is used as a reference for the definition of 'data model', can't we use the online Herschel documentation as a similar reference?)

Could distortion metadata make a nice example for "Data Model" section?

When reading Sec 2.1.2, 5th paragraph about having multiple data models, it occurred to me that a nice practical example is distortion metadata in images from the Palomar Transient Factory (PTF). That project decided to incorporate both "SIP" and "TPV" keywords in their headers, to make sure that both SExtractor and other tools could handle their data. This leads to having to make rules of what to do when both data models are present.

For a reference, see http://proceedings.spiedigitallibrary.org/proceeding.aspx?articleid=1363103
also available at http://web.ipac.caltech.edu/staff/shupe/reprints/SIP_to_PV_SPIE2012.pdf
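The kind of rule a reader needs when both data models are present could be sketched as follows. This is an illustration only: the header is a plain dictionary, the function names are hypothetical, and the default preference order is an arbitrary assumption, not what PTF or any particular tool does. Detection by keyword presence is also only a heuristic, since `PVi_j` keywords are used for ordinary projection parameters as well.

```python
import re
from typing import Mapping, Optional, Sequence, Set

# TPV distortion coefficients are stored in PV1_n / PV2_n keywords.
_TPV_RE = re.compile(r"PV[12]_\d+")


def distortion_models(header: Mapping[str, object]) -> Set[str]:
    """Guess which distortion conventions a header carries (heuristic)."""
    found = set()
    # SIP declares its polynomial order in A_ORDER / B_ORDER.
    if "A_ORDER" in header or "B_ORDER" in header:
        found.add("SIP")
    if any(_TPV_RE.fullmatch(key) for key in header):
        found.add("TPV")
    return found


def choose_model(
    header: Mapping[str, object],
    preferred: Sequence[str] = ("SIP", "TPV"),  # assumed order, not a standard
) -> Optional[str]:
    """Pick one model when several are present, following `preferred`."""
    found = distortion_models(header)
    for name in preferred:
        if name in found:
            return name
    return None
```

The fact that the preference order has to be supplied by the reader, rather than by the file, is exactly the ambiguity that multiple coexisting data models introduce.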

Minor nitpick on "Astropy" spelling

Section 3.1 in the latest draft references "AstroPy", but we spell it "Astropy" (with a lower-case "p").

Minor issues in the manuscript

This is my short list of issues (other than those in #39) with the text:

For uniformity, where it says United Kingdom it should say UK (sorry, I introduced that one)
2.1.1: Consider rendering model1:TEMP and model2:TEMP in a monospaced font (i.e., using \texttt{}).
2.2.5: The statement "This is evident from the significant fraction of unit strings in current databases" (referring to the fact that the FITS model does not accommodate the full range of astronomical data) should be backed up by a citation or example. Perhaps from some CDS papers?
3.1, last paragraph, first sentence: Consider a comma between format and versioning, to avoid ambiguity.
3.2, penultimate paragraph, parenthesis in the last sentence: consider schemata or schemas instead of schema.
3.3, last paragraph, last sentence: consider Schemas or Schemata instead of Schema
4.: where it says "has stifled needed change of FITS", consider "has stifled necessary change of FITS".

And that's all on my part.

Use of issues for making "document" discussions

I was wondering about using the GitHub issues facility to raise questions about things in the article that might need discussion, before any actual commit to the paper. If you (especially @timj as the "host") agree, I think it's easier than having the conversation by email, as it keeps the conversation more "on topic". These issues would work similarly to how we worked with the Google Doc notes.

Missing references in sections 2.4 and 2.4.3

Hi Slava,

I think you added this content (correct me if that's wrong). Could you please add the references to the bib file? They appear to have gone missing.

Move issue on alternative encodings

I think that section 2.1.3 (No support for alternative encodings) could be moved to section 2.3 on inflexibility in information representation, because it is more closely related to the issues discussed there, whereas section 2.1 deals mostly with practical aspects of data transfer.

Mention CLASS and Miriad data formats

The CLASS and Miriad uvdata and image formats, which are used in radio astronomy, could be introduced in one of the paragraphs in the introduction.

A few comments on the paper

There are some comments I gave on the draft of the paper @timj sent out to the astropy list:

Section 2.1, third paragraph: The first sentence, "It is not as if we have not needed these things until now", reads awkwardly. Perhaps: "We are not the first to point out these shortcomings in FITS."?
Section 2.1.4 -- Another general comment is that additional flexibility built into any new data format could prevent informal variants by making new additions still fit in the format.
Section 2.2.2 -- I don't think Figure 1 adds much to the paper. It shows an example, but doesn't seem to add anything substantial. If you want to keep it, perhaps add more in the caption about what it adds to the argument put forth in the paper.
Section 2.3.1, Figure 2 -- Include more in the figure caption. What is the purpose of this figure? How does it add to the argument?

We need more detail for 2.4.2 (parallel read/write in LOFAR)

This short section mentions that LOFAR was driven in part to use an HDF5 solution instead of FITS by the need for parallel read/write. Can a few sentences be written to describe the solution? I'm guessing that, as an ex-LOFAR person, Anastasia, you can provide this detail? If you are unable to do so, just assign back to me and I'll try to grok it from your paper. -brian

Add brief explanation of HDX

Two small issues

2.1.2 last paragraph: IMO the statement that the community has accepted many informal variants of existing models deserves an example or two.
3.2: Is the indented paragraph a citation? The only other place where indentation is used is in 2.1.2, for a citation from Wikipedia. If the paragraph in 3.2 is not a citation, I would rather keep it in the running text.

Otherwise I am very happy with the text. Thanks for the good work!

Add a section on missing data values

As discussed in #12, it would be good to add a section discussing FITS's poor support for missing values (distinct from NaN). Not only does TNULL not work for float columns, for integer columns it requires sacrificing a value to be the 'NULL' value. Ideally a format should fully support masking of values.
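The cost of sacrificing a value can be shown with a small sketch. It compares a sentinel-based integer column (in the spirit of TNULL) with a column that carries a separate mask. The helper names and the choice of -32768 as the sentinel are assumptions made for this example, and plain Python containers stand in for real table columns.

```python
from array import array

TNULL = -32768  # sentinel assumed for this sketch (smallest 16-bit integer)


def read_with_sentinel(column, tnull):
    """Treat every stored value equal to `tnull` as missing."""
    return [None if value == tnull else value for value in column]


def read_with_mask(column, mask):
    """Treat a value as missing only where the separate mask says so."""
    return [None if missing else value for value, missing in zip(column, mask)]


# Intended data: 12, a genuine measurement of -32768, and one missing value.
sentinel_column = array("h", [12, -32768, TNULL])
print(read_with_sentinel(sentinel_column, TNULL))

masked_column = array("h", [12, -32768, 0])
mask = [False, False, True]
print(read_with_mask(masked_column, mask))
```

In the sentinel version the genuine measurement is indistinguishable from the missing entry, whereas the masked version keeps the full integer range available for data.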

This week is a little crazy for me, so I won't have time to contribute text to it, but if no one else has time and the paper is still being prepared after the new year, then I can work on something then.

cc @embray @timj

Problem with footnote for "once fits, forever fits"

After the latest edits, the footnote for "once fits, forever fits" is split between pages, which causes a "\pdfendlink ended up in different nesting level than \pdfstartlink" error.

This error and solution are characterised here 1, so this should be solved after other content is added.

The workaround, as suggested in 1, is to add the draft option to the \usepackage{hyperref} line, but the problem must be solved before submission.

Explain in Sec 2.1 the benefits of hierarchies

Section 2.1 3rd paragraph lists several examples of hierarchical formats (I think). I find I'm familiar with none of them so this part leaves me befuddled. Would it be possible to expand upon the benefits of e.g. the Starlink HDS format? Why do arbitrary hierarchies lead to chaos? Can an example be provided?