Giter Site home page Giter Site logo

ivoa-std / adql Goto Github PK

View Code? Open in Web Editor NEW
7.0 7.0 7.0 690 KB

Astronomical Data Query Language Standard

Home Page: https://wiki.ivoa.net/twiki/bin/view/IVOA/ADQL

License: Creative Commons Attribution Share Alike 4.0 International

TeX 99.40% Makefile 0.60%

adql's People

Contributors

gmantele avatar jd-au avatar mbtaylor avatar molinaro-m avatar msdemlei avatar olebole avatar zarquan avatar

Stargazers

 avatar  avatar  avatar  avatar  avatar  avatar  avatar

Watchers

 avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar

adql's Issues

boolean_value_expression still in BNF

The BNF still mentions boolean_value_expression and all that's below it, including boolean_function, which isn't even defined. I thought we wanted to drop all that?

Resolve coordinate system issues

Following on from discussions at the October 2019 IVOA interop meeting, several people raised comments and issues about coordinate system transforms in ADQL.

This is a placholder issue that represents an ongoing discussion on the DAL mailing list. The discussion will continue as normal on the mailing list, and this GitHub issue will be closed when the discussion reaches a conclusion.

Fix `UNION`, `INTERSECT` and `EXCEPT`

  1. The current syntax (i.e. BNF grammar) of these Set operations in ADQL-2.1 is not correct (especially concerning precedence).

  2. Besides, some questions are left unanswered about the way column sets must be merged:

    • same number of columns (obviously, yes)
    • same column names? If no, which one to take? (generally DBMS take the name provided in the 1st query)
    • same datatypes? (or at least datatypes compatible ; e.g. numerics)
    • (same or "compatible" UCDs?)
  3. And finally, the description of INTERSECT is the same as the one of EXCEPT.

Add a note about case folding of identifiers

With different implementations of ADQL/TAP come different behaviours regarding the case folding of table/column identifiers.

For instance, let's take a column named ra. In different implementations, it could be defined with a different case. Multiple reasons could lead to a such situation: different DBMS, different service implementers choice, different ADQL parser, ... Because of that, the theoretically same ADQL query may not work in all these implementations. In one ADQL service the following will work:

SELECT ra
FROM a_great_table

...but will not in the second unless the column identifier is written with a different case and between double quotes:

SELECT "Ra"
FROM a_great_table

Besides, some ADQL parsers will consider that a non-delimited column (or, generally speaking, an identifier) will match an existing column whatever case it is written in. Other parsers will not. Note that all have to preserve the case when the identifier is delimited (i.e. put between double quotes).

Because of such differences in the processing of case folding when specifying identifiers (i.e. columns, tables, schemas), some users can easily be confused. However, this situation is really hard to fix as most implementations depend on DBMS that already have different behaviours regarding case folding of identifiers.

In order to make ADQL users aware of this but also to make ADQL implementers more careful on their choice of identifiers, a note should be added in the ADQL standard document.

One suggestion raised by @msdemlei on the DAL mailing list (8th Dec. 2020 - "TAP, ADQL and field names inconsistencies between providers.") is:

a word of advice:
avoid mixed-case names in ADQL databases; they become trouble as soon
as people start gratuitously adding quotes to identifiers, which at
least a certain demography is likely to do

Resolve TIMESTAMP() vs CAST()

Following on from discussions at the October 2019 IVOA interop meeting, a number of people raised comments and issues about TIMESTAMP() and CAST().

This is a placholder issue that represents an ongoing discussion on the DAL mailing list. The discussion will continue as normal on the mailing list, and this GitHub issue will be closed when the discussion reaches a conclusion.

Fix BNF of BOX, CIRCLE and POLYGON

The ADQL text says that BOX, CIRCLE and POLYGON now have an optional coordinate system argument (the 1st one), in addition of 2 forms: one with pair(s) of coordinates and another with POINT value(s). However, the BNF has not yet been updated in that sense.

While starting to fix it, I also started to implement it in VOLLT/ADQL-Lib, but it seems to raise some ambiguities due to the fact that every argument can be a column reference.

For BOX and CIRCLE, the problem can be solved based on the number of arguments, ONLY IF the coordinate system argument is allowed only with coordinates pair(s) (instead of POINT value(s)). Which gives the following BNF:

<box> ::=
        BOX <left_paren>
            [ <coord_sys> <comma> ]
            <coordinates>
            <comma> <numeric_value_expression>
            <comma> <numeric_value_expression>
        <right_paren>
      |
        BOX <left_paren>
            <coord_value>
            <comma> <numeric_value_expression>
            <comma> <numeric_value_expression>
        <right_paren>

<circle> ::= 
        CIRCLE <left_paren>
            [ <coord_sys> <comma> ]
            <coordinates>
            <comma> <radius>
        <right_paren>
      |
        CIRCLE <left_paren>
            <coord_value>
            <comma> <radius>
        <right_paren>

<coord_value> ::= <point> | <column_reference> | <user_defined_function>

Which allows the following ADQL statements with columns:

BOX(colCooSys, colRa, colDec, colWidth, colHeight)
BOX(colRa, colDec, colWidth, colHeight)
BOX(colCenterPoint, colWidth, colHeight)

CIRCLE(colCooSys, colRa, colDec, radius)
CIRCLE(colRa, colDec, radius)
CIRCLE(colCenterPoint, radius)

However, for POLYGON, because its number of arguments is variable, I can not find a way. For the moment I came up with the following BNF:

<polygon> ::=
        POLYGON <left_paren>
            [ <coord_sys> <comma> ]
            <coordinates>
            <comma> <coordinates>
            { <comma> <coordinates> } ?
        <right_paren>
      |
        POLYGON <left_paren>
            <coord_value>
            <comma> <coord_value>
            { <comma> <coord_value> } ?
        <right_paren>

Which makes all the following statements possible:

POLYGON(colCooSys, colRa1, colDec1, colRa2, colDec2, colRa3, colDec3)
POLYGON(colRa1, colDec1, colRa2, colDec2, colRa3, colDec3)
POLYGON(colPoint1, colPoint2, colPoint3)

-- but also:
POLYGON(colPoint1, colPoint2, colPoint3, colPoint4, colPoint5, colPoint6, colPoint7)
-- which is ambigus with the example with colCooSys !!!!

Am I wrong?
If not, does anybody have an idea on how to solve this issue?

Check ModelInstanceInVot keywords are consistent with ADQL

The ModelInstanceInVot project are using keywords like FILTER, GROUP_BY, JOIN, PRIMARYKEY and FOREIGNKEY to link data between tables.

This is a heads-up to the ADQL team to be aware of these and check that the way that ModelInstanceInVot are using these keywords is consistent with the existing definitions in ADQL and DAL.

See ModelInstanceInVot issues 3, 6, 11 and 12

Remove the CTE's column labels

In the following ADQL query:

WITH t1 ( ra, dec ) AS (SELECT col1, col2, col3, col4 FROM ...)
SELECT *
FROM t1

, t1 is a Common Table Expression (CTE) defined as the result of the query SELECT col1, col2, col3, col4 FROM .... In the current state of ADQL-2.1, an optional list of column labels can be provided (in the example above: ( ra, dec )). Those aim to be the names of the columns as they must be referenced in the main query.

Considering that recursive CTEs are not supported in ADQL-2.1, there is no need to add an additional way to put labels on the output columns. This will confuse even more the users and would require an additional statement about the number of provided labels (e.g. what happens if different from the number of selected columns).

Morever, there is already a way to specify output labels: the keyword AS in the SELECT clause.
Using this method, here is an ADQL query equivalent to the above one:

WITH t1 AS (SELECT col1 AS ra, col2 AS dec, col3, col4 FROM ...)
SELECT *
FROM t1

This issue has been discussed during the IVOA Interoperability Meeting in Oct. 2019 in Groningen: see slide 13 of the talk "Feedback on the implementation of ADQL-2.1".

Clarify core/extended funcationality comment

The second paragraph of Sec 1 "Introduction" says:

The ADQL specification makes no distinction between core and advanced or extended functionalities.

But this seems to be contradicted by other parts of the standard, e.g. the presence of section 4 "Optional components" and the comment in section 1.2 "Extended functionality":

This document defines the minimum set of functions, operators and datatypes that a service MUST implement in order to register as a service that implements this version of the ADQL specification.

Is the second paragraph of the introduction a relic of earlier versions that ought to be reworded or removed? Or have I misunderstood its intent?

Add .gitignore

A .gitignore file so that ignoring files generated by the execution of make pdf html (especially the PDF and HTML files).

Running the Makefile also touches the ivoatex/ directory (apparently, just the metadata). I also propose to update .gitmodules so that ignoring this "dirty" state.

Fix BNF of BOX, CIRCLE and POLYGON

The grammar of BOX, CIRCLE and POLYGON allows only pair(s) of coordinates, though the description of these functions have been modified since 2.1, now allowing a second form with any expression able to return a POINT (i.e. a column, CENTROID, UDF).

Current grammar:

<box> ::=
	BOX <left_paren>
		[ <coord_sys> <comma> ]
		<coordinates>
		<comma> <numeric_value_expression>
		<comma> <numeric_value_expression>
	<right_paren>

<circle> ::=
	CIRCLE <left_paren>
		[ <coord_sys> <comma> ]
		<coordinates>
		<comma> <radius>
	<right_paren>

<polygon> ::=
	POLYGON <left_paren>
		[ <coord_sys> <comma> ]
		<coordinates>
		<comma> <coordinates>
		{ <comma> <coordinates> } ?
	<right_paren>

Suggested grammar update:

<box> ::=
	BOX <left_paren>
		[ <coord_sys> <comma> ]
		<coordinates>
		<comma> <numeric_value_expression>
		<comma> <numeric_value_expression>
	<right_paren>
	|
	BOX <left_paren>
		[ <coord_sys> <comma> ]
		<coord_value>
		<comma> <numeric_value_expression>
		<comma> <numeric_value_expression>
	<right_paren>

<circle> ::=
	CIRCLE <left_paren>
		[ <coord_sys> <comma> ]
		<coordinates>
		<comma> <radius>
	<right_paren>
	|
	CIRCLE <left_paren>
		[ <coord_sys> <comma> ]
		<coord_value>
		<comma> <radius>
	<right_paren>

<polygon> ::=
	POLYGON <left_paren>
		[ <coord_sys> <comma> ]
		<coordinates>
		<comma> <coordinates>
		{ <comma> <coordinates> } ?
	<right_paren>
	|
	POLYGON <left_paren>
		[ <coord_sys> <comma> ]
		<coord_value>
		<comma> <coord_value>
		{ <comma> <coord_value> } ?
	<right_paren>

Thus, the grammar of POINT does not change:

<point> ::=
	POINT <left_paren>
		[ <coord_sys> <comma> ]
		<coordinates>
	<right_paren>

More concise change log possible from 2.0 to 2.1?

Version 2.0 was approved back in 2008, so it's not surprising Appendix C (changes from previous version) has accumulated a lot of content, and that content does help to understand the evolution of the document.

However, for the purposes of review, it might be really helpful to have a new section at the top of Appendix C that captures the significant net differences between 2.0 and 2.1. Maybe that wouldn't shorten the list enough to matter, but I'm hoping that squashing the list into summaries (and removing interim versioning and history) would help make reviews easier.

Clarify discussion of coordsys argument

Section 4.2.4 "Coordsys" provides a useful historical review of the coordsys argument in geometry constructor functions:

For historical reasons, the geometry constructors (BOX, CIRCLE, POINT and POLYGON) all accept an optional string literal as the first argument. This was originally intended to carry information on a reference system or other coordinate system metadata. As of this version of the specification this parameter has been marked as deprecated.

But this is slightly misleading, since it sounds (to me) like the coordsys parameter has always been optional. I suggest rewording to make it clear that it has become optional only in this version, e.g. change the last sentence to:

As of this version of the specification this parameter has been made optional and marked as deprecated.

The following sentence says

Services are permitted to ignore this parameter and clients are advised to pass an empty string here.

but in all the examples later in the document (e.g. sec 4.2.9, 4.2.11), the first argument is omitted, not passed as an empty string. Should this advice be changed to "clients are advised to omit this argument"?

Add support for MOC

Several people have expressed interest in supporting MOC as search criteria in ADQL.

Please add your use cases in comments for this issue and we will try to bring them together into a plan.

resolve TBD on how to use interval type in ADQL

At some point in the past. we discussed defining an "overlaps" function. The semantics are simple but finding a good name... not so much.

One possibility we discussed was overloading INTERSECTS with two arguments, but I don't know what impact that would have on grammar complexity.

I do not have any current use case to justify a "contains" function, but it is equally easy to define.

README Proposed Rec link out of date

The "Proposed Recommendation" link in Readme is to an old(2018) version of the document, which can lead to some confusion on review.

Should this be changed to load the May 2021 'final for real' version, the auto PDF preview DRAFT version, or removed? I have no strong opinions.

Remove bitwise operators and support of hexadecimal values

During ADASS and the IVOA Interoperability Meeting of Oct. 2019, it has been decided to not tackle bitwise operators and hexadecimal values in ADQL-2.1. The current description of these features introduce too many differences or issues in implementations (especially with MySQL).

For more details, see from slides 26 of the talk "Feedback on the implementation of ADQL-2.1".

These features are not yet mature. More thoughts are required. They may be introduced again in a next version of ADQL.

Box search has strange behaviour - astroquery/gaiaquery

We’ve noticed that if you query a circle, it produces sources that occupy an actual circular area on the sky independent of how close one is to the poles – you get sources within a radius of the field center.

However, if you query for a BOX (i.e. a square), and specify a side-length, then if you are near the poles it gives the proper sweep in declination, but the width of the box is determined not by width in degrees on the sky, but width in right ascension, which will be much narrower near the poles than an actual span of degrees on the sky.

How would we get similar behaviour as the circle query but for a box query? For circular-aperture images we always used circle because the field actually is a circle, but now testing with square images we’ve noticed that if the image is near a pole, the BOX query does not actually return a square as the image sees, but a square in terms of RA & Dec, which is then just a narrow strip through the middle of what the telescope actually records if near a pole.

We would expect that a BOX search behaves the same as a CIRCLE search, in terms of area of actual sky requested, not area in terms of RA & Dec.

We have been directed to the ADQL source by both astroquery and gaiaquery groups.

We can provide more detailed info if required. Please see the attached figures. The circle query is requested with a given radius, and the box query is requested with the same side length equivalent to the circle, i.e. a side-length two-times the circle radius.

box query
circle query

CTEs are subqueries

Should section 2.2.1 "Subqueries" mention WITH clauses (CTEs) alongside WHERE and FROM clauses as another place where subqueries are permitted?

Fix description of `IN_UNIT`

IN_UNIT is a function able to convert a value into another unit. Its current signature is:

IN_UNIT(value, target_unit)

This works as expected if the 1st argument is a reference to a column whose the unit is specified in TAP_SCHEMA.columns. But what happens ...

  • ... if no unit is set for this column reference?
  • ... if the value is not a column reference (e.g. a numeric litteral, a math expression)?_

Nothing is specified in the current state of the document. Hence this issue.

During the IVOA Interop. Meeting (taking place in Groningen in Oct. 2019), there was a consensus for the following solution:

If no unit metadata is available, IN_UNIT should return the given value unchanged with the unit piece of metadata set to the given target unit. In other words, in such case: no conversion, just annotation.

Fix `DISTANCE` description

Markus Demleitner raised the point that the 2-arguments version of DISTANCE() allowed 2 general geometries instead of 2 points. Computing the distance between 2 points is easy but between 2 geometries requires an clear explanation of how such distance is computed.

To paraphrase Markus:

Being general here is a pain in the neck (actually, that's why I ran
into this question). For one, you'll need to define distance
much more carefully for such geometries, and if (as I think we ought
to) we chose "minimum of distances of between all points in arg 1 and
arg 2", I doubt we'll see many correct implementations of that. Also
I'll want to map a lot of DISTANCE calls into contains(point,
circle) statements (because that's much easier on the query planner),
and that's a pain if one of the points could actually be, say, a
polygon.

The issue here is about limiting the 2 arguments of DISTANCE() to be POINTs.

Complete email thread on http://mail.ivoa.net/pipermail/dal/2020-February/008268.html

Put back `REGION`

In the current version of this document, REGION has been directly removed. Doing so introduces a backward incompatibility between minor revisions of the ADQL language. Moreover, some existing TAP services still rely on this function.

Discussions went on this topic during ADASS and the IVOA Interop. Meeting in Oct. 2019 in Groningen. It has been decided to put back REGION in ADQL.

The only parameter of REGION is typed as a VARCHAR. In ADQL-2.0, it was the STC-S representation of a spatial region. It has been decided in ADQL-2.1, to extend the possible representations:

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.