ivoa-std / adql Goto Github PK
View Code? Open in Web Editor NEWAstronomical Data Query Language Standard
Home Page: https://wiki.ivoa.net/twiki/bin/view/IVOA/ADQL
License: Creative Commons Attribution Share Alike 4.0 International
Astronomical Data Query Language Standard
Home Page: https://wiki.ivoa.net/twiki/bin/view/IVOA/ADQL
License: Creative Commons Attribution Share Alike 4.0 International
The BNF still mentions boolean_value_expression and all that's below it, including boolean_function, which isn't even defined. I thought we wanted to drop all that?
Following on from discussions at the October 2019 IVOA interop meeting, several people raised comments and issues about coordinate system transforms in ADQL.
This is a placholder issue that represents an ongoing discussion on the DAL mailing list. The discussion will continue as normal on the mailing list, and this GitHub issue will be closed when the discussion reaches a conclusion.
The current syntax (i.e. BNF grammar) of these Set operations in ADQL-2.1 is not correct (especially concerning precedence).
Besides, some questions are left unanswered about the way column sets must be merged:
And finally, the description of INTERSECT
is the same as the one of EXCEPT
.
Can we add in ADQL an INTERSECTION function which returns the polygon intersection of two regions?
Question asked on the 8th of June 2017 in the DAL mailing list: http://mail.ivoa.net/pipermail/dal/2017-June/007721.html
With different implementations of ADQL/TAP come different behaviours regarding the case folding of table/column identifiers.
For instance, let's take a column named ra
. In different implementations, it could be defined with a different case. Multiple reasons could lead to a such situation: different DBMS, different service implementers choice, different ADQL parser, ... Because of that, the theoretically same ADQL query may not work in all these implementations. In one ADQL service the following will work:
SELECT ra
FROM a_great_table
...but will not in the second unless the column identifier is written with a different case and between double quotes:
SELECT "Ra"
FROM a_great_table
Besides, some ADQL parsers will consider that a non-delimited column (or, generally speaking, an identifier) will match an existing column whatever case it is written in. Other parsers will not. Note that all have to preserve the case when the identifier is delimited (i.e. put between double quotes).
Because of such differences in the processing of case folding when specifying identifiers (i.e. columns, tables, schemas), some users can easily be confused. However, this situation is really hard to fix as most implementations depend on DBMS that already have different behaviours regarding case folding of identifiers.
In order to make ADQL users aware of this but also to make ADQL implementers more careful on their choice of identifiers, a note should be added in the ADQL standard document.
One suggestion raised by @msdemlei on the DAL mailing list (8th Dec. 2020 - "TAP, ADQL and field names inconsistencies between providers.") is:
a word of advice:
avoid mixed-case names in ADQL databases; they become trouble as soon
as people start gratuitously adding quotes to identifiers, which at
least a certain demography is likely to do
Following on from discussions at the October 2019 IVOA interop meeting, a number of people raised comments and issues about TIMESTAMP() and CAST().
This is a placholder issue that represents an ongoing discussion on the DAL mailing list. The discussion will continue as normal on the mailing list, and this GitHub issue will be closed when the discussion reaches a conclusion.
The ADQL text says that BOX, CIRCLE and POLYGON now have an optional coordinate system argument (the 1st one), in addition of 2 forms: one with pair(s) of coordinates and another with POINT value(s). However, the BNF has not yet been updated in that sense.
While starting to fix it, I also started to implement it in VOLLT/ADQL-Lib, but it seems to raise some ambiguities due to the fact that every argument can be a column reference.
For BOX and CIRCLE, the problem can be solved based on the number of arguments, ONLY IF the coordinate system argument is allowed only with coordinates pair(s) (instead of POINT value(s)). Which gives the following BNF:
<box> ::=
BOX <left_paren>
[ <coord_sys> <comma> ]
<coordinates>
<comma> <numeric_value_expression>
<comma> <numeric_value_expression>
<right_paren>
|
BOX <left_paren>
<coord_value>
<comma> <numeric_value_expression>
<comma> <numeric_value_expression>
<right_paren>
<circle> ::=
CIRCLE <left_paren>
[ <coord_sys> <comma> ]
<coordinates>
<comma> <radius>
<right_paren>
|
CIRCLE <left_paren>
<coord_value>
<comma> <radius>
<right_paren>
<coord_value> ::= <point> | <column_reference> | <user_defined_function>
Which allows the following ADQL statements with columns:
BOX(colCooSys, colRa, colDec, colWidth, colHeight)
BOX(colRa, colDec, colWidth, colHeight)
BOX(colCenterPoint, colWidth, colHeight)
CIRCLE(colCooSys, colRa, colDec, radius)
CIRCLE(colRa, colDec, radius)
CIRCLE(colCenterPoint, radius)
However, for POLYGON, because its number of arguments is variable, I can not find a way. For the moment I came up with the following BNF:
<polygon> ::=
POLYGON <left_paren>
[ <coord_sys> <comma> ]
<coordinates>
<comma> <coordinates>
{ <comma> <coordinates> } ?
<right_paren>
|
POLYGON <left_paren>
<coord_value>
<comma> <coord_value>
{ <comma> <coord_value> } ?
<right_paren>
Which makes all the following statements possible:
POLYGON(colCooSys, colRa1, colDec1, colRa2, colDec2, colRa3, colDec3)
POLYGON(colRa1, colDec1, colRa2, colDec2, colRa3, colDec3)
POLYGON(colPoint1, colPoint2, colPoint3)
-- but also:
POLYGON(colPoint1, colPoint2, colPoint3, colPoint4, colPoint5, colPoint6, colPoint7)
-- which is ambigus with the example with colCooSys !!!!
Am I wrong?
If not, does anybody have an idea on how to solve this issue?
The ModelInstanceInVot project are using keywords like FILTER
, GROUP_BY
, JOIN
, PRIMARYKEY
and FOREIGNKEY
to link data between tables.
This is a heads-up to the ADQL team to be aware of these and check that the way that ModelInstanceInVot are using these keywords is consistent with the existing definitions in ADQL and DAL.
In the following ADQL query:
WITH t1 ( ra, dec ) AS (SELECT col1, col2, col3, col4 FROM ...)
SELECT *
FROM t1
, t1
is a Common Table Expression (CTE) defined as the result of the query SELECT col1, col2, col3, col4 FROM ...
. In the current state of ADQL-2.1, an optional list of column labels can be provided (in the example above: ( ra, dec )
). Those aim to be the names of the columns as they must be referenced in the main query.
Considering that recursive CTEs are not supported in ADQL-2.1, there is no need to add an additional way to put labels on the output columns. This will confuse even more the users and would require an additional statement about the number of provided labels (e.g. what happens if different from the number of selected columns).
Morever, there is already a way to specify output labels: the keyword AS
in the SELECT
clause.
Using this method, here is an ADQL query equivalent to the above one:
WITH t1 AS (SELECT col1 AS ra, col2 AS dec, col3, col4 FROM ...)
SELECT *
FROM t1
This issue has been discussed during the IVOA Interoperability Meeting in Oct. 2019 in Groningen: see slide 13 of the talk "Feedback on the implementation of ADQL-2.1".
The second paragraph of Sec 1 "Introduction" says:
The ADQL specification makes no distinction between core and advanced or extended functionalities.
But this seems to be contradicted by other parts of the standard, e.g. the presence of section 4 "Optional components" and the comment in section 1.2 "Extended functionality":
This document defines the minimum set of functions, operators and datatypes that a service MUST implement in order to register as a service that implements this version of the ADQL specification.
Is the second paragraph of the introduction a relic of earlier versions that ought to be reworded or removed? Or have I misunderstood its intent?
At least some of the document cross-references in the HTML formatted version of the document are broken, e.g. sec 3.5.2 of PR-ADQL-2.1-20210528.html reads:
The CIRCLE datatype maps to the corresponding type defined in the .
In the PDF rendering it correctly says:
The CIRCLE datatype maps to the corresponding type defined in the DALI specification.
A .gitignore
file so that ignoring files generated by the execution of make pdf html
(especially the PDF and HTML files).
Running the Makefile also touches the ivoatex/
directory (apparently, just the metadata). I also propose to update .gitmodules
so that ignoring this "dirty" state.
The grammar of BOX, CIRCLE and POLYGON allows only pair(s) of coordinates, though the description of these functions have been modified since 2.1, now allowing a second form with any expression able to return a POINT (i.e. a column, CENTROID, UDF).
Current grammar:
<box> ::=
BOX <left_paren>
[ <coord_sys> <comma> ]
<coordinates>
<comma> <numeric_value_expression>
<comma> <numeric_value_expression>
<right_paren>
<circle> ::=
CIRCLE <left_paren>
[ <coord_sys> <comma> ]
<coordinates>
<comma> <radius>
<right_paren>
<polygon> ::=
POLYGON <left_paren>
[ <coord_sys> <comma> ]
<coordinates>
<comma> <coordinates>
{ <comma> <coordinates> } ?
<right_paren>
Suggested grammar update:
<box> ::=
BOX <left_paren>
[ <coord_sys> <comma> ]
<coordinates>
<comma> <numeric_value_expression>
<comma> <numeric_value_expression>
<right_paren>
|
BOX <left_paren>
[ <coord_sys> <comma> ]
<coord_value>
<comma> <numeric_value_expression>
<comma> <numeric_value_expression>
<right_paren>
<circle> ::=
CIRCLE <left_paren>
[ <coord_sys> <comma> ]
<coordinates>
<comma> <radius>
<right_paren>
|
CIRCLE <left_paren>
[ <coord_sys> <comma> ]
<coord_value>
<comma> <radius>
<right_paren>
<polygon> ::=
POLYGON <left_paren>
[ <coord_sys> <comma> ]
<coordinates>
<comma> <coordinates>
{ <comma> <coordinates> } ?
<right_paren>
|
POLYGON <left_paren>
[ <coord_sys> <comma> ]
<coord_value>
<comma> <coord_value>
{ <comma> <coord_value> } ?
<right_paren>
Thus, the grammar of POINT does not change:
<point> ::=
POINT <left_paren>
[ <coord_sys> <comma> ]
<coordinates>
<right_paren>
Jon Juaristi Campillo and Markus Demleitner have published an Endorsed Note (currently in PR) describing an IVOA consensus on some User Defined Functions:
Endorsed Note "Catalogue of ADQL User Defined Functions " - PR-1.0, Jon Juaristi Campillo, et.al.
There is no obligation to implement them. They are just recommandations for function names (all prefixed with ivo_
) and signatures.
Version 2.0 was approved back in 2008, so it's not surprising Appendix C (changes from previous version) has accumulated a lot of content, and that content does help to understand the evolution of the document.
However, for the purposes of review, it might be really helpful to have a new section at the top of Appendix C that captures the significant net differences between 2.0 and 2.1. Maybe that wouldn't shorten the list enough to matter, but I'm hoping that squashing the list into summaries (and removing interim versioning and history) would help make reviews easier.
Section 4.2.4 "Coordsys" provides a useful historical review of the coordsys argument in geometry constructor functions:
For historical reasons, the geometry constructors (BOX, CIRCLE, POINT and POLYGON) all accept an optional string literal as the first argument. This was originally intended to carry information on a reference system or other coordinate system metadata. As of this version of the specification this parameter has been marked as deprecated.
But this is slightly misleading, since it sounds (to me) like the coordsys parameter has always been optional. I suggest rewording to make it clear that it has become optional only in this version, e.g. change the last sentence to:
As of this version of the specification this parameter has been made optional and marked as deprecated.
The following sentence says
Services are permitted to ignore this parameter and clients are advised to pass an empty string here.
but in all the examples later in the document (e.g. sec 4.2.9, 4.2.11), the first argument is omitted, not passed as an empty string. Should this advice be changed to "clients are advised to omit this argument"?
Several people have expressed interest in supporting MOC as search criteria in ADQL.
Please add your use cases in comments for this issue and we will try to bring them together into a plan.
At some point in the past. we discussed defining an "overlaps" function. The semantics are simple but finding a good name... not so much.
One possibility we discussed was overloading INTERSECTS with two arguments, but I don't know what impact that would have on grammar complexity.
I do not have any current use case to justify a "contains" function, but it is equally easy to define.
The "Proposed Recommendation" link in Readme is to an old(2018) version of the document, which can lead to some confusion on review.
Should this be changed to load the May 2021 'final for real' version, the auto PDF preview DRAFT version, or removed? I have no strong opinions.
During ADASS and the IVOA Interoperability Meeting of Oct. 2019, it has been decided to not tackle bitwise operators and hexadecimal values in ADQL-2.1. The current description of these features introduce too many differences or issues in implementations (especially with MySQL).
For more details, see from slides 26 of the talk "Feedback on the implementation of ADQL-2.1".
These features are not yet mature. More thoughts are required. They may be introduced again in a next version of ADQL.
We’ve noticed that if you query a circle, it produces sources that occupy an actual circular area on the sky independent of how close one is to the poles – you get sources within a radius of the field center.
However, if you query for a BOX (i.e. a square), and specify a side-length, then if you are near the poles it gives the proper sweep in declination, but the width of the box is determined not by width in degrees on the sky, but width in right ascension, which will be much narrower near the poles than an actual span of degrees on the sky.
How would we get similar behaviour as the circle query but for a box query? For circular-aperture images we always used circle because the field actually is a circle, but now testing with square images we’ve noticed that if the image is near a pole, the BOX query does not actually return a square as the image sees, but a square in terms of RA & Dec, which is then just a narrow strip through the middle of what the telescope actually records if near a pole.
We would expect that a BOX search behaves the same as a CIRCLE search, in terms of area of actual sky requested, not area in terms of RA & Dec.
We have been directed to the ADQL source by both astroquery and gaiaquery groups.
We can provide more detailed info if required. Please see the attached figures. The circle query is requested with a given radius, and the box query is requested with the same side length equivalent to the circle, i.e. a side-length two-times the circle radius.
Should section 2.2.1 "Subqueries" mention WITH
clauses (CTEs) alongside WHERE
and FROM
clauses as another place where subqueries are permitted?
IN_UNIT
is a function able to convert a value into another unit. Its current signature is:
IN_UNIT(value, target_unit)
This works as expected if the 1st argument is a reference to a column whose the unit is specified in TAP_SCHEMA.columns
. But what happens ...
Nothing is specified in the current state of the document. Hence this issue.
During the IVOA Interop. Meeting (taking place in Groningen in Oct. 2019), there was a consensus for the following solution:
If no unit metadata is available,
IN_UNIT
should return the given value unchanged with the unit piece of metadata set to the given target unit. In other words, in such case: no conversion, just annotation.
The legitimate query:
SELECT mycol FROM mytab WHERE mycol!='toto'
returns the error:
Invalid ADQL regular identifier: : "mycol!"! If it aims to be a column/table name/alias, you should write it between double quotes.
Markus Demleitner raised the point that the 2-arguments version of DISTANCE()
allowed 2 general geometries instead of 2 points. Computing the distance between 2 points is easy but between 2 geometries requires an clear explanation of how such distance is computed.
To paraphrase Markus:
Being general here is a pain in the neck (actually, that's why I ran
into this question). For one, you'll need to define distance
much more carefully for such geometries, and if (as I think we ought
to) we chose "minimum of distances of between all points in arg 1 and
arg 2", I doubt we'll see many correct implementations of that. Also
I'll want to map a lot of DISTANCE calls into contains(point,
circle) statements (because that's much easier on the query planner),
and that's a pain if one of the points could actually be, say, a
polygon.
The issue here is about limiting the 2 arguments of DISTANCE()
to be POINTs.
Complete email thread on http://mail.ivoa.net/pipermail/dal/2020-February/008268.html
In the current version of this document, REGION
has been directly removed. Doing so introduces a backward incompatibility between minor revisions of the ADQL language. Moreover, some existing TAP services still rely on this function.
Discussions went on this topic during ADASS and the IVOA Interop. Meeting in Oct. 2019 in Groningen. It has been decided to put back REGION
in ADQL.
The only parameter of REGION
is typed as a VARCHAR. In ADQL-2.0, it was the STC-S representation of a spatial region. It has been decided in ADQL-2.1, to extend the possible representations:
A declarative, efficient, and flexible JavaScript library for building user interfaces.
🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.
TypeScript is a superset of JavaScript that compiles to clean JavaScript output.
An Open Source Machine Learning Framework for Everyone
The Web framework for perfectionists with deadlines.
A PHP framework for web artisans
Bring data to life with SVG, Canvas and HTML. 📊📈🎉
JavaScript (JS) is a lightweight interpreted programming language with first-class functions.
Some thing interesting about web. New door for the world.
A server is a program made to process requests and deliver data to clients.
Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.
Some thing interesting about visualization, use data art
Some thing interesting about game, make everyone happy.
We are working to build community through open source technology. NB: members must have two-factor auth.
Open source projects and samples from Microsoft.
Google ❤️ Open Source for everyone.
Alibaba Open Source for everyone
Data-Driven Documents codes.
China tencent open source team.