datopian / data-api Goto Github PK

View Code? Open in Web Editor NEW

9.0 7.0 3.0 724 KB

Next generation Data API for data management systems including CKAN.

Home Page: https://tech.datopian.com/data-api/

License: MIT License

JavaScript 84.02% Shell 2.17% Dockerfile 1.94% HTML 6.56% Python 5.30%

data ckan data-api

data-api's Introduction

Data API

For data management systems including CKAN.

Features

GraphQL endpoint
Bulk export of data to json/csv/xlsx files
datastore_search endpoint (similar to CKAN Datastore extention)

Version

The current version is v1

APP_VERSION = 'v1'

Usage

Endpoints

/{APP_VERSION}/graphql
/{APP_VERSION}/download
/{APP_VERSION}/datastore_search
/{APP_VERSION}/datastore_search/help

GraphQL Endpoint

GraphQL Endpoint exposes the hasura GraphQL API.

For the GraphQL documentation please refer to Hasura Documentation

datastore_search endpoint

Parameters

resource_id (string) – MANDATORY id or alias of the resource to be searched against
q (string or dictionary) – JSON format query restrictions {“key1”: “a”, “key2”: “b”}, it’ll search on each specific field (optional)
distinct_on [bool || list of field names] – If True: return only distinct rows, if list of fields will return only
limit (int) – Maximum number of rows to return (optional, default: 100)
offset (int) – offset this number of rows (optional)
fields (list of strings) – fields to return (optional, default: all fields)
~~sort (string) – comma separated field names with ordering e.g.: “fieldname1, fieldname2 desc”~~ Not implemented yet
~~filters (dictionary) – matching conditions to select, e.g {“key1”: “a”, “key2”: “b”} (optional)~~ Not implemented - similar to q

Results:

The result is a JSON document containing:

schema (JSON) – The data schema
data (JSON) – matching results in JSON format
~~fields (list of dictionaries) – fields/columns and their extra metadata~~ Not Implemented
~~offset (int) – query offset value~~ Not implemented
~~total (int) – number of total matching records~~ Not implemented

Examples

Help page

Basic query with limit

With a test table having the following schema:

We can make different queries:

Query Table

Query Table with Limit

Query Table with Limit and Offset (pagination)

Contributing

copy env variables - cp .env.example .env
set up background services - this repo contains a mock environment you can launch with bash run-mock-environment.sh or if connecting to an existing one edit the URLs in the .env file
install dependencies - yarn
yarn start to launch server / yarn test to run tests
setup automatic code formatter - install and use prettier, if using vscode install prettier-vscode extention

Please don't forget to add new variables to .env.example - they also will be used in ci tests.

GitHub CI

Tests

Tests and formating check and run automatically on every push and pull request to master. They run on Docker hub. See documentation here https://docs.docker.com/docker-hub/builds/automated-testing/

To simulate tests as running on dockerhub you can run bash run-tests-in-docker.sh file.

If a pull request has failed checks it shows an error message in GitHub. The link to DockerHub does not work though. You will need to navigate there:

Docker repository is here https://hub.docker.com/repository/docker/datopian/data-api
To see build jobs go to builds https://hub.docker.com/repository/docker/datopian/data-api/builds and find your build/test

Docker image builds

After every push to master or a pull request to this branch and successful tests there is a new docker image built here https://hub.docker.com/repository/docker/datopian/data-api/builds

Building

New images can be build with the provided Dockerfile or fetched from dockerhub.

Deployment

Can be deployed as a usual docker container. An environement for this microservice should contain:

postgres database
hasura
environment variables (see .env.example)

License

This project is licensed under the MIT License - see the LICENSE file for details

data-api's People

Contributors

Stargazers

Watchers

Forkers

isabella232 luccasmmg datopian

data-api's Issues

[epic] Support for large queries from data API

When querying the data API I want to be able to make queries and get results with 100k or 1m+ results and download them so that I can extract the data I want even if larger

Acceptance

Design the solution
- Consider authorization considerations
Write to storage approach
- Choose storage backend
- Stream to storage from data API (or trigger background job)
- Return download URL
Setup switch from "streaming" to write to storage

Analysis

There are 2 approaches:

Stream the whole result
Extract the query results to storage and give storage url to the user

One can also have hybrid e.g. do the former up to some number of results and then switch to 2.

There are several advantages of option 2:

You have a natural cache structure on disk so that the same query may not need to be recomputed (you can expire exported results after some time period)
If your download/stream is interrupted you can resume it from storage (rather than re-running the query)
You give the end user the option to share the file for a certain period of time with other users

The disadvantages (at least the small data sizes):

It's more complex / more work on backend
Slower (greater latency)
More complex for user: they have 2 steps where there was one (get query result, extract download url and download)

[epic] Read Service with CKAN v2 compatibility

We are using Hasura as the base wrapper around Postgres.

We need a backwards compatible API

# dapi-endpoint includes version e.g. data-api.ckan.com/v1/ ...
{dapi-endpoint}/{dapi-id}/graphql  # raw Hasura
{dapi-endpoint}/{dapi-id}/datastore_search  # CKAN DataStore /api/3/datastore_search

# let's see about this one (if hard we may skip for now)
{dapi-endpoint}/{dapi-id}/datastore_search_sql/  # CKAN DS SQL /api/3/datastore_search_sql

Terminology

dapi-id = table name in postgres (?) tbc
- NB: hasura does not allow - in table names ...

Acceptance

We have a new graphql endpoint
- You can get CSV and xlsx data from it (?)
We have backwards compatible datastore_search
We have backwards compatible datastore_search_sql
We support CORS
We have a hook for metrics (to google analytics)

BONUS

We implement existing permissions (what are they?)

Tasks

Analysis

graph TD

subgraph Read API service
  hasura[Hasura]
  apiwrapper[NodeJS wrapper app]
end

postgres[Postgres]

apiwrapper --> hasura
hasura --> postgres

CSV response format for graphql read API

When I have selected the data I want in the Build Report I want to be able to download in CSV format so that I can use it easily in my tools e.g. a spreadsheet, database etc

Acceptance

I can make a query to build report API and indicate that I want results returned in CSV
The results are returned in CSV

Tasks

Design the API structure
Add format query string to "graphql" endpoint with default to JSON
Stream response instead of simple response.
Implement CSV format option with conversion to CSV from JSON response
- Implement different separation formats TSV, |, ;,
- Tests

Analysis

Current

/.../graphql

POST request

{  query MyQuery {
    build_report_capacityauction {
      MonthlyAuctionConRentDKK_DK1
      UtilizedExportCapacity_DK1
    }
  }
}

New

POST with existing type of body to:

.../graphql?format=json|csv|xlsx

Options

New format:

Option 1:

POST /.../

Nicest:

.../graphql?format=json|csv|xlsx

Option 2:

{
  "format": "json | csv | xlsx",
  "graphql": "query MyQuery {
    build_report_capacityauction {
      MonthlyAuctionConRentDKK_DK1
      UtilizedExportCapacity_DK1
    }
  }"
}

PostgREST instead of hasura

I'll be happy to get your opinion on basing the data-api on PostgREST instead of hasura.
This post list some of the advantages: https://www.freecodecamp.org/news/stop-calling-postgrest-magic-8f3e1d5e5dd1/

It has a well thought out query language (similar to SQL)
- full text search
- renaming columns
- extracting nested fields from a json column (json_col->phones->0->number)
- computed columns
- sorting with null handling
- limits and pagination
- resource embedding: query columns from related tables (foreign keys) in a single API call (e.g. GET /films?select=title,directors(id,last_name))
response formats: csv, json, openapi, plain text, binary stream
can automatically expose OpenAPI (similar to exposing schema by GraphQL)
schema isolation: expose views and stored procedures (as abstraction layer) instead of exposing the real tables

I understand the flexibility offered by GraphQL but it might be less suitable for querying tabular data (e.g. efficiently return 1M rows as CSV)
This is partly because it uses a graph data model - while we can model a table as a graph, the framework is lacking basic tabular concepts "understanding" such as "rows" which might make it suboptimal for this kind of data

Also, I don't like the fact that with GraphQL, we can't use a simple browser for running a query (e.g. when collaborating with non-professional, I like to be able to send a query as a URL that will download a csv with the relevant data when opened in the browser)

Another relevant project - https://github.com/subzerocloud/postgrest-starter-kit, makes PostgREST even more powerful.

Data API Authentication and Authorization

Acceptance

Add simple authentication handling/validation (maybe JWT?)

Tasks

Analysis

Check also related issue #26
Refer to issue #7

Basic API wrapper around Hasura with endpoints and CKAN datastore API

Create a basic wrapper service around Hasura in NodeJS that provides CKAN datastore API style features.

Acceptance

We have an an API like this where dapi-id is probably the table or view name ...

graphql endpoint - maybe comes straight from Hasura
datastore_search endpoint

# dapi-endpoint includes version e.g. data-api.ckan.com/v1/ ...
/v1/graphql  # raw Hasura - the table name is part of graphql query
/v1/datastore_search  # CKAN DataStore /api/3/datastore_search

Docker compose setup (now we have 2 services)
CI for this ...

Acceptance criteria

The data-api app provides graphql (route to hasura) endpoint
The data-api app provides datastore_search endpoint

Tasks

GraphQL endpoint

write a test checking redirection
implement the redirection
write a test checking that response is streaming (not consolidating data) -> see issue #11

`datastore_search` as in Datastore extention

This endpoint response format is incompatible with GraphQL => will need it's own implementation. Also some features might be tricky to implement, e.g. Setting the plain flag to false enables the entire PostgreSQL full text search query language. involves setting up postgres indexes.

Other activities

Add simple authentication handling/validation (maybe JWT?) -> see issue #12
document what is different from https://docs.ckan.org/en/2.9/maintaining/datastore.html#ckanext.datastore.logic.action.datastore_search

return format . CSV, dict, json ...
Options for return format selection:

records_format in the query string (as in old datastore_search)
- backward compatible with current CKAN
- can be read by human user checking the query string
this/query/path/CSV/datastore_search
- easy to implement
- can be read by user in the URL
http header
- (probably) already implemented by the frameworks

Analysis

CKAN - New API Differences and Analysis

Here we document the differences between this new API and CKAN version

By default all parameters and results from the new API are JSON, this differs from CKAN in which the get call parameters are non standard but have an ad-hoc syntax. This is to

General Differences CKAN vs New API

The new API is entirely based on a GraphQL DB querying system, this changes the ways in which we can make queries.

CKAN targets a Postgresql DB, the difference with a GraphQL DB is that the queries that can be implemented are quite different.

GraphQL queries need to be explicit, including even the fields ("columns" in a traditional DB), which means knowing the database and "table" schema before making the query to be able to ask for the needed fields. This means that for

GET Parameters

This section discusses each CKAN parameter and its implementation (or not) in the new API;

Parameters:

resource_id (string) – id or alias of the resource to be searched against. Mandatory parameter, implemented

filters (dictionary) – matching conditions to select, e.g {“key1”: “a”, “key2”: “b”} (optional). Optional parameter, use q query instead in the New API.

q (string or dictionary) – full text query. If it’s a string, it’ll search on all fields on each row. If it’s a dictionary as {“key1”: “a”, “key2”: “b”}, it’ll search on each specific field (optional)

This field is different in the new API, the main difference is that the new API only receives JSON. Current New API implementationis equivalent to filters.

distinct (bool) – return only distinct rows (optional, default: false)

This parameter is vastly different from the previous CKAN implementation. In GraphQL there is no notion of row (due to the simple fact that graphs do not have rows), this means that the option distinct does not mean the same for the new API and that we can also implement a new idea.

Due to the differences and to make it evident that the API is not the same the new implementation is called distinct_on and needs a list of fields to test for differences, so the new implementation can check for differences for each field.

The New API does also implement (for backwards compatibility) a boolean value where it will query the graph schema and ask for different in every field of the schema, which would be equivalent to the CKAN distinct implementation.

plain (bool) – treat as plain text query (optional, default: true)
language (string) – language of the full text query (optional, default: english)

Full text search was discussed to not be implemented in the github issue

limit (int) – maximum number of rows to return (optional, default: 100, unless set in the site’s configuration ckan.datastore.search.rows_default, upper limit: 32000 unless set in site’s configuration ckan.datastore.search.rows_max)

There is no difference in the implementation of this parameter

offset (int) – offset this number of rows (optional)

This parameter that implies pagination to the response has not been completely implemented in the new API. The reason for this is that in GraphQL there are 2 different ways of implementing pagination and this needs to be analyzed more in depth.

fields (list or comma separated string) – fields to return (optional, default: all fields in original order)

List of fields to return to the caller. The only difference is that in the New API the input is a list (JSON)

sort (string) – comma separated field names with ordering e.g.: “fieldname1, fieldname2 desc”

Not included yet in the New API, and the only difference with the new api is that it will be implemented with a JSON input instead of a comma separated field names. The input parameter will have the following format:

{ fieldname1: order, fieldname2: order} where order = [asc|desc]

include_total (bool) – True to return total matching record count (optional, default: true)

Not Implemented in the New API, this is CPU intensive and needs an extra query or process the result to count the number of resulting elements.

total_estimation_threshold (int or None) – If “include_total” is True and “total_estimation_threshold” is not None and the estimated total (matching record count) is above the “total_estimation_threshold” then this datastore_search will return an estimate of the total, rather than a precise one. This is often good enough, and saves computationally expensive row counting for larger results (e.g. >100000 rows). The estimated total comes from the PostgreSQL table statistics, generated when Express Loader or DataPusher finishes a load, or by autovacuum. NB Currently estimation can’t be done if the user specifies ‘filters’ or ‘distinct’ options. (optional, default: None)

This is not feasible with the current graphql DB except for the global statistics on the schema. To be able to estimate the count we would need some other aggregation statistics on the different kind of queries.

records_format (controlled list) – the format for the records return value: ‘objects’ (default) list of {fieldname1: value1, …} dicts, ‘lists’ list of [value1, value2, …] lists, ‘csv’ string containing comma-separated values with no header, ‘tsv’ string containing tab-separated values with no header
Setting the plain flag to false enables the entire PostgreSQL full text search query language.

The result format in the New API is JSON, in the future and if needed another return format can be implemented

Results:

The result returned to the caller is a JSON containing the following:

{ 
    schema: {JSON schema definition},
    data: [JSON list of elements]

}

The following list shows the CKAN return elements and the implementation in the New API

fields (list of dictionaries) – fields/columns and their extra metadata
- This is not returned as in a JSON format it is not needed (it is already present in the return ) and the extra metadata is present in the schema field
offset (int) – query offset value
- Not currently implemented as a return value
limit (int) – queried limit value (if the requested limit was above the ckan.datastore.search.rows_max value then this response limit will be set to the value of ckan.datastore.search.rows_max)
- Not currently implemented as a return value
filters (list of dictionaries) – query filters
- Not currently implemented as a return value
total (int) – number of total matching records
- Not currently implemented as a return value. Needs more analysis and development to be implemented.
total_was_estimated (bool) – whether or not the total was estimated
- Will not implement
records (depends on records_format value passed) – list of matching results
- This one now is named data

Response Size

Depending on the response size the data transfer can have ill network effects or overloading the server connections. This is why asynchronous and streaming response data will be needed.

The fact that the current implementation responds in JSON by default adds a data overload that is simply not there in formats like CSV which do not repeat the field names for every response element.

Another option for big size response is to return a SFTP URL (or other secure file transfer protocol) instead. This URL will contain a file with the response data once the operation is complete. This way the connection can be freed and the client can poll for the file.

This solution has two extra advantages:

the file acts like a cache and can be used in intermediate computations
if there is any problem during the file transfer the result download can be restarted without recomputing the response and partial downloads are supported in most file transfer protocols.

Security

Security needs to be implemented, one option is JWT which allows for signed requests in the GET query.

JWT has the advantage of already being compatible with the current JSON parameter implementation in the New API.

Example 1

Input

/?resource_id=test_id

Test table is a table like this:

| _id | a | b | c |

Incoming query in structured format

{
  resource_id: test_table
}

Intermediate structure "JSON graqhql"

{
  table: test_table
  fields: [...]
}

Final graphql to send to client

{
  test_table {
    _id,
    a,
    b,
    c,
  }
}

Returns from Graphql

Returns to user

How do we deal with versioning?

While implementing Dockerhub CI I thought it would be nice to have different versions of the app docker image in the future. Here is the relevant discussion #4 (comment):

@EvgeniiaVak : So if we go with builds from tags, we would still need to create those tags first (manually?)...
@zelima : Yes. Usually, that's done manually. But think you can easily scrip it if you really want. Or maybe even github has some tools to do that for you

So how will we do that?

invest some time into versioning + github + dockerhub autotagging how-to (✔️ I would vote for this one, because it will be easier to support versioning at all)
create manually
1. tags in git
2. build setups for this tags in dockerhub (see screenshot)

I guess this is something for the future. cc @leomrocha @rufuspollock

Add test setup

epic here: #1

When developing data-api app I want to have test framework setup and ready to be able to write unit tests during development.

Acceptance criteria

There is a test framework in the repo with a smoke test written and passing
There is a mock framework in the repo used throughout tests

Tasks

Find the framework to use (Supertest remains the default for node http server tests ...)
Write a simple smoke test (e.g. something like / page returns some response)
Setup github CI to run the tests on PR to master branch
Create a PR to tests that it's working
Setup mocks / fixtures i.e. the fixture postgres database (For mocks we can either use nock (example of using nock here - https://github.com/datopian/frontend-v2) or mitm.js - interesting to give a try)

this should probably be after setting up graphql endpoint to see what kind of mocks we will need (and generate them easily)
- Write a test using a fixture (a simple one that tests functionality working out of the box)

XLSX response format for read API

When I have selected the data I want in the Build Report I want to be able to download in XLSX format so that I can use it easily in my tools e.g. Excel

Acceptance

I can make a query to build report API and indicate that I want results returned in xlsx (or excel?)
The results are returned in xlxs format

Tasks

Design the API structure - same as for CSV #5
Implement xlsx format option with conversion to xlsx from JSON response
- Basic test

Analysis:

See CSV related issue

Basic Hasura-based API as a service

Epic here #1

When developing data-api I want to be able to create and run integration tests of the app so that I can check how it's working together with postgres and hasura.

Acceptance

There is a docker-compose environment with the necessary cervices
There is ci running tests and publishing to a Docker repository

Tasks

Local developer docker-compose

add services to docker-compose
- postres (latest official postgres image)
- hasura (custom image with adding sample data)
write a basic smoke test checking the system parts are working
- Find the framework to use (Supertest remains the default for node http server tests ...)

CI

Inbox

Have Hasura based read API

Dockerize this service and auto publish on merge to master

Work out where we publish to (is it dockerhub or is it github packages? (Ask on tech help?)

Publish there with continuous deployment (on master)

Testing

Have basic test for this (using cypress (?))

CI setup and passing

Basic API optional parameters from CKAN datastore API

This is a follow up of issue #7

Acceptance

The data-api app provides the parameters
The data-api app provides response streaming

Tasks

Implement the following parameters

Analysis

The total(bool) count will not be implemented as currently needs another query to count the results or obtain all the results in a single page and count the objects. As currently performance is first, this count might be implemented in a later issue but will be excluded of this one.

Refer to issue #7

Support for column (field) ordering on the download file call

Currently the returned file during download does not allow for column sorting.

As a client, when I download a file I would like to be able to define the column(field) order on the downloaded files instead of just using the default one.

Acceptance:

I can sort the fields by giving a sorted list of fields

Tasks

TODO

Analysis

The js-xlsx library that we are currently using allows for sorting giving as input the ordered list of fields.

TODO. Finish implementation and task analysis.
TODO analyze if different ways of sorting are needed (alphabetically, key, just by input list, something else)

Data-API query limit and DOS protection

I've opened this issue to discuss about the limit that we are currently implementing and what should we do in the future. As this is a decision that affects all projects using data-api I'd like your input on it.

The relevant code is located here

The current implementation has a default limit (quite small) and is build such as build-report can get a sample data by default and alleviate the server of any unnecessary big call. This has the extra advantage of avoiding calls that accidentally have a big response. Having default response limits is an industry good practice but does not deal with all the issues.

Nevertheless there some things that need to be taken into account:

Databases often implement limits in row count, timeout and result size to avoid being overwhelmed by client's queries. Changing these is not often a solution to all the problems and the needed DB configuration vary depending on the application.

Being able to download a response as a file, in the current implementation a limit must be given that is bigger or equal than the query result. The question here is how should we deal with it and how should the behaviour be from the client's side point of view?

In the case when the read API is public we'll have to deal with DoS and DDoS attacks. Having no search limit makes these attacks easier, but even being able to set a limit does not eliminate the threat.

So the questions that I would like to discuss are:

What kind of behaviours can be observed from the client side? Specially for the File Download use case. What measures can we have to limit DoS attack vectors?

I'd like your opinions and discuss enough to take an informed decision on what we'll do here.

@rufuspollock @anuveyatsu @EvgeniiaVak @shubham-mahajan @sagargg

[epic] MVP Data API next gen

Tasks

v0.1: New Read API (even running against old DataStore DB ...?) #1
v0.2: Management API with new Data API storage and bridge
v0.3: Management UI
v0.4: embedding Explorer using new API in a Dataset page

Absolute min MVP that delivers most immediate value

Absolutely basic read API service #1
- Backwards compatible API
- Formats: CSV, XLXS etc
Explorer refactored -- may not be needed if we have backwards compatible API

[ci] Dockerhub tests fail inconsistently

Acceptance criteria

tests that run locally successfully should not fail in dockerhub runner

Tasks

find the reason
find solution
implement solution

Analysis

When removing most of the tests some are still passing (even those dependent on hasura and test_table) - https://hub.docker.com/repository/registry-1.docker.io/datopian/data-api/builds/43d760a1-f0fd-4df9-a7bb-6e73cd52d594

Locally tests fail sometimes too - hasura does not start immediately, we are using wait-for-it.sh script to wait for hasura, but probably need to wait for a specific hasura endpoint: - https://hasura.io/docs/1.0/graphql/core/api-reference/health.html#api-spec

Was able to recreate 4 passing, 4 failing scenario locally with this command:

docker-compose -f docker-compose.test.yml -p datopian-data-api-test up --exit-code-from sut --renew-anon-volumes --build

Added a custom wait-for-hasura.sh, and now it consistently fails with 1 test both locally with .test.yaml and in dockerhub:

7 passing (491ms)
1 failing
1) datastore_search endpoint
respond with default number of rows when requesting by resource_id on a table with more than the default rows:
Error: done() invoked with non-Error: response length is not correct
at Test.<anonymous> (test/integration/datastore_search.js:49:18)
at Test.assert (node_modules/supertest/lib/test.js:181:6)
at Server.localAssert (node_modules/supertest/lib/test.js:131:12)
at emitCloseNT (net.js:1654:8)
at processTicksAndRejections (internal/process/task_queues.js:83:21)
error Command failed with exit code 1.
info Visit https://yarnpkg.com/en/docs/cli/run for documentation about this command.
Removing bgtvjjfaugjk7lo2pguilop_sut_1 ...
�[1A�[2K Removing bgtvjjfaugjk7lo2pguilop_sut_1 ... �[32mdone�[0m �[1BGoing to remove bgtvjjfaugjk7lo2pguilop_sut_1
executing docker-compose.test.yml (1)

Also there is a warning in hasura logs:

graphql-engine_1  | {"timestamp":"2020-09-24T11:10:06.000+0000","level":"info","type":"startup","detail":{"kind":"migrations-startup","info":"failed waiting for 9691, try increasing HASURA_GRAPHQL_MIGRATIONS_SERVER_TIMEOUT (default: 30)"}}

The reason for the last test not passing was we missed a new env variable in the docker-compose.test.yml

Inbox

E.g. tests on this PR #13 different test runs have different number of tests running successfully:

7 pass, 1 fail: https://hub.docker.com/repository/registry-1.docker.io/datopian/data-api/builds/e2a3e474-28ec-4676-b2a7-bdfccda94b38

$ mocha -r dotenv/config --recursive
data-api
�[0mGET /non_existing_page �[33m404 �[0m4.528 ms - 10�[0m
✓ should return 404 for non existing page (46ms)
GraphQL endpoint
�[0mPOST /v1/graphql �[32m200 �[0m26.038 ms - -�[0m
✓ returns graphql schema
datastore_search endpoint
�[0mGET /v1/datastore_search?resource_id=test_table �[32m200 �[0m24.567 ms - 1621�[0m
✓ returns 200 in a basic case
�[0mGET /v1/datastore_search/help �[32m200 �[0m0.523 ms - 1001�[0m
✓ returns help page in a basic case
�[0mGET /v1/datastore_search �[36m303 �[0m1.612 ms - 51�[0m
✓ redirects to help if no resource_id
�[0mGET /v1/datastore_search?resource_id=test_table �[32m200 �[0m21.563 ms - 1621�[0m
1) respond with default number of rows when requesting by resource_id on a table with more than the default rows
q parameter
�[0mGET /v1/datastore_search?resource_id=test_table&q=%7B%22time_column%22:%222020-09-09%2000:00:00%22,%22text_column%22:%2211111111111111111111111111111111%22,%22float_column%22:0.1111111111111111,%22int_column%22:111111%7D �[32m200 �[0m336.367 ms - 349�[0m
✓ filters resultset by values equal when passing {"columnname": "value"} (341ms)
graphqlQueryBuilder
function buildParametrableQuery
✓ builds a result query with all possible query params
7 passing (506ms)
1 failing
1) datastore_search endpoint
respond with default number of rows when requesting by resource_id on a table with more than the default rows:
Error: done() invoked with non-Error: response length is not correct
at Test.<anonymous> (test/integration/datastore_search.js:49:18)
at Test.assert (node_modules/supertest/lib/test.js:181:6)
at Server.localAssert (node_modules/supertest/lib/test.js:131:12)
at emitCloseNT (net.js:1654:8)
at processTicksAndRejections (internal/process/task_queues.js:83:21)
error Command failed with exit code 1.
info Visit https://yarnpkg.com/en/docs/cli/run for documentation about this command.
Removing btmikobceywrqqzjbm4tmsr_sut_1 ...
�[1A�[2K Removing btmikobceywrqqzjbm4tmsr_sut_1 ... �[32mdone�[0m �[1BGoing to remove btmikobceywrqqzjbm4tmsr_sut_1
executing docker-compose.test.yml (1)

4 pass, 4 fail: https://hub.docker.com/repository/registry-1.docker.io/datopian/data-api/builds/fcffe6c5-34c0-465a-b550-40618954a9b9

$ mocha -r dotenv/config --recursive
data-api
�[0mGET /non_existing_page �[33m404 �[0m6.106 ms - 10�[0m
✓ should return 404 for non existing page (52ms)
GraphQL endpoint
�[0mPOST /v1/graphql �[31m500 �[0m21.010 ms - 10�[0m
1) returns graphql schema
datastore_search endpoint
FetchError: request to http://graphql-engine:8080/v1/graphql failed, reason: connect ECONNREFUSED 172.18.0.3:8080
at ClientRequest.<anonymous> (/usr/src/app/node_modules/node-fetch/lib/index.js:1461:11)
at ClientRequest.emit (events.js:315:20)
at Socket.socketErrorListener (_http_client.js:426:9)
at Socket.emit (events.js:315:20)
at emitErrorNT (internal/streams/destroy.js:92:8)
at emitErrorAndCloseNT (internal/streams/destroy.js:60:3)
at processTicksAndRejections (internal/process/task_queues.js:84:21) {
type: 'system',
errno: 'ECONNREFUSED',
code: 'ECONNREFUSED'
}
FetchError: request to http://graphql-engine:8080/v1/graphql failed, reason: connect ECONNREFUSED 172.18.0.3:8080
at ClientRequest.<anonymous> (/usr/src/app/node_modules/node-fetch/lib/index.js:1461:11)
at ClientRequest.emit (events.js:315:20)
at Socket.socketErrorListener (_http_client.js:426:9)
at Socket.emit (events.js:315:20)
at emitErrorNT (internal/streams/destroy.js:92:8)
at emitErrorAndCloseNT (internal/streams/destroy.js:60:3)
at processTicksAndRejections (internal/process/task_queues.js:84:21) {
type: 'system',
errno: 'ECONNREFUSED',
code: 'ECONNREFUSED'
}
2) returns 200 in a basic case
�[0mGET /v1/datastore_search/help �[32m200 �[0m0.706 ms - 1001�[0m
✓ returns help page in a basic case
�[0mGET /v1/datastore_search �[36m303 �[0m1.554 ms - 51�[0m
✓ redirects to help if no resource_id
FetchError: request to http://graphql-engine:8080/v1/graphql failed, reason: connect ECONNREFUSED 172.18.0.3:8080
at ClientRequest.<anonymous> (/usr/src/app/node_modules/node-fetch/lib/index.js:1461:11)
at ClientRequest.emit (events.js:315:20)
at Socket.socketErrorListener (_http_client.js:426:9)
at Socket.emit (events.js:315:20)
at emitErrorNT (internal/streams/destroy.js:92:8)
at emitErrorAndCloseNT (internal/streams/destroy.js:60:3)
at processTicksAndRejections (internal/process/task_queues.js:84:21) {
type: 'system',
errno: 'ECONNREFUSED',
code: 'ECONNREFUSED'
}
FetchError: request to http://graphql-engine:8080/v1/graphql failed, reason: connect ECONNREFUSED 172.18.0.3:8080
at ClientRequest.<anonymous> (/usr/src/app/node_modules/node-fetch/lib/index.js:1461:11)
at ClientRequest.emit (events.js:315:20)
at Socket.socketErrorListener (_http_client.js:426:9)
at Socket.emit (events.js:315:20)
at emitErrorNT (internal/streams/destroy.js:92:8)
at emitErrorAndCloseNT (internal/streams/destroy.js:60:3)
at processTicksAndRejections (internal/process/task_queues.js:84:21) {
type: 'system',
errno: 'ECONNREFUSED',
code: 'ECONNREFUSED'
}
3) respond with default number of rows when requesting by resource_id on a table with more than the default rows
q parameter
FetchError: request to http://graphql-engine:8080/v1/graphql failed, reason: connect ECONNREFUSED 172.18.0.3:8080
at ClientRequest.<anonymous> (/usr/src/app/node_modules/node-fetch/lib/index.js:1461:11)
at ClientRequest.emit (events.js:315:20)
at Socket.socketErrorListener (_http_client.js:426:9)
at Socket.emit (events.js:315:20)
at emitErrorNT (internal/streams/destroy.js:92:8)
at emitErrorAndCloseNT (internal/streams/destroy.js:60:3)
at processTicksAndRejections (internal/process/task_queues.js:84:21) {
type: 'system',
errno: 'ECONNREFUSED',
code: 'ECONNREFUSED'
}
FetchError: request to http://graphql-engine:8080/v1/graphql failed, reason: connect ECONNREFUSED 172.18.0.3:8080
at ClientRequest.<anonymous> (/usr/src/app/node_modules/node-fetch/lib/index.js:1461:11)
at ClientRequest.emit (events.js:315:20)
at Socket.socketErrorListener (_http_client.js:426:9)
at Socket.emit (events.js:315:20)
at emitErrorNT (internal/streams/destroy.js:92:8)
at emitErrorAndCloseNT (internal/streams/destroy.js:60:3)
at processTicksAndRejections (internal/process/task_queues.js:84:21) {
type: 'system',
errno: 'ECONNREFUSED',
code: 'ECONNREFUSED'
}
4) filters resultset by values equal when passing {"columnname": "value"}
graphqlQueryBuilder
function buildParametrableQuery
✓ builds a result query with all possible query params
4 passing (6s)
4 failing
1) GraphQL endpoint
returns graphql schema:
Error: expected 200 "OK", got 500 "Internal Server Error"
at Test._assertStatus (node_modules/supertest/lib/test.js:268:12)
at Test._assertFunction (node_modules/supertest/lib/test.js:283:11)
at Test.assert (node_modules/supertest/lib/test.js:173:18)
at Server.localAssert (node_modules/supertest/lib/test.js:131:12)
at emitCloseNT (net.js:1654:8)
at processTicksAndRejections (internal/process/task_queues.js:83:21)
2) datastore_search endpoint
returns 200 in a basic case:
Error: Timeout of 2000ms exceeded. For async tests and hooks, ensure "done()" is called; if returning a Promise, ensure it resolves. (/usr/src/app/test/integration/datastore_search.js)
at listOnTimeout (internal/timers.js:549:17)
at processTimers (internal/timers.js:492:7)
3) datastore_search endpoint
respond with default number of rows when requesting by resource_id on a table with more than the default rows:
Error: Timeout of 2000ms exceeded. For async tests and hooks, ensure "done()" is called; if returning a Promise, ensure it resolves. (/usr/src/app/test/integration/datastore_search.js)
at listOnTimeout (internal/timers.js:549:17)
at processTimers (internal/timers.js:492:7)
4) datastore_search endpoint
q parameter
filters resultset by values equal when passing {"columnname": "value"}:
Error: Timeout of 2000ms exceeded. For async tests and hooks, ensure "done()" is called; if returning a Promise, ensure it resolves. (/usr/src/app/test/integration/datastore_search.js)
at listOnTimeout (internal/timers.js:549:17)
at processTimers (internal/timers.js:492:7)
�[0mGET /v1/datastore_search?resource_id=test_table �[0m- �[0m- ms - -�[0m
�[0mGET /v1/datastore_search?resource_id=test_table �[0m- �[0m- ms - -�[0m
/usr/src/app/node_modules/mocha/lib/runner.js:906
throw err;
^
TypeError: Cannot read property 'body' of undefined
at Test.<anonymous> (/usr/src/app/test/integration/datastore_search.js:73:38)
at Test.assert (/usr/src/app/node_modules/supertest/lib/test.js:181:6)
at Server.localAssert (/usr/src/app/node_modules/supertest/lib/test.js:131:12)
at Object.onceWrapper (events.js:421:28)
at Server.emit (events.js:315:20)
at emitCloseNT (net.js:1654:8)
at processTicksAndRejections (internal/process/task_queues.js:83:21)
error Command failed with exit code 7.
info Visit https://yarnpkg.com/en/docs/cli/run for documentation about this command.
Removing ba5dxw7nygi4fcxpe47x5cd_sut_1 ...
�[1A�[2K Removing ba5dxw7nygi4fcxpe47x5cd_sut_1 ... �[32mdone�[0m �[1BGoing to remove ba5dxw7nygi4fcxpe47x5cd_sut_1
executing docker-compose.test.yml (7)

Where as locally all 8 tests pass:

evgeniyavakarina (feature/datastore_search-v1.1 *) data-api $ yarn test
yarn run v1.22.4
$ mocha -r dotenv/config --recursive


  data-api
GET /non_existing_page 404 3.065 ms - 10
    ✓ should return 404 for non existing page

  GraphQL endpoint
POST /v1/graphql 200 50.605 ms - -
    ✓ returns graphql schema (56ms)

  datastore_search endpoint
GET /v1/datastore_search?resource_id=test_table 200 36.800 ms - 14329
    ✓ returns 200 in a basic case (40ms)
GET /v1/datastore_search/help 200 0.498 ms - 1001
    ✓ returns help page in a basic case
GET /v1/datastore_search 303 1.009 ms - 51
    ✓ redirects to help if no resource_id
GET /v1/datastore_search?resource_id=test_table 200 56.574 ms - 14329
    ✓ respond with default number of rows when requesting by resource_id on a table with more than the default rows (63ms)
    q parameter
GET /v1/datastore_search?resource_id=test_table&q=%7B%22time_column%22:%222020-09-09%2000:00:00%22,%22text_column%22:%2211111111111111111111111111111111%22,%22float_column%22:0.1111111111111111,%22int_column%22:111111%7D 200 161.018 ms - 349
      ✓ filters resultset by values equal when passing {"columnname": "value"} (165ms)

  graphqlQueryBuilder
    function buildParametrableQuery
      ✓ builds a result query with all possible query params


  8 passing (380ms)

✨  Done in 2.31s.

(Maybe) GraphQL playground as dynamic docs

When studying GraphQL with node-js I was able to explore the underlying graphql schema with the interactive documentation via GraphQL playground (based on GraphQL schema, all items are clickable there):

Hasura also has something like this via their console app, maybe it'd be possible to expose that as the API documentation:
https://hasura.io/learn/graphql/hasura/data-modelling/4-try-todos-queries/

So maybe to autogenerate API docs we could wire up one of those?

datopian / data-api Goto Github PK

data-api's Introduction

Data API

Features

Version

Usage

Endpoints

GraphQL Endpoint

datastore_search endpoint

Parameters

Results:

Examples

Help page

Basic query with limit

Query Table

Query Table with Limit

Query Table with Limit and Offset (pagination)

Contributing

GitHub CI

Tests

Docker image builds

Building

Deployment

License

data-api's People

Contributors

Stargazers

Watchers

Forkers

data-api's Issues

Acceptance

Analysis

Terminology

Acceptance

Tasks

Analysis

Acceptance

Tasks

Analysis

Current

New

Options

Acceptance

Tasks

Analysis

Acceptance

Acceptance criteria

Tasks

GraphQL endpoint

datastore_search as in Datastore extention

Other activities

Analysis

CKAN - New API Differences and Analysis

General Differences CKAN vs New API

GET Parameters

Parameters:

Results:

Response Size

Security

Example 1

Input

Incoming query in structured format

Intermediate structure "JSON graqhql"

Final graphql to send to client

Returns from Graphql

Returns to user

Acceptance criteria

Tasks

Acceptance

Tasks

Analysis:

Acceptance

Tasks

Local developer docker-compose

CI

Inbox

Acceptance

Tasks

Analysis

Acceptance:

`datastore_search` as in Datastore extention