covid-wb-api's Introduction

covid-wb-api

COVID-19 Risk Schema API for the World Bank

Running the API locally

The API also requires a PostgreSQL/PostGIS database, which must be set up separately. An RDS instance can be used, and it can be populated using the tools in the etl directory.

  • Set up Docker.
  • Clone this repo: git clone https://github.com/developmentseed/covid-wb-api.git
  • Build the Docker image: docker build -t covid-wb-api .
  • Create a new environment by copying .env-sample to .env and changing the variables to point to your PostgreSQL instance.
  • Run the container: docker run --env-file ./.env -p 8080:80 covid-wb-api
  • Visit http://localhost:8080 to see the API.

CI

CircleCI builds a new image and tags it with latest, the branch name, and the unique CircleCI build number. The image is pushed to AWS ECR for this project. For deploying, see cloudformation/README.md.
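The tagging scheme can be sketched as a small helper. This is only an illustration, not the project's CI configuration: the function name and the repository URI in the usage line are hypothetical, and replacing characters that are invalid in a Docker tag (such as "/" in a branch name) is an assumption about how branch names would need to be handled.

```python
import re
from typing import List


def image_tags(repository: str, branch: str, build_num: int) -> List[str]:
    """Return one image reference per tag: latest, branch name, build number."""
    # Docker tags may only contain letters, digits, '_', '.' and '-', so other
    # characters in the branch name are replaced (assumed handling, see above).
    safe_branch = re.sub(r"[^A-Za-z0-9_.-]", "-", branch)
    return [f"{repository}:{tag}" for tag in ("latest", safe_branch, str(build_num))]


# Hypothetical repository URI, for illustration only.
for reference in image_tags("example.invalid/covid-wb-api", "feature/etl", 42):
    print(reference)
```

On CircleCI the branch and build number are available in the CIRCLE_BRANCH and CIRCLE_BUILD_NUM environment variables.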

USAGE

The examples below use a hosted instance that may no longer be running.

OpenAPI documentation for all API endpoints is available at

http://covid-publi-1onc9lx0j49x6-1338300620.us-east-1.elb.amazonaws.com/

Available layers are:

adm0, adm0_full, adm1, adm1_full, adm2, adm2_full, hd_urban_fishnets, hd_urban_fishnets_full, urban_areas, urban_areas_full, urban_areas_hd, urban_areas_hd_full, urban_fishnets, urban_fishnets_full

The *_full layers include all the attributes from the tables that you provided, in addition to the attributes included in the original GeoJSON files.

Basic usage, to get just the attributes for a single feature using either the geohash or the ogc_fid:

http://covid-publi-1onc9lx0j49x6-1338300620.us-east-1.elb.amazonaws.com/vector/info/{layer}/{geohash or ogc_fid}

So to get all the fields available for the ADM 1 feature with geohash=sh5rcxz20wjh you would use:

http://covid-publi-1onc9lx0j49x6-1338300620.us-east-1.elb.amazonaws.com/vector/info/adm1/sh5rcxz20wjh
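As a minimal sketch, the same URL can be assembled in Python, with the layer name checked against the list of available layers first. The function name feature_info_url is hypothetical; the base URL and layer names are copied from this document, and the instance may not be running.

```python
from urllib.parse import quote

BASE_URL = "http://covid-publi-1onc9lx0j49x6-1338300620.us-east-1.elb.amazonaws.com"

# Base layer names from the list of available layers; each also has a *_full variant.
BASE_LAYERS = (
    "adm0", "adm1", "adm2", "hd_urban_fishnets",
    "urban_areas", "urban_areas_hd", "urban_fishnets",
)
LAYERS = frozenset(BASE_LAYERS) | {f"{name}_full" for name in BASE_LAYERS}


def feature_info_url(layer: str, feature_id: str, base_url: str = BASE_URL) -> str:
    """Build the /vector/info/{layer}/{id} URL for one feature."""
    if layer not in LAYERS:
        raise ValueError(f"unknown layer: {layer!r}")
    # Percent-encode the identifier so it is safe to use as a path segment.
    return f"{base_url.rstrip('/')}/vector/info/{layer}/{quote(str(feature_id), safe='')}"


print(feature_info_url("adm1", "sh5rcxz20wjh"))
```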

You can limit the columns returned by passing a comma-separated list of column names, for example columns=wb_adm0_na,wb_adm1_na,geohash,ogc_fid. The request below selects a different set of columns from the adm1_full layer:

http://covid-publi-1onc9lx0j49x6-1338300620.us-east-1.elb.amazonaws.com/vector/info/adm1_full/sh5rcxz20wjh?columns=lc_20,lc_100,ogc_fid,geohash

You can select the column used to look up the id with keycol, which takes one of ogc_fid, geohash, wb_adm0_co, wb_adm1_co, wb_adm2_co, or objectid. If the chosen column is not a primary column for the layer you selected, the request will return multiple results.

So the above could also be done using a GET request with the admin code:

http://covid-publi-1onc9lx0j49x6-1338300620.us-east-1.elb.amazonaws.com/vector/info/adm1/381?keycol=wb_adm1_co&columns=lc_20,lc_100,ogc_fid,geohash,wb_adm1_co,wb_adm1_na

When multiple results are returned, you can use reportkey= to specify the key that is used as the index for the returned JSON object.

OpenAPI docs -> http://covid-publi-1onc9lx0j49x6-1338300620.us-east-1.elb.amazonaws.com/#/Vector%20Tile%20API/feature_info_vector_info__table___id__get

Example to get all the adm1 attributes selected by the adm0 code: http://covid-publi-1onc9lx0j49x6-1338300620.us-east-1.elb.amazonaws.com/vector/info/adm1/1?keycol=wb_adm0_co&reportkey=wb_adm1_co
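The optional parameters described above (keycol, columns, reportkey) can be combined in a small URL builder. This is a sketch only: the function name info_query_url is hypothetical, and it builds the request URL without sending it, since the hosted instance may not be running.

```python
from typing import Optional, Sequence
from urllib.parse import urlencode

BASE_URL = "http://covid-publi-1onc9lx0j49x6-1338300620.us-east-1.elb.amazonaws.com"


def info_query_url(
    layer: str,
    feature_id: object,
    keycol: Optional[str] = None,
    columns: Optional[Sequence[str]] = None,
    reportkey: Optional[str] = None,
    base_url: str = BASE_URL,
) -> str:
    """Build a /vector/info URL with whichever optional parameters are given."""
    query = {}
    if keycol:
        query["keycol"] = keycol
    if columns:
        query["columns"] = ",".join(columns)
    if reportkey:
        query["reportkey"] = reportkey
    url = f"{base_url.rstrip('/')}/vector/info/{layer}/{feature_id}"
    # safe="," keeps the commas of the column list literal instead of %2C.
    return f"{url}?{urlencode(query, safe=',')}" if query else url


# All adm1 features for adm0 code 1, indexed by their adm1 code.
print(info_query_url("adm1", 1, keycol="wb_adm0_co", reportkey="wb_adm1_co"))
```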

The vector tile service is accessed by entering either the xyz pattern or tilejson endpoint for the layer into a client that can use vector tiles (latest QGIS, OpenLayers, MapboxGL,...).

Example to get the TileJSON config for adm0:

http://covid-publi-1onc9lx0j49x6-1338300620.us-east-1.elb.amazonaws.com/vector/adm0.json
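A TileJSON document lists its tile URL templates under the tiles key, with {z}, {x} and {y} placeholders, which is what lets a client derive the xyz pattern from the TileJSON endpoint. The sketch below fills in those placeholders for one tile. The sample document is hypothetical and is not the actual response of this API; the function name tile_urls is likewise made up for illustration.

```python
from typing import Any, Dict, List


def tile_urls(tilejson: Dict[str, Any], z: int, x: int, y: int) -> List[str]:
    """Expand every URL template in a TileJSON document for one tile."""
    return [
        template.replace("{z}", str(z)).replace("{x}", str(x)).replace("{y}", str(y))
        for template in tilejson["tiles"]
    ]


# Hypothetical TileJSON document, for illustration only.
sample = {
    "tilejson": "2.2.0",
    "name": "adm0",
    "tiles": ["http://example.invalid/tiles/adm0/{z}/{x}/{y}.pbf"],
}

print(tile_urls(sample, 3, 4, 2))
```

In practice a client such as QGIS or MapboxGL does this expansion itself once it is given the TileJSON URL.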

Using the "base" layer name (e.g. adm0) will only provide the attributes provided with the GeoJSON. Using the *_full layer name (e.g. adm0_full) will include all attributes with the vector tiles. If you use a *_full layer, you should also use a columns filter; otherwise the response will contain a huge amount of data.

There is a basic vector tile viewer endpoint as well that can be accessed at the /vector/demo/{table} endpoint.

To see all the adm1 features:

http://covid-publi-1onc9lx0j49x6-1338300620.us-east-1.elb.amazonaws.com/vector/demo/adm1/

You can click on the features to see the attributes that are included.

Metadata for the returned fields is served as the JSON file that you provided:

http://covid-publi-1onc9lx0j49x6-1338300620.us-east-1.elb.amazonaws.com/RiskSchema.json

covid-wb-api's People

Contributors

bitner, geohacker, vincentsarago

covid-wb-api's Issues

Data ingestion

Prepare the ETL for ingesting the datasets. Ideally, we'd structure the scripts so that they can be run with remote access to the database, keeping in mind future additions and updates of datasets.

Add titiler to Dockerfile and ECR image

titiler renders WebMercatorQuad raster tiles sourced from COGs. It is built with fastapi so it should integrate cleanly into our tech stack similarly to how @bitner already did timvt.

Note: titiler supports CDK deployment for both a Lambda setup and an ECS setup. The ECS setup appears to basically add autoscaling to a container task. If we need to later on, we can add an ECS autoscaler to our cloudformation.yaml.

cc @geohacker @bitner

Update Database Configurations in Cloud Formation

  • Add storage, 5GB is likely way short. Maybe bump up to 15GB to start.
  • When we deploy this, we will likely want at least db.t3.medium
  • Change target Postgres Version to 12.3 -- there are some enhancements in PostgreSQL/PostGIS that will in particular make st_asmvt much much faster for use with TiMVT
  • Add some developer IPs to the security groups so we can directly access the database and possibly load data from local machines, rather than only through ECS (my IP 68.168.188.7 needs TCP access to the security group for port 5432 (postgres))

COG requests need rescale param (PNG driver doesn't support data type Float64)

Over in #22 (comment), @guidorice and I were able to confirm that wp2020_vulnerability_map.tif, wp_2020_1km.tif and wp_2020_1km_urban_pop.tif are causing a 500 from titiler.

For example, if you do curl http://covid-publi-131wmiy217ice-1414663655.us-east-1.elb.amazonaws.com/cog/wp_2020_1km/tiles/13/5865/3796.png this returns Internal Server Error. In the cloudwatch logs, I can see the following error:


2020-07-09T14:04:54.875+05:30 DEBUG:rasterio._io:Path: UnparsedPath(path='/vsimem/a154a60a-1b3a-4484-8430-ba420692b1f9/a154a60a-1b3a-4484-8430-ba420692b1f9.'), mode: w+, driver: PNG
2020-07-09T14:04:54.875+05:30 DEBUG:rasterio._base:Nodata success: 0, Nodata value: 0.000000
2020-07-09T14:04:54.875+05:30 DEBUG:rasterio._base:Nodata success: 0, Nodata value: 0.000000
2020-07-09T14:04:54.876+05:30 DEBUG:rasterio._io:Skipped delete for overwrite. Dataset does not exist: /vsimem/a154a60a-1b3a-4484-8430-ba420692b1f9/a154a60a-1b3a-4484-8430-ba420692b1f9.
2020-07-09T14:04:54.876+05:30 DEBUG:rasterio._io:Option: ('ZLEVEL', b'6')
2020-07-09T14:04:54.876+05:30 DEBUG:rasterio.env:Exiting env context: <rasterio.env.Env object at 0x7ff0f84b8040>
2020-07-09T14:04:54.876+05:30 DEBUG:rasterio.env:Cleared existing <rasterio._env.GDALEnv object at 0x7ff0f84b88e0> options
2020-07-09T14:04:54.876+05:30 DEBUG:rasterio._env:Stopped GDALEnv <rasterio._env.GDALEnv object at 0x7ff0f84b88e0>.
2020-07-09T14:04:54.877+05:30 DEBUG:rasterio.env:Exiting outermost env
2020-07-09T14:04:54.877+05:30 DEBUG:rasterio.env:Exited env context: <rasterio.env.Env object at
...
2020-07-09T14:04:54.877+05:30 return await dependant.call(**values)
2020-07-09T14:04:54.877+05:30 File "/covidwb/app/routers/titiler_router.py", line 207, in cog_tile
2020-07-09T14:04:54.877+05:30 content = render(
2020-07-09T14:04:54.877+05:30 File "/usr/local/lib/python3.8/dist-packages/rio_tiler/utils.py", line 418, in render
2020-07-09T14:04:54.877+05:30 dst.write(mask.astype(tile.dtype), indexes=count + 1)
2020-07-09T14:04:54.877+05:30 File "rasterio/_base.pyx", line 332, in rasterio._base.DatasetBase.__exit__
2020-07-09T14:04:54.877+05:30 File "rasterio/_base.pyx", line 322, in rasterio._base.DatasetBase.close
2020-07-09T14:04:54.877+05:30 File "rasterio/_io.pyx", line 2077, in rasterio._io.BufferedDatasetWriterBase.stop
2020-07-09T14:04:54.877+05:30 File "rasterio/_err.pyx", line 205, in rasterio._err.exc_wrap_pointer
2020-07-09T14:04:54.877+05:30 rasterio._err.CPLE_NotSupportedError: PNG driver doesn't support data type Float64. Only eight bit (Byte) and sixteen bit (UInt16) bands supported.

cc @bitner @pieschker
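To illustrate why a rescale parameter addresses this error, here is a rough pure-Python sketch of what rescaling does: Float64 values are mapped linearly from an input range onto 0..255, which fits the Byte type the PNG driver accepts. This is not titiler's or rio-tiler's implementation; the function name and the 0..1000 range are hypothetical. In titiler the range is passed as a rescale query parameter of the form min,max, and the right range for these rasters is not stated in this issue.

```python
from typing import Iterable, List


def rescale_to_byte(values: Iterable[float], in_min: float, in_max: float) -> List[int]:
    """Linearly map values from [in_min, in_max] to integers in [0, 255]."""
    if in_max <= in_min:
        raise ValueError("in_max must be greater than in_min")
    span = in_max - in_min
    scaled = []
    for value in values:
        # Clamp first so out-of-range inputs saturate instead of overflowing.
        clamped = min(max(value, in_min), in_max)
        scaled.append(int(round((clamped - in_min) / span * 255)))
    return scaled


# Hypothetical pixel values and range, for illustration only.
print(rescale_to_byte([0.0, 250.0, 1000.0, 2000.0, -5.0], 0.0, 1000.0))
```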

Setup pygeoapi

We could probably start with setting it up as a Docker image.

Cloudformation

Create a cloudformation based deployment process with an ECS task, RDS and a load balancer.

Update Landcover raster layer to use publicly available mosaic

The LC.tif for each country is mosaicked and transformed into a COG here. A colormap is also applied to the TIF. The byte values in the TIFs do not exactly match the legend for the Globcover product, and that is noted in a couple of places in TODOs.

I suggested, and Benjamin Stewart from WB gave approval via email, to instead use the premade Globcover Tif that is available here: http://due.esrin.esa.int/page_globcover.php

  • yank the code in the etl scripts
  • remove TODO in code about fixing the legend or byte values for the product
  • convert the Globcover mosaic to COG
  • upload to s3 and confirm it works with our raster tiling endpoints

Improvements for cfn

@guidorice found some things we should improve in the cfn template over in #17

  • Avoid resource name collisions by adding the stackname - Name: !Join ['-', [!Ref 'AWS::StackName', 'my-resource-name']]
  • GatewayAttachement is misspelled

Did you consider maybe adding the s3 bucket to the CF template? I realize this would complicate the deployment by having to copy data around, but the benefit would be not having to manually create a bucket for a new deployment. Just wanted to get your thoughts on that.

@guidorice I prefer not to make the S3 bucket part of the stack for now, because it should contain all the data and would generally be created beforehand. I don't expect us to create a new bucket for every new stack.

tune ECS task resources & change gunicorn launch settings

In working on the titiler raster issues (#25) I am noticing intermittent 5xx errors when hitting titiler endpoints, and @vincentsarago suggested one possibility is an out-of-memory condition because of this: tiangolo/fastapi#596

This task has a few parts:

  • Increase the ECS memory/cpu for tasks to something reasonable (right now it's a bare minimum). Determine what is reasonable based on having essentially 3 packages combined into one app: titiler, timvt and pygeoapi.
  • Do some testing against the titiler XYZ raster endpoints to generate some load. This could be done with QGIS, Leaflet or mapbox-gl-js. Look at AWS to see if memory is growing unbounded. I do not know how to discover memory and CPU usage for ECS tasks like this. I did some googling, looked in CloudWatch and in the ECS cluster and tasks in the AWS console, but I am not experienced with this. It looks like metrics might not be getting created. The task container seems to be a bit of a "black box".
  • Decide if we are going to limit the gunicorn workers to what is recommended in the above issue. Make the change in the entrypoint.sh and update the cloudformation stack with the new image for the task.
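One way the worker limit could be expressed is a gunicorn configuration file, which is plain Python. This is a sketch under assumptions, not the project's setup: the project launches from entrypoint.sh, the WEB_CONCURRENCY variable name and all numeric values here are hypothetical, and the worker count recommended in the linked issue is not restated.

```python
# gunicorn.conf.py (hypothetical file, loaded with: gunicorn -c gunicorn.conf.py ...)
import os

# Number of worker processes. Read from an assumed environment variable so the
# ECS task definition can change it without rebuilding the image.
workers = max(1, int(os.environ.get("WEB_CONCURRENCY", "1")))

# Uvicorn's gunicorn worker class, used to serve an ASGI app such as FastAPI.
worker_class = "uvicorn.workers.UvicornWorker"

# Restart each worker after a bounded number of requests, with jitter so the
# workers do not all restart at once. This caps slow memory growth per worker.
max_requests = 1000
max_requests_jitter = 100
```

Fewer workers lower the memory footprint of the task at the cost of concurrency, so the value would need to be chosen together with the ECS memory and CPU settings from the first item.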
