developmentseed / cogeo-mosaic-tiler

Serve map tiles from Cloud Optimized GeoTIFF mosaics.

License: MIT License

Dockerfile 0.25% Makefile 0.28% Shell 3.15% Python 55.06% HTML 41.26%

cogeo-mosaic-tiler's Introduction

This repo has been archived; check out our new solution: https://github.com/developmentseed/titiler

cogeo-mosaic-tiler

Serve map tiles from Cloud Optimized GeoTIFF mosaics based on mosaicJSON.

Read the official announcement https://medium.com/devseed/cog-talk-part-2-mosaics-bbbf474e66df

Deploy

Package Lambda

Create package.zip

$ make package

Deploy to AWS

This project uses Serverless to manage deployment on AWS.

# Install and Configure serverless (https://serverless.com/framework/docs/providers/aws/guide/credentials/)
$ npm install serverless -g 

$ sls deploy --region us-east-1 --bucket a-bucket-where-you-store-data

Docs

See /doc/API.md for the documentation.

Live

A version of this stack is deployed on AWS us-east-1 and available on mosaic.cogeo.xyz

Contribution & Development

Issues and pull requests are more than welcome.

Dev install & Pull-Request

$ git clone http://github.com/developmentseed/cogeo-mosaic-tiler.git
$ cd cogeo-mosaic-tiler
$ pip install -e .[dev]

Python >=3.6 only

This repo is set up to use pre-commit to run flake8, pydocstyle and black ("uncompromising Python code formatter") when committing new code.

$ pre-commit install
$ git add .
$ git commit -m'my change'
black....................................................................Passed
Flake8...................................................................Passed
Verifying PEP257 Compliance..............................................Passed
$ git push origin

About

Created by Development Seed

cogeo-mosaic-tiler's Issues

Use with mission-specific tilers

I know you currently have awspds-mosaic as a fork of cogeo-mosaic-tiler, but I think it's worth considering integrating support for mission-specific tilers inside cogeo-mosaic-tiler.

It would allow you to maintain a single repository, and users who want to tile arbitrary COGs, CBERS, Landsat, etc. could use a single Lambda function.

Thoughts:

There would need to be a way to determine from the mosaicJSON which rio-tiler tiling function to use, so that arbitrary URLs use the default tiler, Landsat scene ids use the Landsat tiler, CBERS scenes use the CBERS tiler, etc.

MosaicJSON top-level mission key: something like

 // Optional. A string indicating that the values of `tiles` represent scene ids of the designated mission instead of fully-qualified URLs.
 "mission": "landsat8"

in which case every value of every quadkey is interpreted as a scene id.

Quadkey-level prefixes: in order to have the tile reader vary across tiles, you could have quadkey-level prefixes. The current spec is quite flexible about allowing either a URL or a scene id:

// REQUIRED. A dictionary of per quadkey dataset in form of {quadkeys: [datasets]} pairs.
// Keys MUST be valid quadkeys index with zoom level equal to mosaic `minzoom` (or `quadkey_zoom` if present).
// Values MUST be arrays of strings (url or sceneid) pointing to a 
// Cloud Optimized dataset with bounds intersecting with the quadkey bounds.
"tiles": {
    "030130": [
        "s3://my-bucket/dir/file1.tif",
        "s3://my-bucket/dir/file2.tif"
    ]
}
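
The key constraint quoted above can be checked mechanically, because a quadkey has exactly one digit (0 to 3) per zoom level. The following is a minimal sketch, assuming the mosaic's "tiles" member is already loaded as a Python dict; `check_tile_keys` is a hypothetical helper and not part of cogeo-mosaic.

```python
def check_tile_keys(tiles, zoom):
    """Return the quadkeys in `tiles` that break the rules quoted above.

    Hypothetical helper: `tiles` is the mosaicJSON "tiles" dict and `zoom`
    is `quadkey_zoom` if present, otherwise `minzoom`.
    """
    bad = []
    for quadkey, assets in tiles.items():
        # A quadkey uses only the digits 0-3, one digit per zoom level.
        valid_digits = set(quadkey) <= set("0123")
        if len(quadkey) != zoom or not valid_digits or not isinstance(assets, list):
            bad.append(quadkey)
    return bad


tiles = {"030130": ["s3://my-bucket/dir/file1.tif", "s3://my-bucket/dir/file2.tif"]}
print(check_tile_keys(tiles, zoom=6))
```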

A mosaic currently created with awspds-mosaic has sceneids of the form:

"0231120": ["LC08_L1TP_029035_20160720_20180131_01_T1", "LC08_L1TP_029034_20160720_20180131_01_T1", "LC08_L1TP_029036_20130610_20170310_01_T1"]

Other than attempting string matching against the scene ID, there's no way to know these correspond to Landsat scenes.

Instead valid landsat scenes could be something like:

"0231120": ["s3://landsat-pds/LC08_L1TP_029035_20160720_20180131_01_T1", "s3://landsat-pds/LC08_L1TP_029034_20160720_20180131_01_T1", "s3://landsat-pds/LC08_L1TP_029036_20130610_20170310_01_T1"]

where any path starting with s3://landsat-pds/ is interpreted as a prefix for a scene id, and the rest of the url is interpreted as one.

This could have issues, however, if someone ever specifies a fully qualified path starting with s3://landsat-pds all the way to a COG asset. To prevent that, you could use a URL scheme like landsat://<scene_id>, cbers://<scene_id>, etc.
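
As a rough illustration of the scheme-based idea, the dispatch could look something like the sketch below. The `MISSION_SCHEMES` mapping, the tiler names and `select_tiler` are all hypothetical; none of them exist in cogeo-mosaic-tiler or in the mosaicJSON spec.

```python
# Hypothetical mapping from URL scheme to a tiler name. The schemes follow
# the `landsat://<scene_id>` idea and are not part of the mosaicJSON spec.
MISSION_SCHEMES = {"landsat": "landsat8", "cbers": "cbers"}


def select_tiler(asset):
    """Return (tiler_name, identifier) for one entry of a quadkey's asset list."""
    scheme, sep, rest = asset.partition("://")
    if sep and scheme in MISSION_SCHEMES:
        # Mission scheme: the remainder is a scene id, not a path.
        return MISSION_SCHEMES[scheme], rest
    # Anything else (s3://, https://, a bare path) goes to the default COG tiler.
    return "default", asset


print(select_tiler("landsat://LC08_L1TP_029035_20160720_20180131_01_T1"))
print(select_tiler("s3://my-bucket/dir/file1.tif"))
```

With this approach a fully qualified s3://landsat-pds/... path is still treated as an ordinary COG URL, which avoids the ambiguity described above.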

@2x ignored

Note: @vincentsarago before you debug anything let me make sure I didn't accidentally modify important code while profiling.

As I mentioned on Slack, since switching back to GDAL 2.4, my mosaic got blurrier:

It turns out that the blurriness is actually because I told Mapbox GL JS that the tilesize was 512, while it's actually 256. Correctly telling Mapbox GL JS that the tilesize is 256 makes it acceptable:

Still, I'd prefer to actually serve 512x512 high res tiles. You can see from these two links that both @2x.jpg and @2x.png requests are affected:
https://naip-lambda.kylebarron.dev/7610d6d77fca346802fb21b89668cb12ef3162a31eb71734a8aaf5de/15/5241/[email protected]?color_ops=sigmoidal+RGB+4+0.5%2C+saturation+1.25
https://naip-lambda.kylebarron.dev/7610d6d77fca346802fb21b89668cb12ef3162a31eb71734a8aaf5de/15/5241/[email protected]?color_ops=sigmoidal+RGB+4+0.5%2C+saturation+1.25

So I need to debug further where the tilesize isn't being passed on correctly.

Proposal for faster tile merging

The following is a description of a possible bottleneck and a proposal which, if I'm right, could shave 9 seconds off of tile merging in my described use case.

I've spent a lot of the evening reading cogeo*/tiler* code, and I think I may have discovered the largest current bottleneck, at least with my dataset: a full mercator tile's data image bitmap and mask is created for each asset, regardless of the amount of overlap.

I think that's a big performance hit when you have several underlying images. In my mosaic_tiler profiling towards the bottom of this post, you can see that the biggest time hit is vrt.read, i.e. creating the data arrays. It's ~1.8s per asset, regardless of the amount of overlap.

Here's an example of where that really hurts. For a web mercator tile at zoom 12 covering West Hollywood (blue), there are six assets that are required to load (brown). Despite using a simpler pixel selection method, i.e. FirstMethod, all 6 assets need to be loaded and parsed, even though only 1 covers a majority of the tile.

Note, however, that since there's essentially zero overlap between assets, if rasterio windows could be used to only read the subset of the tile with valid source data, and if vrt.read is linear in the number of pixels generated, then you could potentially save 9 seconds, i.e. cut about 83% of the read time on this tile load. (I.e. there are currently 5 extra sets of 512x512 vrt.read operations beyond the one that's needed, and 1.8s * 5 = 9.)
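
The estimate can be reproduced with a few lines of arithmetic. This sketch only restates the numbers given in the text (1.8 s per vrt.read, 6 assets) and assumes, as the text does, that read time scales linearly with pixels read and that the assets do not overlap.

```python
per_asset_read_s = 1.8   # measured vrt.read time per asset (from the profile)
n_assets = 6             # assets intersecting the example tile

current_s = per_asset_read_s * n_assets   # every asset reads a full 512x512 tile
ideal_s = per_asset_read_s * 1            # assumption: total pixels read equal one tile
saved_s = current_s - ideal_s
saved_fraction = saved_s / current_s

print(f"saved {saved_s:.1f}s of {current_s:.1f}s ({saved_fraction:.0%})")
```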

6 underlying assets is the median for my MosaicJSON, so this is a very common occurrence. Here's the distribution of how many assets are in each quadkey:

count    142516.000000
mean          5.508160
min           1.000000
25%           4.000000
50%           6.000000
75%           6.000000
max          18.000000

Proposal

The slowest part of mosaic_tiler is creating data arrays that line up with the mercator tile for each underlying asset. I propose to use rasterio's windows to create a data array and mask using the minimal bounding box of valid source data within the mercator tile of interest.

Take my initial contrived/sketched example above, where an asset's bounds overlap just the top left of the mercator tile. In that case, since the overlap is rectangular in mercator coordinates, the window could read just the overlapping asset data and return an object like the following:

data: a numpy ndarray of size 3 x overlapping mercator width x overlapping mercator height. Pixel values for each band of only the parts of the mercator tile where source data exists.
mask: a numpy ndarray of size 1 x overlapping mercator width x overlapping mercator height. In the case where the overlap is rectangular in mercator coordinates, this would be entirely True.
mercator_bounds: [0, 400, 100, 512]. The lower left and upper right corners in mercator tile coordinates. This would be used in the pixel merging code once each tile's data has loaded.

I believe finding this intersection of valid data would be possible by intersecting the georeferenced bounds of the asset with the bounds of the mercator tile.
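
A minimal sketch of that intersection step, in plain Python. It assumes both bounds are (left, bottom, right, top) tuples in the same projected CRS, and it returns a pixel window as (col_off, row_off, width, height) with rows counted from the top of the tile. `intersect_bounds` and `tile_pixel_window` are hypothetical names, not rio-tiler functions.

```python
import math


def intersect_bounds(a, b):
    """Intersection of two (left, bottom, right, top) boxes, or None if disjoint."""
    left, bottom = max(a[0], b[0]), max(a[1], b[1])
    right, top = min(a[2], b[2]), min(a[3], b[3])
    if left >= right or bottom >= top:
        return None
    return (left, bottom, right, top)


def tile_pixel_window(asset_bounds, tile_bounds, tilesize=512):
    """Smallest pixel window of the tile that covers the asset/tile overlap."""
    overlap = intersect_bounds(asset_bounds, tile_bounds)
    if overlap is None:
        return None
    t_left, t_bottom, t_right, t_top = tile_bounds
    x_res = (t_right - t_left) / tilesize
    y_res = (t_top - t_bottom) / tilesize
    # Snap outwards so partially covered pixels are included.
    col_start = math.floor((overlap[0] - t_left) / x_res)
    col_stop = math.ceil((overlap[2] - t_left) / x_res)
    row_start = math.floor((t_top - overlap[3]) / y_res)
    row_stop = math.ceil((t_top - overlap[1]) / y_res)
    return (col_start, row_start, col_stop - col_start, row_stop - row_start)


# An asset overlapping only the top-left corner of a tile whose bounds are
# chosen so that one map unit equals one pixel.
print(tile_pixel_window((-100, 400, 100, 600), (0, 0, 512, 512)))
```

The resulting window could then be handed to a windowed read so that only the overlapping part of the tile is generated for that asset.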

In the more general case (e.g. of Landsat data), where the intersection of the image scene and the mercator tile is not rectangular in mercator coordinates, you could still take the minimal intersecting bounding box in mercator coordinates. In this case, the mask is used to filter out the pixels without source data.

In addition to potentially taking much less time, it would take much less memory and could be run on a smaller lambda instance.

Code to update

Update _tile_read in rio_tiler.utils with the option to return a rectangular subset of the desired mercator tile.
Update tile in rio_tiler.main to find the intersection between the bounds of the asset and the bounds of the mercator tile, and pass that to _tile_read. Note that cogeo-mosaic-tiler would also need those bounds returned from the function, so for backwards compatibility, creating a separate function might be better.
Update rio-tiler-mosaic's pixel selection methods to work with data arrays of subsets of the tile, instead of full arrays.
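
To make the last item more concrete, here is a rough NumPy sketch of a "first valid pixel wins" merge that accepts sub-arrays instead of full tiles. It is not rio-tiler-mosaic's FirstMethod; `paste_first` is a hypothetical function, and the array layout (bands, rows, cols) with a 2-D boolean mask and a (col_off, row_off, width, height) window is an assumption made for the example.

```python
import numpy as np


def paste_first(tile, tile_mask, data, mask, window):
    """Copy valid pixels of a sub-array into still-empty pixels of the tile.

    tile:      (bands, size, size) output array, modified in place
    tile_mask: (size, size) bool array, True where the tile is already filled
    data/mask: (bands, h, w) pixel values and (h, w) validity for one asset
    window:    (col_off, row_off, width, height) placement inside the tile
    """
    col_off, row_off, width, height = window
    rows = slice(row_off, row_off + height)
    cols = slice(col_off, col_off + width)
    # Fill only pixels that are valid in this asset and not yet filled.
    fill = mask & ~tile_mask[rows, cols]
    tile[:, rows, cols][:, fill] = data[:, fill]
    tile_mask[rows, cols] |= fill


tile = np.zeros((3, 4, 4), dtype=np.uint8)
tile_mask = np.zeros((4, 4), dtype=bool)
# First asset covers only the top-left 2x2 corner.
paste_first(tile, tile_mask, np.full((3, 2, 2), 7, np.uint8), np.ones((2, 2), bool), (0, 0, 2, 2))
# Second asset covers the whole tile but must not overwrite the first.
paste_first(tile, tile_mask, np.full((3, 4, 4), 9, np.uint8), np.ones((4, 4), bool), (0, 0, 4, 4))
print(tile[0])
```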

I'd be happy to submit PRs for these, because if I'm right it could make cogeo-mosaic-tiler really fast.

Profiling

Profiling is done using AWS X-Ray, inserting custom timing sections. I changed mosaic_tiler to run single-threaded for profiling, but it ran about the same speed as when it was using ThreadPoolExecutor.

Here's a profile of mosaic_tiler running for a mercator tile with four assets. Each asset takes a total of 2.5 seconds to load, with ~1.8 seconds of that just in vrt.read here.
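
For readers without X-Ray, custom timing sections of this kind can be approximated with the standard library. The sketch below is a stand-in and not the X-Ray instrumentation used for the profile above; `timed` and `timings` are hypothetical names, and the sleep is a placeholder for the real read.

```python
import time
from contextlib import contextmanager

timings = {}  # section name -> list of elapsed seconds


@contextmanager
def timed(name):
    """Record the wall-clock time spent inside the `with` block under `name`."""
    start = time.perf_counter()
    try:
        yield
    finally:
        timings.setdefault(name, []).append(time.perf_counter() - start)


with timed("vrt.read"):
    time.sleep(0.01)  # placeholder for the real read call

for name, samples in timings.items():
    print(f"{name}: {sum(samples):.3f}s over {len(samples)} call(s)")
```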

Return minified json from app endpoints

Several app endpoints return JSON. It might be nice to return minified JSON, for a small improvement in network bandwidth. It consists of literally just replacing:

json.dumps(dict)

with

json.dumps(dict, separators=(',', ':'))
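
A quick way to see the effect is to compare the two encodings on a small payload. The payload here is made up for illustration; the saving on real responses will depend on their shape.

```python
import json

# Made-up payload, loosely shaped like a tilejson-style response.
payload = {"name": "mosaic", "minzoom": 7, "bounds": [-180, -90, 180, 90]}

default = json.dumps(payload)
minified = json.dumps(payload, separators=(",", ":"))

print(len(default), len(minified))
print(minified)
```

Both strings decode to the same object; only the whitespace after separators is dropped.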

use EFS to store the mosaicjson ?

https://docs.aws.amazon.com/lambda/latest/dg/services-efs.html

update pre-commit and use isort

Performance advice?

Hey again,

I'm working on figuring out how to profile the lambda function, but I wanted to also ask if you had suggestions for improving performance, e.g. fastest image file format, mosaicJSON setup, post processing options? I'm getting averages of 12-15 seconds for requests to NAIP imagery (using first pixel selection method), and I'd love to see if I can bring that down a bit.

A big performance boost seems to come from removing @2x (unsurprisingly). Removing @2x and setting the output format to jpg gives me ~2-3 second response times for the Landsat endpoint (with a pregenerated mosaicJSON), which I'm happy with, though NAIP times are still slower.

Coercing float to float?

Here you're coercing a float to float...

cogeo-mosaic-tiler/cogeo_mosaic_tiler/handlers/app.py

Lines 97 to 99 in cf05871

    
min_tile_cover = (
    float(min_tile_cover) if isinstance(min_tile_cover, float) else min_tile_cover
)

Do you mean

min_tile_cover = ( 
     float(min_tile_cover) if min_tile_cover is not None else min_tile_cover 
)
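
The difference between the two versions shows up as soon as the value is not already a float. The sketch below assumes the parameter arrives as a string (for example from a query string), which the issue does not state explicitly; the two function names are hypothetical wrappers around the expressions quoted above.

```python
def coerce_original(min_tile_cover):
    # Only converts values that are already floats, so it is a no-op.
    return float(min_tile_cover) if isinstance(min_tile_cover, float) else min_tile_cover


def coerce_suggested(min_tile_cover):
    # Converts anything that is not None, including strings such as "0.5".
    return float(min_tile_cover) if min_tile_cover is not None else min_tile_cover


print(repr(coerce_original("0.5")))
print(repr(coerce_suggested("0.5")))
print(repr(coerce_suggested(None)))
```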

HTTP API Gateway

In December, AWS announced HTTP API Gateway endpoints, which appear to be 70% cheaper than the existing REST endpoint type and slightly faster.

I was wondering if there was a specific reason you use the REST endpoint type instead of the newer HTTP type in your serverless.yml?

It looks like you can use HTTP in serverless.yml with the httpApi key. The full spec appears to be here. It looks like the syntax is slightly different than the http key. I tried it out for a couple endpoints successfully, but some internal paths get messed up, like trying to load /docs gives an error:

Failed to load API definition:
undefined /docs/docs/openapi.json

mvtEncoder as optional import

I think including mvtEncoder makes the dependency bundle a bit larger? Because it has to include vtzero and maybe a couple other packages...?

The mvt encoding is probably not used by all users, so having it be an optional import and optional dependency would be great.

You could force install with

pip install cogeo-mosaic-tiler[mvt]
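
On the code side, an optional dependency is usually guarded at import time. A minimal sketch of that pattern follows; `hypothetical_mvt_encoder` and its `encoder` attribute are placeholders, not the real module or function names, and `optional_import` and `encode_mvt` are hypothetical helpers.

```python
import importlib


def optional_import(module_name):
    """Import a module if it is installed, otherwise return None."""
    try:
        return importlib.import_module(module_name)
    except ImportError:
        return None


# Placeholder module name; the real encoder package would go here.
mvt = optional_import("hypothetical_mvt_encoder")


def encode_mvt(*args, **kwargs):
    """Encode a vector tile, failing with a clear message if the extra is missing."""
    if mvt is None:
        raise RuntimeError(
            "MVT support is not installed; try: pip install cogeo-mosaic-tiler[mvt]"
        )
    return mvt.encoder(*args, **kwargs)  # `encoder` is a placeholder attribute
```

Endpoints that do not produce vector tiles would then work without the extra installed, and only the MVT path would raise.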

`/info` will not work with DynamoDB