
cogeo-mosaic-tiler's Introduction

This repo has been archived; check out our new solution: https://github.com/developmentseed/titiler

cogeo-mosaic-tiler

Serve map tiles from Cloud Optimized GeoTIFF mosaics based on mosaicJSON.


Read the official announcement https://medium.com/devseed/cog-talk-part-2-mosaics-bbbf474e66df

Deploy

Package Lambda

Create package.zip

$ make package

Deploy to AWS

This project uses Serverless to manage deployment on AWS.

# Install and Configure serverless (https://serverless.com/framework/docs/providers/aws/guide/credentials/)
$ npm install serverless -g 

$ sls deploy --region us-east-1 --bucket a-bucket-where-you-store-data

Docs

See /doc/API.md for the documentation.

Live

A version of this stack is deployed on AWS us-east-1 and available at mosaic.cogeo.xyz.

Contribution & Development

Issues and pull requests are more than welcome.

Dev install & Pull-Request

$ git clone http://github.com/developmentseed/cogeo-mosaic-tiler.git
$ cd cogeo-mosaic-tiler
$ pip install -e .[dev]

Python >=3.6 only

This repo is set up to use pre-commit to run flake8, pydocstyle and black ("uncompromising Python code formatter") when committing new code.

$ pre-commit install
$ git add .
$ git commit -m'my change'
black....................................................................Passed
Flake8...................................................................Passed
Verifying PEP257 Compliance..............................................Passed
$ git push origin

About

Created by Development Seed

cogeo-mosaic-tiler's People

Contributors

kylebarron, vincentsarago


cogeo-mosaic-tiler's Issues

Use with mission-specific tilers

I know you currently have awspds-mosaic as a fork of cogeo-mosaic-tiler, but I think it's worth considering integrating support for mission-specific tilers inside cogeo-mosaic-tiler.

It would allow you to maintain a single repository, and users who want to tile arbitrary COGs, CBERS, Landsat, etc. could use a single Lambda function.

Thoughts:

There would need to be a way to determine from the mosaicJSON which rio-tiler tiling function to use, so that arbitrary URLs use the default tiler, Landsat scene ids use the Landsat tiler, CBERS scenes use the CBERS tiler, etc.

  • MosaicJSON top-level mission key: something like

     // Optional. A string indicating that the values of `tiles` represent scene ids of the designated mission instead of fully-qualified URLs.
     "mission": "landsat8"
    

    in which case every value of every quadkey is interpreted as a scene id.

  • Quadkey-level prefixes: in order to have the tile reader vary across tiles, you could have quadkey-level prefixes. The current spec is quite flexible about allowing either a URL or a scene id:

    // REQUIRED. A dictionary of per quadkey dataset in form of {quadkeys: [datasets]} pairs.
    // Keys MUST be valid quadkeys index with zoom level equal to mosaic `minzoom` (or `quadkey_zoom` if present).
    // Values MUST be arrays of strings (url or sceneid) pointing to a 
    // Cloud Optimized dataset with bounds intersecting with the quadkey bounds.
    "tiles": {
        "030130": [
            "s3://my-bucket/dir/file1.tif",
            "s3://my-bucket/dir/file2.tif"
        ]
    }
    

    A mosaic currently created with awspds-mosaic has sceneids of the form:

    "0231120": ["LC08_L1TP_029035_20160720_20180131_01_T1", "LC08_L1TP_029034_20160720_20180131_01_T1", "LC08_L1TP_029036_20130610_20170310_01_T1"]

    Other than attempting string matching against the scene ID, there's no way to know these correspond to Landsat scenes.

    Instead, valid Landsat scene entries could look something like:

    "0231120": ["s3://landsat-pds/LC08_L1TP_029035_20160720_20180131_01_T1", "s3://landsat-pds/LC08_L1TP_029034_20160720_20180131_01_T1", "s3://landsat-pds/LC08_L1TP_029036_20130610_20170310_01_T1"]

    where the s3://landsat-pds/ prefix marks the entry as a Landsat scene and the rest of the URL is interpreted as the scene id.

    This could cause problems, however, if someone ever specifies a fully qualified path starting with s3://landsat-pds/ all the way to a COG asset. To prevent that, you could use a URL scheme like landsat://<scene_id>, cbers://<scene_id>, etc.
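    A minimal sketch of how both options could be resolved per asset, assuming the proposed optional top-level `mission` key and the `landsat://` / `cbers://` schemes. `resolve_asset` and the reader names are hypothetical placeholders, not rio-tiler functions, and the `cbers4` mission value is likewise made up.

```python
from typing import Optional, Tuple
from urllib.parse import urlparse

# Hypothetical reader names; a real tiler would map these to the
# mission-specific tiling functions.
MISSION_READERS = {"landsat8": "landsat_reader", "cbers4": "cbers_reader"}
SCHEME_READERS = {"landsat": "landsat_reader", "cbers": "cbers_reader"}
DEFAULT_READER = "cog_reader"


def resolve_asset(asset: str, mission: Optional[str] = None) -> Tuple[str, str]:
    """Return (reader_name, path_or_scene_id) for one MosaicJSON asset.

    `mission` stands for the proposed optional top-level "mission" key: when
    set, every asset string is taken to be a scene id for that mission.
    """
    if mission is not None:
        return MISSION_READERS[mission], asset

    parsed = urlparse(asset)
    if parsed.scheme in SCHEME_READERS:
        # For landsat://<scene_id>, urlparse puts the scene id in `netloc`.
        return SCHEME_READERS[parsed.scheme], parsed.netloc + parsed.path

    # s3://, https:// and plain paths fall through to the generic COG reader.
    return DEFAULT_READER, asset
```

    Using a dedicated scheme avoids the ambiguity above: an s3://landsat-pds/... URL still goes to the generic reader, while landsat://... is unambiguously a scene id.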

@2x ignored

Note: @vincentsarago, before you debug anything, let me make sure I didn't accidentally modify important code while profiling.

As I mentioned on Slack, since switching back to GDAL 2.4, my mosaic got blurrier:
[Screenshot: blurrier mosaic after switching back to GDAL 2.4]

It turns out that the blurriness is actually because I told Mapbox GL JS that the tile size was 512, while it's actually 256. Correctly telling Mapbox GL JS that the tile size is 256 makes it acceptable:
[Screenshot: the same mosaic with the tile size set to 256]

Still, I'd prefer to actually serve 512x512 high-res tiles. You can see from these two links that both @2x.jpg and @2x.png return 256x256 tiles:
https://naip-lambda.kylebarron.dev/7610d6d77fca346802fb21b89668cb12ef3162a31eb71734a8aaf5de/15/5241/[email protected]?color_ops=sigmoidal+RGB+4+0.5%2C+saturation+1.25
https://naip-lambda.kylebarron.dev/7610d6d77fca346802fb21b89668cb12ef3162a31eb71734a8aaf5de/15/5241/[email protected]?color_ops=sigmoidal+RGB+4+0.5%2C+saturation+1.25

So I need to debug further to find where the tile size isn't being passed on correctly.
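One place to check is where the @2x suffix is turned into a tile size. A minimal sketch, assuming a hypothetical /{mosaicid}/{z}/{x}/{y}@{scale}x.{ext} route; TILE_PATH and parse_tile_request are illustrative names, not the project's actual router.

```python
import re
from typing import Tuple

# Hypothetical route: /{mosaicid}/{z}/{x}/{y}[@{scale}x].{ext}
TILE_PATH = re.compile(
    r"^/(?P<mosaicid>[0-9a-f]+)/(?P<z>\d+)/(?P<x>\d+)/(?P<y>\d+)"
    r"(@(?P<scale>\d)x)?\.(?P<ext>jpg|png|webp|tif)$"
)


def parse_tile_request(path: str) -> Tuple[int, int, int, int, str]:
    """Return (z, x, y, tilesize, ext), deriving tilesize from the @Nx suffix."""
    m = TILE_PATH.match(path)
    if m is None:
        raise ValueError(f"not a tile path: {path}")
    scale = int(m.group("scale") or 1)
    # 256 px is the base web-mercator tile size, so @2x should ask for 512 px.
    tilesize = 256 * scale
    return int(m.group("z")), int(m.group("x")), int(m.group("y")), tilesize, m.group("ext")
```

If the tilesize computed this way is 512 but the response is still 256x256, the value is being dropped somewhere between the handler and the tile read.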

Proposal for faster tile merging

The following is a description of a possible bottleneck and a proposal which, if I'm right, could shave 9 seconds off of tile merging in my described use case.

I've spent a lot of the evening reading cogeo*/tiler* code, and I think I may have discovered the largest current bottleneck, at least with my dataset: a full mercator tile's data image bitmap and mask is created for each asset, regardless of the amount of overlap.

I think that's a big performance hit when you have several underlying images. In my mosaic_tiler profiling towards the bottom of this post, you can see that the biggest time hit is vrt.read, i.e. creating the data arrays. It's ~1.8s per asset, regardless of the amount of overlap.

Here's an example of where that really hurts. For a web mercator tile at zoom 12 covering West Hollywood (blue), there are six assets that are required to load (brown). Despite using a simpler pixel selection method, i.e. FirstMethod, all 6 assets need to be loaded and parsed, even though only 1 covers a majority of the tile.

[Figure: zoom-12 mercator tile over West Hollywood (blue) and the six assets it requires (brown)]

Note, however, that since there's essentially zero overlap between assets, if rasterio windows could be used to read only the subset of the tile with valid source data, and if vrt.read time is linear in the number of pixels generated, then you could potentially save 9 seconds on this tile load, i.e. make it about 83% faster. (That is, there are currently 5 extra sets of 512x512 vrt.read operations beyond the one that's needed, and 1.8s * 5 = 9s.)

6 underlying assets is the median for my MosaicJSON, so this is a very common occurrence. Here's the distribution of how many assets are in each quadkey:

count    142516.000000
mean          5.508160
min           1.000000
25%           4.000000
50%           6.000000
75%           6.000000
max          18.000000
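The summary above looks like pandas describe() output. A stdlib-only sketch of computing the same kind of summary directly from a MosaicJSON document; assets_per_quadkey is a hypothetical helper, and statistics.quantiles needs Python 3.8+ and at least two quadkeys.

```python
import statistics
from typing import Dict, List


def assets_per_quadkey(mosaic_def: Dict) -> Dict[str, float]:
    """Summarize how many assets each quadkey of a MosaicJSON references."""
    counts: List[int] = [len(assets) for assets in mosaic_def["tiles"].values()]
    # method="inclusive" interpolates linearly between data points, the same
    # convention as the pandas/numpy default percentiles.
    q1, median, q3 = statistics.quantiles(counts, n=4, method="inclusive")
    return {
        "count": len(counts),
        "mean": statistics.mean(counts),
        "min": min(counts),
        "25%": q1,
        "50%": median,
        "75%": q3,
        "max": max(counts),
    }
```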

Proposal

The slowest part of mosaic_tiler is creating data arrays that line up with the mercator tile for each underlying asset. I propose to use rasterio's windows to create a data array and mask using the minimal bounding box of valid source data within the mercator tile of interest.

Take my initial contrived/sketched example above, where an asset's bounds overlap just the top left of the mercator tile. In that case, since the overlap is rectangular in mercator coordinates, the window could read just the overlapping asset data and return an object like the following:

  • data: a numpy ndarray of size 3 x overlapping mercator width x overlapping mercator height. Pixel values for each band of only the parts of the mercator tile where source data exists.
  • mask: a numpy ndarray of size 1 x overlapping mercator width x overlapping mercator height. In the case where the overlap is rectangular in mercator coordinates, this would be entirely True.
  • mercator_bounds: [0, 400, 100, 512]. The lower left and upper right corners in mercator tile coordinates. This would be used in the pixel merging code once each tile's data has loaded.

I believe finding this intersection of valid data would be possible by intersecting the georeferenced bounds of the asset with the bounds of the mercator tile.
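A minimal sketch of that intersection step in plain Python, assuming the asset bounds have already been reprojected to EPSG:3857 and XYZ tile addressing is used. The function names are hypothetical; a real implementation would more likely lean on mercantile for tile bounds and rasterio.windows for the read.

```python
import math
from typing import Optional, Tuple

Bounds = Tuple[float, float, float, float]  # (left, bottom, right, top)
ORIGIN = 20037508.342789244  # half the web-mercator world width, in metres


def tile_bounds_mercator(x: int, y: int, z: int) -> Bounds:
    """Bounds of XYZ tile (x, y, z) in EPSG:3857 metres."""
    size = 2 * ORIGIN / 2**z
    left = -ORIGIN + x * size
    top = ORIGIN - y * size
    return left, top - size, left + size, top


def overlap_pixel_window(
    asset_bounds: Bounds, x: int, y: int, z: int, tilesize: int = 256
) -> Optional[Tuple[int, int, int, int]]:
    """(col_off, row_off, width, height) of the tile pixels the asset covers.

    Returns None when the asset does not intersect the tile.
    """
    t_left, t_bottom, t_right, t_top = tile_bounds_mercator(x, y, z)
    a_left, a_bottom, a_right, a_top = asset_bounds
    left, right = max(t_left, a_left), min(t_right, a_right)
    bottom, top = max(t_bottom, a_bottom), min(t_top, a_top)
    if left >= right or bottom >= top:
        return None
    res = (t_right - t_left) / tilesize  # metres per output pixel
    # Round outwards so partially covered edge pixels are included.
    col_off = math.floor((left - t_left) / res)
    row_off = math.floor((t_top - top) / res)
    col_end = math.ceil((right - t_left) / res)
    row_end = math.ceil((t_top - bottom) / res)
    return col_off, row_off, col_end - col_off, row_end - row_off
```

For an asset covering the top-left quadrant of a tile, this returns (0, 0, 128, 128) at tilesize 256, so only a quarter of the pixels would need to be read.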

In the more general case (e.g. of Landsat data), where the intersection of the image scene and the mercator tile is not rectangular in mercator coordinates, you could still take the minimal intersecting bounding box in mercator coordinates. In this case, the mask is used to filter out the pixels without source data.

In addition to potentially taking much less time, it would take much less memory and could be run on a smaller lambda instance.

Code to update

  • Update _tile_read in rio_tiler.utils with the option to return a rectangular subset of the desired mercator tile.
  • Update tile in rio_tiler.main to find the intersection between the bounds of the asset and the bounds of the mercator tile, and pass that to _tile_read. Note that cogeo-mosaic-tiler would also need those bounds returned from the function, so for backwards compatibility, creating a separate function might be better.
  • Update rio-tiler-mosaic's pixel selection methods to work with data arrays of subsets of the tile, instead of full arrays.
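A minimal sketch of what a first-pixel merge over partial arrays could look like. This is not rio-tiler-mosaic's API: first_method_merge is hypothetical, and it uses a (col_off, row_off, width, height) window in tile pixels, in the style of rasterio's Window, instead of the corner-based mercator_bounds proposed above.

```python
from typing import List, Tuple

import numpy as np

# One partial read: data is (bands, height, width), mask is (height, width)
# booleans (True = valid source pixel), window is
# (col_off, row_off, width, height) in output-tile pixels.
Subset = Tuple[np.ndarray, np.ndarray, Tuple[int, int, int, int]]


def first_method_merge(
    subsets: List[Subset], bands: int = 3, tilesize: int = 256, dtype: str = "uint8"
) -> Tuple[np.ndarray, np.ndarray]:
    """Paste partial arrays into a full tile, keeping the first valid value per pixel."""
    tile = np.zeros((bands, tilesize, tilesize), dtype=dtype)
    filled = np.zeros((tilesize, tilesize), dtype=bool)
    for data, mask, (col, row, width, height) in subsets:
        rows, cols = slice(row, row + height), slice(col, col + width)
        # Pixels that have source data and were not set by an earlier asset.
        take = mask & ~filled[rows, cols]
        sub = tile[:, rows, cols]  # a view: writing to it writes into `tile`
        sub[:, take] = data[:, take]
        filled[rows, cols] |= take
        if filled.all():
            break  # tile is complete; remaining subsets cannot change it
    return tile, filled
```

Only the full-size output tile and coverage mask are allocated at tile resolution; each asset contributes just its overlapping window.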

I'd be happy to submit PRs for these, because if I'm right it could make cogeo-mosaic-tiler really fast.

Profiling

Profiling is done using AWS X-Ray, inserting custom timing sections. I changed mosaic_tiler to run single-threaded for profiling, but it ran about the same speed as when it was using ThreadPoolExecutor.

Here's a profile of mosaic_tiler running for a mercator tile with four assets. Each asset takes a total of 2.5 seconds to load, with ~1.8 seconds of that just in vrt.read here.
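The profile above uses X-Ray subsegments. For quick local runs without X-Ray, a dependency-free stand-in can time the same sections. This is a swapped-in technique (time.perf_counter, not X-Ray), and TIMINGS and timed are hypothetical names.

```python
import time
from collections import defaultdict
from contextlib import contextmanager
from typing import Dict, Iterator

# Accumulated wall-clock seconds per named section.
TIMINGS: Dict[str, float] = defaultdict(float)


@contextmanager
def timed(section: str) -> Iterator[None]:
    """Add the time spent inside the `with` block to TIMINGS[section]."""
    start = time.perf_counter()
    try:
        yield
    finally:
        TIMINGS[section] += time.perf_counter() - start
```

Wrapping each asset's read in `with timed("vrt.read"):` and printing TIMINGS afterwards gives a per-section breakdown comparable to the trace.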

[Figure: X-Ray profile of mosaic_tiler for a tile with four assets]

Return minified json from app endpoints

Several app endpoints return JSON. It might be nice to return minified JSON, for a small improvement in network bandwidth. It consists of literally just replacing:

json.dumps(dict)

with

json.dumps(dict, separators=(',', ':'))
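The only difference is the whitespace json.dumps adds after "," and ":" by default. A quick check with the standard library (the meta values are made up):

```python
import json

meta = {"minzoom": 7, "maxzoom": 12, "quadkeys": ["0230", "0231"]}

pretty = json.dumps(meta)
minified = json.dumps(meta, separators=(",", ":"))

print(minified)  # {"minzoom":7,"maxzoom":12,"quadkeys":["0230","0231"]}
print(len(pretty) - len(minified))  # 6: one space per separator removed
```

The output parses back to the same object, so clients are unaffected.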

Performance advice?

Hey again,

I'm working on figuring out how to profile the lambda function, but I wanted to also ask if you had suggestions for improving performance, e.g. fastest image file format, mosaicJSON setup, post processing options? I'm getting averages of 12-15 seconds for requests to NAIP imagery (using first pixel selection method), and I'd love to see if I can bring that down a bit.

A big performance boost seems to come from removing @2x (unsurprisingly). Removing @2x and setting the output format to jpg gives me ~2-3 second response times for the Landsat endpoint (with a pregenerated mosaicJSON), which I'm happy with, though NAIP times are still slower.

HTTP API Gateway

In December, AWS announced HTTP API Gateway endpoints, which appear to be 70% cheaper than the existing REST endpoint type and slightly faster.

I was wondering if there was a specific reason you use the REST endpoint type instead of the newer HTTP type in your serverless.yml?

It looks like you can use HTTP APIs in serverless.yml with the httpApi key. The full spec appears to be here. The syntax is slightly different from the http key. I tried it out successfully for a couple of endpoints, but some internal paths break; for example, loading /docs gives an error:

Failed to load API definition:
undefined /docs/docs/openapi.json

mvtEncoder as optional import

I think including mvtEncoder makes the dependency bundle a bit larger, since it has to include vtzero and maybe a couple of other packages?

The mvt encoding is probably not used by all users, so having it be an optional import and optional dependency would be great.

Users who need it could then install it with

pip install cogeo-mosaic-tiler[mvt]
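A minimal sketch of the usual try/except ImportError pattern for this. The import path rio_tiler_mvt.mvt.encoder is an assumption about where the encoder lives, and encode_mvt is a hypothetical wrapper. The [mvt] extra itself would be declared through extras_require in setup.py.

```python
try:
    # Assumed import path; only present when the optional extra is installed.
    from rio_tiler_mvt.mvt import encoder as mvtEncoder
except ImportError:  # extra not installed
    mvtEncoder = None


def encode_mvt(*args, **kwargs) -> bytes:
    """Encode a vector tile, or fail with a clear message if the extra is missing."""
    if mvtEncoder is None:
        raise RuntimeError(
            "MVT output requires the optional dependency: "
            "pip install cogeo-mosaic-tiler[mvt]"
        )
    return mvtEncoder(*args, **kwargs)
```

The base install stays small, and only the MVT endpoint fails (with an actionable message) when the extra is missing.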

`/info` will not work with DynamoDB

import json

import rasterio

# Body of the /info handler: `mosaic_def`, `bounds`, `center`, `mosaicid`,
# `url` and `_get_layer_names` are defined elsewhere in the app.
quadkeys = list(mosaic_def["tiles"].keys())

# read layernames from the first file
src_path = mosaic_def["tiles"][quadkeys[0]][0]
with rasterio.open(src_path) as src_dst:
    layer_names = _get_layer_names(src_dst)
    dtype = src_dst.dtypes[0]

meta = {
    "bounds": bounds,
    "center": center,
    "maxzoom": mosaic_def["maxzoom"],
    "minzoom": mosaic_def["minzoom"],
    "name": mosaicid if mosaicid else url,
    "quadkeys": quadkeys,
    "layers": layer_names,
    "dtype": dtype,
}
return ("OK", "application/json", json.dumps(meta))

@kylebarron, as with the overview CLI, we fetch the quadkey list to get info from the COG (dtype and band names). I think for now I'll just add a check so users can pass a quadkey if they use the DynamoDB backend.
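A minimal sketch of the fallback described above (letting users pass a quadkey), assuming a hypothetical get_assets(quadkey) callable that hides whether assets come from the in-memory tiles mapping or a DynamoDB query. first_asset is an illustrative name, not part of the project.

```python
from typing import Callable, Dict, List, Optional


def first_asset(
    mosaic_def: Dict,
    get_assets: Callable[[str], List[str]],
    quadkey: Optional[str] = None,
) -> str:
    """Pick an asset to read dtype and band names from.

    File/S3 mosaics carry a full "tiles" mapping; a DynamoDB-backed mosaic
    may not, so the caller must then supply a quadkey to look up.
    """
    if quadkey is None:
        tiles = mosaic_def.get("tiles")
        if not tiles:
            raise ValueError("this mosaic backend has no tile list; pass a quadkey")
        quadkey = next(iter(tiles))
    assets = get_assets(quadkey)
    if not assets:
        raise ValueError(f"no assets for quadkey {quadkey}")
    return assets[0]
```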

Maybe we could add dtype and layers to the specification 🤔
