sat-stac's People

Contributors

dylnclrk, dzanaga, jamesoconnor, matthewhanson

sat-stac's Issues

Make links and assets convenience functions more consistent

Currently there is a links() function that takes an optional rel keyword, in which case it returns a list of links matching that rel type. In either case it returns a list of only the hrefs of the links.

There is an assets property which returns the entire assets dictionary

There is an asset() function which takes in a key and returns the entire asset item, not just the href.

This should be made more consistent. Specifically, the links() function should at least return the whole link objects, not just the hrefs, and perhaps another function should be added to get just the hrefs.
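A minimal sketch of what a more consistent interface could look like. The class name LinkedThing and the method link_hrefs() are hypothetical and are not part of sat-stac; only the links(), assets and asset() names come from the description above.

```python
from typing import Dict, List, Optional


class LinkedThing:
    """Hypothetical stand-in for a STAC object; not the sat-stac class."""

    def __init__(self, data: dict):
        self._data = data

    def links(self, rel: Optional[str] = None) -> List[dict]:
        """Return whole link objects, optionally filtered by rel type."""
        links = self._data.get("links", [])
        if rel is None:
            return list(links)
        return [link for link in links if link.get("rel") == rel]

    def link_hrefs(self, rel: Optional[str] = None) -> List[str]:
        """Return only the hrefs of the (optionally filtered) links."""
        return [link["href"] for link in self.links(rel)]

    @property
    def assets(self) -> Dict[str, dict]:
        """Return the entire assets dictionary."""
        return self._data.get("assets", {})

    def asset(self, key: str) -> dict:
        """Return one whole asset object by key."""
        return self.assets[key]


thing = LinkedThing({
    "links": [
        {"rel": "self", "href": "item.json"},
        {"rel": "parent", "href": "../catalog.json"},
    ],
    "assets": {"thumbnail": {"href": "thumb.jpg", "title": "Thumbnail"}},
})
print(thing.links("parent"))
print(thing.link_hrefs())
```

With this split, links() and asset() both return whole objects, and the href-only view is an explicit, separate call.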

Create initial code for creating landsat and sentinel catalogs

sat-stac is to be a Python library for creating STAC flat catalogs by crawling s3 buckets, and may also contain other utility functions for working with, manipulating, and/or updating STAC catalogs.

The initial version should support crawling an arbitrary bucket with user-provided "transform" functions that read the contents of a directory, transform them into a STAC metadata record, and then write the output.
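A rough sketch of the crawl/transform/write flow described above, using a local directory tree as a stand-in for an s3 bucket. The function names crawl, example_transform and write_records, and the shape of the record, are assumptions for illustration only.

```python
import json
from pathlib import Path
from typing import Callable, Dict, Iterator, List


def crawl(root: Path, transform: Callable[[Path, List[Path]], Dict]) -> Iterator[Dict]:
    """Yield one record per directory under root that directly contains files."""
    for directory in sorted(p for p in root.rglob("*") if p.is_dir()):
        files = sorted(f for f in directory.iterdir() if f.is_file())
        if files:
            yield transform(directory, files)


def example_transform(directory: Path, files: List[Path]) -> Dict:
    """Hypothetical transform: build a minimal Feature-like record."""
    return {
        "type": "Feature",
        "id": directory.name,
        "assets": {f.stem: {"href": f.as_posix()} for f in files},
    }


def write_records(records: Iterator[Dict], out_dir: Path) -> List[Path]:
    """Write each record to <out_dir>/<id>.json and return the paths written."""
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for record in records:
        path = out_dir / f"{record['id']}.json"
        path.write_text(json.dumps(record, indent=2))
        written.append(path)
    return written
```

A real transform would parse the metadata file(s) in the directory; the crawler itself only needs to know how to list directories and hand their contents to the transform.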

`satstac.Items` does not exist

I'd like to search a local STAC. Following Tutorial 2, I run something like:

from satstac import Items

items = Items(item_list, collection_list, search=search_args)

Returning the error:

---------------------------------------------------------------------------
ImportError                               Traceback (most recent call last)
/tmp/ipykernel_388/3528807615.py in <module>
----> 1 from satstac import Items
      2 
      3 items = Items(item_list, collection_list, search=search_args)

ImportError: cannot import name 'Items' from 'satstac' (/srv/conda/envs/notebook/lib/python3.8/site-packages/satstac/__init__.py)

Implement storage profiles

There is an open issue for STAC on storage profiles.

sat-stac currently just supports AWS access over https using signed urls.

Based on how storage profiles proceed for STAC 0.7.0, implement them so that sat-stac can be used across multiple cloud providers.

update for STAC 0.7.0

Update to be compliant with STAC 0.7.0

  • self links are not required: they should not be added unless a user wants to use the publish function, which should update the self links (we might want to rename publish and make clear it is not required). self links should not be used if the provider expects the location of the catalog to change
  • follow all best practices
  • collection field moved from properties to top level

Incorrect Windows paths while downloading

When passing an absolute Windows path such as C:/custom/path to item.download(), it becomes C_colon_/custom/path because item.substitute() calls replace(':', '_colon_'). That works fine on path templates, but it corrupts a full path.
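One possible way around this, sketched under the assumption that the colon replacement exists only so that keys such as ${landsat:path} become valid string.Template identifiers: escape colons inside the ${...} placeholders and leave the rest of the string untouched. The standalone substitute() function below is illustrative and is not the sat-stac implementation.

```python
import re
from string import Template

_PLACEHOLDER = re.compile(r"\$\{([^}]*)\}")


def substitute(template: str, properties: dict) -> str:
    """Fill ${key} placeholders, escaping ':' only inside the placeholders."""
    # Rewrite ${landsat:path} to ${landsat_colon_path}; text outside ${...} is untouched.
    safe_template = _PLACEHOLDER.sub(
        lambda m: "${" + m.group(1).replace(":", "_colon_") + "}", template
    )
    safe_values = {k.replace(":", "_colon_"): v for k, v in properties.items()}
    return Template(safe_template).substitute(safe_values)


props = {"landsat:path": "026", "landsat:row": "038"}
print(substitute("C:/custom/path/${landsat:path}/${landsat:row}", props))
# C:/custom/path/026/038
```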

Cannot import Items from satstac

When I tried to run the following line:
from satstac import Items

I got:
ImportError: cannot import name 'Items' from 'satstac' (/srv/conda/envs/notebook/lib/python3.7/site-packages/satstac/__init__.py)

Why and how to fix it?

ModuleNotFoundError: No module named 'satstac'

Hello,
I am trying to package python-sat-stac to openSUSE Tumbleweed.

The python modules are pulled from openSUSE repository.

Python 3.8.4
pytest-5.4.3
python3-pytest-cov-2.10.0
python3-codecov-2.1.7
python3-requests-2.23.0
python3-python-dateutil-2.8.1

I get many errors of the form ModuleNotFoundError: No module named 'satstac'.
What could be causing this?

Thanks.

Full log here: https://build.opensuse.org/package/live_build_log/home:andythe_great/python-sat-stac/openSUSE_Tumbleweed/x86_64

[   53s] ============================= test session starts ==============================
[   53s] platform linux -- Python 3.8.4, pytest-5.4.3, py-1.9.0, pluggy-0.13.1
[   53s] rootdir: /home/abuild/rpmbuild/BUILD/sat-stac-0.4.0
[   53s] plugins: cov-2.10.0
[   54s] collected 0 items / 7 errors
[   54s] 
[   54s] ==================================== ERRORS ====================================
[   54s] ____________________ ERROR collecting test/test_catalog.py _____________________
[   54s] ImportError while importing test module '/home/abuild/rpmbuild/BUILD/sat-stac-0.4.0/test/test_catalog.py'.
[   54s] Hint: make sure your test modules/packages have valid Python names.
[   54s] Traceback:
[   54s] test/test_catalog.py:6: in <module>
[   54s]     from satstac import __version__, Catalog, STACError, Item
[   54s] E   ModuleNotFoundError: No module named 'satstac'
[   54s] ______________________ ERROR collecting test/test_cli.py _______________________
[   54s] ImportError while importing test module '/home/abuild/rpmbuild/BUILD/sat-stac-0.4.0/test/test_cli.py'.
[   54s] Hint: make sure your test modules/packages have valid Python names.
[   54s] Traceback:
[   54s] test/test_cli.py:9: in <module>
[   54s]     from satstac.cli import parse_args, cli
[   54s] E   ModuleNotFoundError: No module named 'satstac'
[   54s] ___________________ ERROR collecting test/test_collection.py ___________________
[   54s] ImportError while importing test module '/home/abuild/rpmbuild/BUILD/sat-stac-0.4.0/test/test_collection.py'.
[   54s] Hint: make sure your test modules/packages have valid Python names.
[   54s] Traceback:
[   54s] test/test_collection.py:6: in <module>
[   54s]     from satstac import __version__, STACError, Catalog, Collection, Item
[   54s] E   ModuleNotFoundError: No module named 'satstac'
[   54s] ______________________ ERROR collecting test/test_item.py ______________________
[   54s] ImportError while importing test module '/home/abuild/rpmbuild/BUILD/sat-stac-0.4.0/test/test_item.py'.
[   54s] Hint: make sure your test modules/packages have valid Python names.
[   54s] Traceback:
[   54s] test/test_item.py:7: in <module>
[   54s]     from satstac import Item
[   54s] E   ModuleNotFoundError: No module named 'satstac'
[   54s] _________________ ERROR collecting test/test_itemcollection.py _________________
[   54s] ImportError while importing test module '/home/abuild/rpmbuild/BUILD/sat-stac-0.4.0/test/test_itemcollection.py'.
[   54s] Hint: make sure your test modules/packages have valid Python names.
[   54s] Traceback:
[   54s] test/test_itemcollection.py:4: in <module>
[   54s]     from satstac import ItemCollection, Item
[   54s] E   ModuleNotFoundError: No module named 'satstac'
[   54s] _____________________ ERROR collecting test/test_thing.py ______________________
[   54s] ImportError while importing test module '/home/abuild/rpmbuild/BUILD/sat-stac-0.4.0/test/test_thing.py'.
[   54s] Hint: make sure your test modules/packages have valid Python names.
[   54s] Traceback:
[   54s] test/test_thing.py:5: in <module>
[   54s]     from satstac import Thing, STACError
[   54s] E   ModuleNotFoundError: No module named 'satstac'
[   54s] _____________________ ERROR collecting test/test_utils.py ______________________
[   54s] ImportError while importing test module '/home/abuild/rpmbuild/BUILD/sat-stac-0.4.0/test/test_utils.py'.
[   54s] Hint: make sure your test modules/packages have valid Python names.
[   54s] Traceback:
[   54s] test/test_utils.py:5: in <module>
[   54s]     from satstac import utils
[   54s] E   ModuleNotFoundError: No module named 'satstac'
[   54s] =========================== short test summary info ============================
[   54s] ERROR test/test_catalog.py
[   54s] ERROR test/test_cli.py
[   54s] ERROR test/test_collection.py
[   54s] ERROR test/test_item.py
[   54s] ERROR test/test_itemcollection.py
[   54s] ERROR test/test_thing.py
[   54s] ERROR test/test_utils.py
[   54s] !!!!!!!!!!!!!!!!!!! Interrupted: 7 errors during collection !!!!!!!!!!!!!!!!!!!!
[   54s] ============================== 7 errors in 0.42s ===============================

sentinel2 asset errors

Sentinel2 assets are sometimes incorrect. For example:

from satstac import Catalog, Collection, Item

cat = Catalog.open('https://sentinel-stac.s3.amazonaws.com/catalog.json')

col = Collection.open('https://sentinel-stac.s3.amazonaws.com/sentinel-2-l1c/catalog.json')

item = Item.open('https://sentinel-stac.s3.amazonaws.com/sentinel-2-l1c/15/S/UB/2018-04-17/S2B_15SUB_20180417_0.json')

for k in item.assets:
    print(k, item.assets[k])

will print:

thumbnail {'title': 'Thumbnail', 'href': 'https://roda.sentinel-hub.com/sentinel-s2-l1c/tiles/15/S/UB/2018/4/17/0/preview.jpg'}
info {'title': 'Basic JSON metadata', 'href': 'https://roda.sentinel-hub.com/sentinel-s2-l1c/tiles/15/S/UB/2018/4/17/0/tileInfo.json'}
metadata {'title': 'Complete XML metadata', 'href': 'https://roda.sentinel-hub.com/sentinel-s2-l1c/tiles/15/S/UB/2018/4/17/0/metadata.xml'}
tki {'title': 'True color image', 'type': 'image/jp2', 'eo:bands': [3, 2, 1], 'href': 'https://sentinel-s2-l1c.s3.amazonaws.com/tiles/15/S/UB/2018/4/17/0/TKI.jp2'}
B01 {'title': 'Band 1 (coastal)', 'type': 'image/jp2', 'eo:bands': [0], 'href': 'https://sentinel-s2-l1c.s3.amazonaws.com/tiles/15/S/UB/2018/4/17/0/B01.jp2'}
B02 {'title': 'Band 2 (blue)', 'type': 'image/jp2', 'eo:bands': [2], 'href': 'https://sentinel-s2-l1c.s3.amazonaws.com/tiles/15/S/UB/2018/4/17/0/B02.jp2'}
B03 {'title': 'Band 3 (green)', 'type': 'image/jp2', 'eo:bands': [2], 'href': 'https://sentinel-s2-l1c.s3.amazonaws.com/tiles/15/S/UB/2018/4/17/0/B03.jp2'}
B04 {'title': 'Band 4 (red)', 'type': 'image/jp2', 'eo:bands': [3], 'href': 'https://sentinel-s2-l1c.s3.amazonaws.com/tiles/15/S/UB/2018/4/17/0/B04.jp2'}
B05 {'title': 'Band 5', 'type': 'image/jp2', 'eo:bands': [4], 'href': 'https://sentinel-s2-l1c.s3.amazonaws.com/tiles/15/S/UB/2018/4/17/0/B05.jp2'}
B06 {'title': 'Band 6', 'type': 'image/jp2', 'eo:bands': [5], 'href': 'https://sentinel-s2-l1c.s3.amazonaws.com/tiles/15/S/UB/2018/4/17/0/B06.jp2'}
B07 {'title': 'Band 7', 'type': 'image/jp2', 'eo:bands': [6], 'href': 'https://sentinel-s2-l1c.s3.amazonaws.com/tiles/15/S/UB/2018/4/17/0/B07.jp2'}
B08 {'title': 'Band 8 (nir)', 'type': 'image/jp2', 'eo:bands': [7], 'href': 'https://sentinel-s2-l1c.s3.amazonaws.com/tiles/15/S/UB/2018/4/17/0/B08.jp2'}
B8A {'title': 'Band 8A', 'type': 'image/jp2', 'eo:bands': [8], 'href': 'https://sentinel-s2-l1c.s3.amazonaws.com/tiles/15/S/UB/2018/4/17/0/B08.jp2'}
B09 {'title': 'Band 9', 'type': 'image/jp2', 'eo:bands': [9], 'href': 'https://sentinel-s2-l1c.s3.amazonaws.com/tiles/15/S/UB/2018/4/17/0/B09.jp2'}
B10 {'title': 'Band 10 (cirrus)', 'type': 'image/jp2', 'eo:bands': [10], 'href': 'https://sentinel-s2-l1c.s3.amazonaws.com/tiles/15/S/UB/2018/4/17/0/B10.jp2'}
B11 {'title': 'Band 11 (swir16)', 'type': 'image/jp2', 'eo:bands': [11], 'href': 'https://sentinel-s2-l1c.s3.amazonaws.com/tiles/15/S/UB/2018/4/17/0/B11.jp2'}
B12 {'title': 'Band 12 (swir22)', 'type': 'image/jp2', 'eo:bands': [12], 'href': 'https://sentinel-s2-l1c.s3.amazonaws.com/tiles/15/S/UB/2018/4/17/0/B11.jp2'}

which has the following mistakes:

  • B02 (blue) should have eo:bands[1] (not [2])
  • B8A should have a href ending in B8A.jp2 (not B08.jp2)
  • B12 (swir22) should have a href ending in B12.jp2 (not B11.jp2)
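The two href mistakes listed above could be caught with a simple consistency check. This is only a sketch: the helper name mismatched_band_assets is hypothetical, and it assumes that each band asset's key should match the file stem of its href. It does not detect the wrong eo:bands index on B02.

```python
import posixpath
from urllib.parse import urlparse


def mismatched_band_assets(assets: dict) -> list:
    """Return keys of band assets whose href filename does not match the key."""
    bad = []
    for key, asset in assets.items():
        if "eo:bands" not in asset:
            continue  # skip non-band assets such as thumbnail, info, metadata
        filename = posixpath.basename(urlparse(asset["href"]).path)
        stem = posixpath.splitext(filename)[0]
        if stem.lower() != key.lower():
            bad.append(key)
    return bad


base = "https://sentinel-s2-l1c.s3.amazonaws.com/tiles/15/S/UB/2018/4/17/0/"
sample_assets = {
    "thumbnail": {"href": "https://roda.sentinel-hub.com/sentinel-s2-l1c/tiles/15/S/UB/2018/4/17/0/preview.jpg"},
    "tki": {"eo:bands": [3, 2, 1], "href": base + "TKI.jp2"},
    "B08": {"eo:bands": [7], "href": base + "B08.jp2"},
    "B8A": {"eo:bands": [8], "href": base + "B08.jp2"},
    "B12": {"eo:bands": [12], "href": base + "B11.jp2"},
}
print(mismatched_band_assets(sample_assets))
```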

AWS accounts

I talked to Joe Flasher here at FOSS4G about this and we decided the best thing to do here is:

  • Use the existing "modis-pds" account to create buckets used to store metadata for Landsat-8 and Sentinel-2. This account is just paid directly by AWS, but should not be used initially for any processing, just storage.

  • Use our AWS account Jamey is currently using to do the initial metadata processing and get a baseline estimate of costs. We can then let Joe know the cost estimate and if they decide to absorb those costs we can move the processing to the "modis-pds" account.

cc @scisco

Reading STAC Items from dict

When satstac.Item loads a dictionary it is unable to access any collection-level information. This is because both satstac.Item._collection and satstac.Item.filename are set to None, which causes satstac.Item.collection() to always return None. See the code snippet below.

from satstac import Item, Collection
import requests

infile = "https://landsat-stac.s3.amazonaws.com/landsat-8-l1/026/038/2014-10-30/LC80260382014303LGN00.json"

# Opening item from url (this works)
item_from_url = Item.open(infile)
assert not item_from_url._collection
assert item_from_url.filename == infile
assert type(item_from_url.collection()) == Collection
print(item_from_url.eobands)

# Opening item from dict (this doesn't work)
data = requests.get(infile).json()
item_from_dict = Item(data)
# Both item._collection and item.filename are None which makes item.collection() always return None
assert not item_from_dict._collection
assert not item_from_dict.filename
assert not item_from_dict.collection()

# This prevents accessing collection-level information
print(item_from_dict.eobands)

This prints:

> [{'name': 'B1', 'common_name': 'coastal', 'gsd': 30, 'center_wavelength': 0.44 .........
> []

My personal use-case with this is for stac-updater. Stac-updater uses the STAC Item itself as the message between AWS resources and I'd like to be able to read that message with sat-stac and parse out collection-level information (in this instance eobands is stored at the collection level as commons). For example:

from satstac import Item

def handler(event, context):
    item = Item(event)
    print(item.eobands)
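As a possible workaround sketch, the collection link can be read straight from the item dictionary, provided the caller knows the item's original URL when the link is relative. The helper collection_href and the example link values are hypothetical; the sketch assumes the item carries a link with rel="collection".

```python
from typing import Optional
from urllib.parse import urljoin


def collection_href(item: dict, base_url: Optional[str] = None) -> Optional[str]:
    """Return the href of the item's rel="collection" link, if any."""
    for link in item.get("links", []):
        if link.get("rel") == "collection":
            href = link["href"]
            # Relative hrefs can only be resolved when the item's own URL is known.
            return urljoin(base_url, href) if base_url else href
    return None


event = {
    "id": "LC80260382014303LGN00",
    "links": [
        {"rel": "self", "href": "LC80260382014303LGN00.json"},
        {"rel": "collection", "href": "../../../catalog.json"},  # illustrative value
    ],
}
item_url = "https://landsat-stac.s3.amazonaws.com/landsat-8-l1/026/038/2014-10-30/LC80260382014303LGN00.json"
print(collection_href(event, base_url=item_url))
```

The resolved URL could then be passed to Collection.open() to recover collection-level fields.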

Fix signed URLs to allow for PUT

Signed URLs work for accessing requestor pays files on AWS over https; however, the signed URL does not work for PUTing files, such as when updating a STAC catalog file on s3 over https.

Allow Items to not have Collections

When initializing an Items instance, collections is an optional array:
https://github.com/sat-utils/sat-stac/blob/master/satstac/items.py#L11

The last lines in __init__ assume that all of the member items 1) belong to a collection and 2) have that collection provided.

This should be optional. In some cases (such as derived Items that are locally generated) the items may not have a collection. A user may generate the items and then add them later to a collection to be published.
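A small sketch of how the pairing step could tolerate items without a collection. The function pair_items_with_collections is hypothetical and works on plain dictionaries rather than the sat-stac classes; it looks for the collection ID both at the top level and under properties, since the field's location depends on the STAC version.

```python
from typing import Dict, List, Optional, Tuple


def pair_items_with_collections(
    items: List[dict], collections: Optional[List[dict]] = None
) -> List[Tuple[dict, Optional[dict]]]:
    """Pair each item with its collection, or with None when there is none."""
    by_id: Dict[str, dict] = {c["id"]: c for c in (collections or [])}
    pairs = []
    for item in items:
        # The collection ID may sit at the top level or under properties;
        # either may be absent for locally generated items.
        cid = item.get("collection") or item.get("properties", {}).get("collection")
        pairs.append((item, by_id.get(cid)))
    return pairs


items = [
    {"id": "a", "properties": {"collection": "landsat-8-l1"}},
    {"id": "derived", "properties": {}},
]
collections = [{"id": "landsat-8-l1"}]
pairs = pair_items_with_collections(items, collections)
```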

Add optional metadata to Items

Sometimes it's useful to store metadata for a collection of Items.
The Items class stores the FeatureCollection of Items, a list of Collections, and an optional dictionary containing search terms.

Suggest adding a dictionary called metadata so users can add other info as needed.

KeyError: 'collections' opening itemCollection

sat-stac version '0.4.0'

from satstac import ItemCollection
limit=500
url = f'https://cmr.earthdata.nasa.gov/cmr-stac/NSIDC_ECS/collections/C1262010979-NSIDC_ECS/items?limit={limit}'
items = ItemCollection.open(url)
print(len(items))
print(items[0], items[1])
~/miniconda3/envs/intake-stac-dev/lib/python3.7/site-packages/satstac/itemcollection.py in open(cls, filename)
---> 60         collections = [Collection(col) for col in data['collections']]
     61         items = [Item(feature) for feature in data['features']]
     62         collections = [item.collection() for item in items]

KeyError: 'collections'

This can be fixed by commenting out these lines:

sat-stac/satstac/item.py

Lines 31 to 33 in 42b6743

if self.filename is None:
    # TODO - raise exception ?
    return None

And changing this:

collections = [Collection(col) for col in data['collections']]
items = [Item(feature) for feature in data['features']]

to:

collections = [item.collection() for item in items]

But maybe this is an issue with the catalog rather than sat-stac? cc @matthewhanson
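An alternative that avoids the KeyError without removing code is to treat the 'collections' key as optional when reading the response. The helper below is a sketch on plain dictionaries; split_feature_collection is a hypothetical name, not a sat-stac function.

```python
def split_feature_collection(data: dict):
    """Return (collections, features) from a FeatureCollection-like dict.

    'collections' is treated as optional, so responses that omit it
    yield an empty list instead of raising KeyError.
    """
    collections = data.get("collections", [])
    features = data.get("features", [])
    return collections, features


response = {"type": "FeatureCollection", "features": [{"id": "granule-1"}]}
collections, features = split_feature_collection(response)
print(len(collections), len(features))
```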

Release sat-stac

Hello @matthewhanson,

Thank you for this amazing library! When I do 'pip install sat-stac' (or even 'sat-search'), I encounter dependency issues with 'python-dateutil'. I noticed that in one of the latest commits, you changed '~=' to '>=' which would indeed fix my problem.

However, this change has not yet been updated in the release, and I don't get this update during the pip-install.

My question is, do you have plans to release a new version with this modification? If so, when do you expect to do it?

Thank you very much for your response,

Floriane

Type hints/stubs

It would be great to have type hints for this library, to help developers make sense of inputs and outputs quickly and easily.

path_template

When I tried the following cell (#15) in tutorial1 notebook:

path = '${landsat:path}/${landsat:row}'
filename = '${date}/${id}.json'

collection.add_item(item, path_template=path, filename_template=filename)
print('Item filename: ', item.filename)

print('\nItem links')
pp.pprint(item._data['links'])

I got:
TypeError: add_item() got an unexpected keyword argument 'path_template'

What went wrong and how can I fix it?

STAC workflow

@scisco and I had talked about the general workflow, which Joe Flasher and I talked about more today.

  • Initial Ingestion: The initial ingestion will work by getting the bucket contents and, for each "directory" of files, reading in the metadata file(s), transforming that data into a STAC record, and writing it as a STAC node in a catalog.

  • Updates via SNS: Regular updates will be done by subscribing to the SNS messages that get generated when new files are added (for Landsat this is generated when the final index.html file is added, for Sentinel I believe it is the tileInfo.json file). This will then read in the metadata file(s), transform the metadata and write it as a STAC node in the catalog.

  • Daily reconciliation: We will use AWS's s3 inventory management feature, which generates a list of a bucket's contents on a daily basis (this is more efficient than crawling the bucket ourselves). A reconciliation process will ensure any missing records get added and will delete any records that no longer exist (e.g., real-time scenes that get deleted after the normal scene is available).

  • Generate SNS STAC record: When a new STAC node is created in the catalog and written to disk, this will create an SNS message that contains the entire STAC item. This is a better SNS topic to subscribe to for most cases (such as by sat-api) as it already contains a STAC record.
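The daily reconciliation step above boils down to a set comparison between the inventory listing and the catalog. A minimal sketch follows; the function names, the example keys, and the rule that a scene is identified by the folder holding index.html are all assumptions for illustration.

```python
def scene_id_from_key(key):
    """Hypothetical mapping: a scene is the folder that holds index.html."""
    parts = key.split("/")
    if len(parts) >= 2 and parts[-1] == "index.html":
        return parts[-2]
    return None


def reconcile(inventory_keys, catalog_ids, key_to_id=scene_id_from_key):
    """Compare an inventory listing with the catalog.

    Returns (to_add, to_delete): IDs present in the inventory but missing
    from the catalog, and IDs in the catalog that no longer exist upstream.
    """
    upstream = {key_to_id(k) for k in inventory_keys}
    upstream.discard(None)  # keys that do not identify a scene
    existing = set(catalog_ids)
    return sorted(upstream - existing), sorted(existing - upstream)


inventory = [
    "c1/L8/026/038/SCENE_A/index.html",
    "c1/L8/026/038/SCENE_A/B1.TIF",
    "c1/L8/026/038/SCENE_B/index.html",
]
catalog = ["SCENE_A", "SCENE_RT"]
to_add, to_delete = reconcile(inventory, catalog)
print(to_add, to_delete)
```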

Override STAC_VERSION?

STAC_VERSION='0.6.0'
specifies the STAC version that sat-stac is compatible with. However, in some cases, users may wish to override this, e.g. when a minor STAC version has been released that sat-stac generates valid output for but does not yet explicitly support.
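One way such an override could work is an environment variable with the current value as the default. This is a sketch only: the variable name STAC_VERSION as an environment setting and the stac_version() helper are assumptions.

```python
import os

DEFAULT_STAC_VERSION = "0.6.0"


def stac_version(environ=os.environ) -> str:
    """Return the STAC version, letting an environment variable override the default."""
    return environ.get("STAC_VERSION", DEFAULT_STAC_VERSION)


print(stac_version({}))
print(stac_version({"STAC_VERSION": "0.6.1"}))
```

Passing the environment mapping as an argument keeps the function easy to test without touching the real process environment.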

allow specifying of catalog save path

When adding an Item the user can specify a pattern for path and filename, however when adding a catalog or collection it simply uses the ID of the new entity as the directory. This should be able to be specified.

[Enhancement] Add ability to get a specific child from a catalog or collection

I use sat-stac to parse and discover data from the TROPOMI sensor on S3. To get a particular child from a Catalog or Collection, the current workflow is to iterate through the results of the generator. This process takes a lot of time for my dataset. If there were a Catalog.get_child(id='value') API (similar to the one in pystac), I could get to a known child much more quickly. See below for some time profiling:

>>> from satstac import Catalog, Collection, Item
>>> coll = Collection.open('https://meeo-s5p.s3.amazonaws.com/catalog.json')
>>> %time coll_children = list(coll.children())
>>> print(coll_children)
# CPU times: user 70.8 ms, sys: 5.5 ms, total: 76.3 ms
# Wall time: 2.97 s
# [meeo-s5p-cog, NRTI, OFFL, RPRO]

>>> offl = coll_children[2]
>>> offl.links()
['https://meeo-s5p.s3.amazonaws.com/OFFL/catalog.json',
 'https://meeo-s5p.s3.amazonaws.com/catalog.json',
 'https://meeo-s5p.s3.amazonaws.com/catalog.json',
 'https://meeo-s5p.s3.amazonaws.com/OFFL/L2__AER_AI/catalog.json',
 'https://meeo-s5p.s3.amazonaws.com/OFFL/L2__CH4___/catalog.json',
 'https://meeo-s5p.s3.amazonaws.com/OFFL/L2__CLOUD_/catalog.json',
 'https://meeo-s5p.s3.amazonaws.com/OFFL/L2__CO____/catalog.json',
 'https://meeo-s5p.s3.amazonaws.com/OFFL/L2__HCHO__/catalog.json',
 'https://meeo-s5p.s3.amazonaws.com/OFFL/L2__NO2___/catalog.json',
 'https://meeo-s5p.s3.amazonaws.com/OFFL/L2__O3____/catalog.json',
 'https://meeo-s5p.s3.amazonaws.com/OFFL/L2__SO2___/catalog.json']

>>> %time offl_no2 = list(offl.children())[-3]
>>> offl_no2
# CPU times: user 144 ms, sys: 12.7 ms, total: 157 ms
# Wall time: 6.72 s

# L2__NO2___

>>> %time offl_no2_direct = Catalog.open(offl.links()[-3])
# CPU times: user 19.6 ms, sys: 3.33 ms, total: 22.9 ms
# Wall time: 831 ms

If I were to open a child directly with get_child(), I could hypothetically get it in under a second, compared to 6.7s in the current workflow.
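A sketch of the lookup a get_child() could perform before opening anything: pick the matching child link and open only that one. The helper child_href is hypothetical, operates on whole link dictionaries, and assumes the layout shown in the listing above, where each child lives at <child id>/catalog.json.

```python
def child_href(links, child_id):
    """Return the href of the child link whose parent folder is named child_id.

    Returns None when no child link matches.
    """
    for link in links:
        if link.get("rel") != "child":
            continue
        parts = link["href"].rstrip("/").split("/")
        if len(parts) >= 2 and parts[-2] == child_id:
            return link["href"]
    return None


offl_links = [
    {"rel": "self", "href": "https://meeo-s5p.s3.amazonaws.com/OFFL/catalog.json"},
    {"rel": "parent", "href": "https://meeo-s5p.s3.amazonaws.com/catalog.json"},
    {"rel": "child", "href": "https://meeo-s5p.s3.amazonaws.com/OFFL/L2__HCHO__/catalog.json"},
    {"rel": "child", "href": "https://meeo-s5p.s3.amazonaws.com/OFFL/L2__NO2___/catalog.json"},
]
print(child_href(offl_links, "L2__NO2___"))
```

The returned href could then be handed to Catalog.open(), which is the single request timed at under a second above. Catalogs whose child paths do not contain the child ID would need a different matching rule.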

Tutorial on Python classes

Finish tutorial-2 notebook to illustrate how to use the STAC Python classes: Catalog, Collection, Item

Out of band writing

I would like to handle writing Catalog and Item JSON representations myself. Two (or more?) things would facilitate this:

  1. a helper for generating JSON representations (effectively json.dumps(self.data))
  2. eliminating the check for saved catalogs prior to adding sub-catalogs

Is there a reason that catalog hierarchies can't be assembled in memory before being written out?
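To illustrate the idea, here is a sketch of a hierarchy assembled entirely in memory, with a JSON helper and a walk that yields what would be written and where. The Node class and its methods are hypothetical and are not the sat-stac Catalog class.

```python
import json


class Node:
    """Hypothetical in-memory catalog node; not the sat-stac Catalog class."""

    def __init__(self, id, description=""):
        self.data = {"id": id, "description": description, "links": []}
        self.children = []

    def add_child(self, child):
        """Attach a child without requiring either node to be saved first."""
        self.children.append(child)
        self.data["links"].append(
            {"rel": "child", "href": f"{child.data['id']}/catalog.json"}
        )

    def to_json(self, **kwargs):
        """Helper for generating the JSON representation."""
        return json.dumps(self.data, **kwargs)

    def walk(self, prefix=""):
        """Yield (relative path, JSON string) pairs for this node and its descendants."""
        yield f"{prefix}catalog.json", self.to_json()
        for child in self.children:
            yield from child.walk(prefix=f"{prefix}{child.data['id']}/")


root = Node("root")
root.add_child(Node("landsat-8-l1"))
for path, body in root.walk():
    print(path, body)
```

The caller decides how each (path, JSON) pair is written, for example to local disk or to an object store.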

Add option for requestor pays

Requestor pays is supported in the creation of the signed URL, but there's no way to specify it when downloading.

Explicitly requiring the user to set request_pays=True ensures they know they are paying egress costs.
