sat-utils / sat-stac
Python library for creating and working with STAC catalogs
License: MIT License
Currently there is a links() function that takes an optional rel keyword, in which case it returns a list of links matching that rel type. In either case it returns a list of only the hrefs of the links.
There is an assets property, which returns the entire assets dictionary.
There is an asset() function, which takes a key and returns the entire asset item, not just the href.
This should be made more consistent. Specifically, the links() function should at least return the whole link items, not just the hrefs, and perhaps another function should be added to get just the hrefs.
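A minimal sketch of what the more consistent behaviour might look like, written against a plain STAC dictionary rather than the sat-stac classes. The function names links and hrefs and the record variable are illustrative assumptions, not the library's API.

```python
from typing import Dict, List, Optional


def links(data: Dict, rel: Optional[str] = None) -> List[Dict]:
    """Return whole link objects, optionally filtered by rel type."""
    all_links = data.get("links", [])
    if rel is None:
        return list(all_links)
    return [link for link in all_links if link.get("rel") == rel]


def hrefs(data: Dict, rel: Optional[str] = None) -> List[str]:
    """Return only the href of each matching link."""
    return [link["href"] for link in links(data, rel)]


# Hypothetical record used only for illustration.
record = {
    "links": [
        {"rel": "self", "href": "catalog.json"},
        {"rel": "child", "href": "a/catalog.json", "title": "A"},
    ]
}
child_links = links(record, "child")   # whole link items
all_hrefs = hrefs(record)              # just the hrefs
```

This mirrors how assets and asset() already return whole objects, while keeping a cheap way to get only the hrefs.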
sat-stac is to be a Python library for creating STAC flat catalogs by crawling s3 buckets, and may also contain other utility functions for working with, manipulating, and/or updating STAC catalogs.
The initial version should support crawling an arbitrary bucket with user-provided "transform" functions that read the contents of a directory and transform them into a STAC metadata record, then write the output.
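The crawl-and-transform idea could be sketched roughly as below. To keep the example self-contained it walks a local directory tree as a stand-in for a bucket; crawl and example_transform are hypothetical names, and the record layout is only an assumption about what a transform might return.

```python
import json
import os
from typing import Callable, Dict, List


def crawl(root: str, transform: Callable[[str, List[str]], Dict], out_dir: str) -> List[str]:
    """Walk root, transform every directory that holds files, write one JSON record each."""
    os.makedirs(out_dir, exist_ok=True)
    written = []
    for dirpath, _dirnames, filenames in os.walk(root):
        if not filenames:
            continue  # nothing to transform in this directory
        record = transform(dirpath, sorted(filenames))
        out_path = os.path.join(out_dir, record["id"] + ".json")
        with open(out_path, "w") as f:
            json.dump(record, f)
        written.append(out_path)
    return written


def example_transform(dirpath: str, filenames: List[str]) -> Dict:
    """Hypothetical user-provided transform: one asset per file, id from the directory name."""
    return {
        "type": "Feature",
        "id": os.path.basename(dirpath),
        "assets": {name: {"href": os.path.join(dirpath, name)} for name in filenames},
    }
```

The output directory should live outside the tree being crawled so that written records are not picked up as input.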
I'd like to search a local STAC. Following Tutorial 2, I run something like:
from satstac import Items
items = Items(item_list, collection_list, search=search_args)
This returns the error:
---------------------------------------------------------------------------
ImportError Traceback (most recent call last)
/tmp/ipykernel_388/3528807615.py in <module>
----> 1 from satstac import Items
2
3 items = Items(item_list, collection_list, search=search_args)
ImportError: cannot import name 'Items' from 'satstac' (/srv/conda/envs/notebook/lib/python3.8/site-packages/satstac/__init__.py)
Can it be used for the Sentinel-3 OLCI AWS collection?
There is an open issue for STAC on storage profiles.
sat-stac currently just supports AWS access over https using signed urls.
Based on how storage profiles proceed for STAC 0.7.0, implement to allow sat-stac to be used across multiple cloud providers.
Currently catalogs and collections are all named 'catalog.json'. Users should be able to change this default behavior, and the default for collections should be 'collection.json' rather than 'catalog.json'.
Update to be compliant with STAC 0.7.0
When passing an absolute Windows path such as C:/custom/path to item.download(), it becomes C_colon_/custom/path. This happens because item.substitute() calls replace(':', '_colon_'), which works fine on path templates but corrupts a full path.
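One possible way around this, sketched under the assumption that the colon replacement exists only so that keys like landsat:path can be used with string.Template: escape colons inside ${...} placeholders only and leave the rest of the path alone. The standalone substitute function below is an illustration, not the library's actual implementation.

```python
import re
from string import Template
from typing import Dict

# Matches ${...} placeholders and captures the key inside the braces.
_PLACEHOLDER = re.compile(r"\$\{([^}]*)\}")


def substitute(template: str, values: Dict[str, str]) -> str:
    """Fill ${key} placeholders; keys may contain colons, literal path colons are untouched."""
    safe_template = _PLACEHOLDER.sub(
        lambda m: "${" + m.group(1).replace(":", "_colon_") + "}", template
    )
    safe_values = {key.replace(":", "_colon_"): value for key, value in values.items()}
    return Template(safe_template).substitute(safe_values)


path = substitute(
    "C:/custom/path/${landsat:path}/${id}",
    {"landsat:path": "026", "id": "LC8"},
)
```

With this approach the drive letter in C:/custom/path survives, because only text inside the braces is rewritten.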
When I tried the following line:
from satstac import Items
I got:
ImportError: cannot import name 'Items' from 'satstac' (/srv/conda/envs/notebook/lib/python3.7/site-packages/satstac/__init__.py)
Why and how to fix it?
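One possible cause, offered as an assumption rather than a confirmed diagnosis: the 0.4.0 test log quoted elsewhere on this page imports ItemCollection from satstac, which suggests the Items class may have been renamed in newer releases. A version-tolerant import could be sketched as follows; import_first is a hypothetical helper, not part of sat-stac.

```python
import importlib
from typing import Any, Sequence


def import_first(module_name: str, candidates: Sequence[str]) -> Any:
    """Return the first attribute from candidates that the module actually exposes."""
    module = importlib.import_module(module_name)
    for name in candidates:
        if hasattr(module, name):
            return getattr(module, name)
    raise ImportError(f"none of {list(candidates)} found in {module_name!r}")


# With sat-stac installed, this would be used as (assumed class names):
#   Items = import_first("satstac", ["ItemCollection", "Items"])
# Demonstrated here on the standard library so the sketch runs anywhere.
ordered_dict_cls = import_first("collections", ["NoSuchName", "OrderedDict"])
```

Checking the installed version with pip show sat-stac and comparing against the tutorial's version would confirm or rule out this explanation.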
Hello,
I am trying to package python-sat-stac to openSUSE Tumbleweed.
The python modules are pulled from openSUSE repository.
Python 3.8.4
pytest-5.4.3
python3-pytest-cov-2.10.0
python3-codecov-2.1.7
python3-requests-2.23.0
python3-python-dateutil-2.8.1
Many errors about ModuleNotFoundError: No module named 'satstac'
What could this be caused by?
Thanks.
Full log here: https://build.opensuse.org/package/live_build_log/home:andythe_great/python-sat-stac/openSUSE_Tumbleweed/x86_64
[ 53s] ============================= test session starts ==============================
[ 53s] platform linux -- Python 3.8.4, pytest-5.4.3, py-1.9.0, pluggy-0.13.1
[ 53s] rootdir: /home/abuild/rpmbuild/BUILD/sat-stac-0.4.0
[ 53s] plugins: cov-2.10.0
[ 54s] collected 0 items / 7 errors
[ 54s]
[ 54s] ==================================== ERRORS ====================================
[ 54s] ____________________ ERROR collecting test/test_catalog.py _____________________
[ 54s] ImportError while importing test module '/home/abuild/rpmbuild/BUILD/sat-stac-0.4.0/test/test_catalog.py'.
[ 54s] Hint: make sure your test modules/packages have valid Python names.
[ 54s] Traceback:
[ 54s] test/test_catalog.py:6: in <module>
[ 54s] from satstac import __version__, Catalog, STACError, Item
[ 54s] E ModuleNotFoundError: No module named 'satstac'
[ 54s] ______________________ ERROR collecting test/test_cli.py _______________________
[ 54s] ImportError while importing test module '/home/abuild/rpmbuild/BUILD/sat-stac-0.4.0/test/test_cli.py'.
[ 54s] Hint: make sure your test modules/packages have valid Python names.
[ 54s] Traceback:
[ 54s] test/test_cli.py:9: in <module>
[ 54s] from satstac.cli import parse_args, cli
[ 54s] E ModuleNotFoundError: No module named 'satstac'
[ 54s] ___________________ ERROR collecting test/test_collection.py ___________________
[ 54s] ImportError while importing test module '/home/abuild/rpmbuild/BUILD/sat-stac-0.4.0/test/test_collection.py'.
[ 54s] Hint: make sure your test modules/packages have valid Python names.
[ 54s] Traceback:
[ 54s] test/test_collection.py:6: in <module>
[ 54s] from satstac import __version__, STACError, Catalog, Collection, Item
[ 54s] E ModuleNotFoundError: No module named 'satstac'
[ 54s] ______________________ ERROR collecting test/test_item.py ______________________
[ 54s] ImportError while importing test module '/home/abuild/rpmbuild/BUILD/sat-stac-0.4.0/test/test_item.py'.
[ 54s] Hint: make sure your test modules/packages have valid Python names.
[ 54s] Traceback:
[ 54s] test/test_item.py:7: in <module>
[ 54s] from satstac import Item
[ 54s] E ModuleNotFoundError: No module named 'satstac'
[ 54s] _________________ ERROR collecting test/test_itemcollection.py _________________
[ 54s] ImportError while importing test module '/home/abuild/rpmbuild/BUILD/sat-stac-0.4.0/test/test_itemcollection.py'.
[ 54s] Hint: make sure your test modules/packages have valid Python names.
[ 54s] Traceback:
[ 54s] test/test_itemcollection.py:4: in <module>
[ 54s] from satstac import ItemCollection, Item
[ 54s] E ModuleNotFoundError: No module named 'satstac'
[ 54s] _____________________ ERROR collecting test/test_thing.py ______________________
[ 54s] ImportError while importing test module '/home/abuild/rpmbuild/BUILD/sat-stac-0.4.0/test/test_thing.py'.
[ 54s] Hint: make sure your test modules/packages have valid Python names.
[ 54s] Traceback:
[ 54s] test/test_thing.py:5: in <module>
[ 54s] from satstac import Thing, STACError
[ 54s] E ModuleNotFoundError: No module named 'satstac'
[ 54s] _____________________ ERROR collecting test/test_utils.py ______________________
[ 54s] ImportError while importing test module '/home/abuild/rpmbuild/BUILD/sat-stac-0.4.0/test/test_utils.py'.
[ 54s] Hint: make sure your test modules/packages have valid Python names.
[ 54s] Traceback:
[ 54s] test/test_utils.py:5: in <module>
[ 54s] from satstac import utils
[ 54s] E ModuleNotFoundError: No module named 'satstac'
[ 54s] =========================== short test summary info ============================
[ 54s] ERROR test/test_catalog.py
[ 54s] ERROR test/test_cli.py
[ 54s] ERROR test/test_collection.py
[ 54s] ERROR test/test_item.py
[ 54s] ERROR test/test_itemcollection.py
[ 54s] ERROR test/test_thing.py
[ 54s] ERROR test/test_utils.py
[ 54s] !!!!!!!!!!!!!!!!!!! Interrupted: 7 errors during collection !!!!!!!!!!!!!!!!!!!!
[ 54s] ============================== 7 errors in 0.42s ===============================
Sentinel-2 assets are sometimes incorrect. For example:
from satstac import Catalog, Collection, Item
cat = Catalog.open('https://sentinel-stac.s3.amazonaws.com/catalog.json')
col = Collection.open('https://sentinel-stac.s3.amazonaws.com/sentinel-2-l1c/catalog.json')
item = Item.open('https://sentinel-stac.s3.amazonaws.com/sentinel-2-l1c/15/S/UB/2018-04-17/S2B_15SUB_20180417_0.json')
for k in item.assets:
    print(k, item.assets[k])
will print:
thumbnail {'title': 'Thumbnail', 'href': 'https://roda.sentinel-hub.com/sentinel-s2-l1c/tiles/15/S/UB/2018/4/17/0/preview.jpg'}
info {'title': 'Basic JSON metadata', 'href': 'https://roda.sentinel-hub.com/sentinel-s2-l1c/tiles/15/S/UB/2018/4/17/0/tileInfo.json'}
metadata {'title': 'Complete XML metadata', 'href': 'https://roda.sentinel-hub.com/sentinel-s2-l1c/tiles/15/S/UB/2018/4/17/0/metadata.xml'}
tki {'title': 'True color image', 'type': 'image/jp2', 'eo:bands': [3, 2, 1], 'href': 'https://sentinel-s2-l1c.s3.amazonaws.com/tiles/15/S/UB/2018/4/17/0/TKI.jp2'}
B01 {'title': 'Band 1 (coastal)', 'type': 'image/jp2', 'eo:bands': [0], 'href': 'https://sentinel-s2-l1c.s3.amazonaws.com/tiles/15/S/UB/2018/4/17/0/B01.jp2'}
B02 {'title': 'Band 2 (blue)', 'type': 'image/jp2', 'eo:bands': [2], 'href': 'https://sentinel-s2-l1c.s3.amazonaws.com/tiles/15/S/UB/2018/4/17/0/B02.jp2'}
B03 {'title': 'Band 3 (green)', 'type': 'image/jp2', 'eo:bands': [2], 'href': 'https://sentinel-s2-l1c.s3.amazonaws.com/tiles/15/S/UB/2018/4/17/0/B03.jp2'}
B04 {'title': 'Band 4 (red)', 'type': 'image/jp2', 'eo:bands': [3], 'href': 'https://sentinel-s2-l1c.s3.amazonaws.com/tiles/15/S/UB/2018/4/17/0/B04.jp2'}
B05 {'title': 'Band 5', 'type': 'image/jp2', 'eo:bands': [4], 'href': 'https://sentinel-s2-l1c.s3.amazonaws.com/tiles/15/S/UB/2018/4/17/0/B05.jp2'}
B06 {'title': 'Band 6', 'type': 'image/jp2', 'eo:bands': [5], 'href': 'https://sentinel-s2-l1c.s3.amazonaws.com/tiles/15/S/UB/2018/4/17/0/B06.jp2'}
B07 {'title': 'Band 7', 'type': 'image/jp2', 'eo:bands': [6], 'href': 'https://sentinel-s2-l1c.s3.amazonaws.com/tiles/15/S/UB/2018/4/17/0/B07.jp2'}
B08 {'title': 'Band 8 (nir)', 'type': 'image/jp2', 'eo:bands': [7], 'href': 'https://sentinel-s2-l1c.s3.amazonaws.com/tiles/15/S/UB/2018/4/17/0/B08.jp2'}
B8A {'title': 'Band 8A', 'type': 'image/jp2', 'eo:bands': [8], 'href': 'https://sentinel-s2-l1c.s3.amazonaws.com/tiles/15/S/UB/2018/4/17/0/B08.jp2'}
B09 {'title': 'Band 9', 'type': 'image/jp2', 'eo:bands': [9], 'href': 'https://sentinel-s2-l1c.s3.amazonaws.com/tiles/15/S/UB/2018/4/17/0/B09.jp2'}
B10 {'title': 'Band 10 (cirrus)', 'type': 'image/jp2', 'eo:bands': [10], 'href': 'https://sentinel-s2-l1c.s3.amazonaws.com/tiles/15/S/UB/2018/4/17/0/B10.jp2'}
B11 {'title': 'Band 11 (swir16)', 'type': 'image/jp2', 'eo:bands': [11], 'href': 'https://sentinel-s2-l1c.s3.amazonaws.com/tiles/15/S/UB/2018/4/17/0/B11.jp2'}
B12 {'title': 'Band 12 (swir22)', 'type': 'image/jp2', 'eo:bands': [12], 'href': 'https://sentinel-s2-l1c.s3.amazonaws.com/tiles/15/S/UB/2018/4/17/0/B11.jp2'}
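which has several mistakes: for example, B8A points to B08.jp2 and B12 points to B11.jp2, and B02 lists eo:bands [2], the same index as B03.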
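Mismatches between an asset key and the filename in its href could be caught automatically. The sketch below works on a plain assets dictionary and assumes that band assets are keyed by the filename stem (B01, B8A, and so on), which is how the output above is laid out; mismatched_assets is a hypothetical helper.

```python
import posixpath
from typing import Dict, List
from urllib.parse import urlparse


def mismatched_assets(assets: Dict[str, Dict]) -> List[str]:
    """Return band keys whose href filename stem differs from the key."""
    bad = []
    for key, asset in assets.items():
        filename = posixpath.basename(urlparse(asset["href"]).path)
        stem = posixpath.splitext(filename)[0]
        # Only compare band-like keys; thumbnail, info, metadata use other names.
        if key.startswith("B") and stem != key:
            bad.append(key)
    return bad


# Three entries taken from the output above.
sample = {
    "B08": {"href": "https://sentinel-s2-l1c.s3.amazonaws.com/tiles/15/S/UB/2018/4/17/0/B08.jp2"},
    "B8A": {"href": "https://sentinel-s2-l1c.s3.amazonaws.com/tiles/15/S/UB/2018/4/17/0/B08.jp2"},
    "B12": {"href": "https://sentinel-s2-l1c.s3.amazonaws.com/tiles/15/S/UB/2018/4/17/0/B11.jp2"},
}
wrong = mismatched_assets(sample)
```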
Change the ItemCollection.load() method to be open() for consistency with other satstac objects, and also allow for reading from remote https sources.
I talked to Joe Flasher here at FOSS4G about this and we decided the best thing to do here is:
Use the existing "modis-pds" account to create buckets used to store metadata for Landsat-8 and Sentinel-2. This account is just paid directly by AWS, but should not be used initially for any processing, just storage.
Use our AWS account Jamey is currently using to do the initial metadata processing and get a baseline estimate of costs. We can then let Joe know the cost estimate and if they decide to absorb those costs we can move the processing to the "modis-pds" account.
cc @scisco
Add ability to download all assets from an Item, or Items, without having to specify all the keys for it
When satstac.Item loads a dictionary, it is unable to access any collection-level information. This is because both satstac.Item._collection and satstac.Item.filename are set to None, which causes satstac.Item.collection() to always return None. See the code snippet below.
from satstac import Item, Collection
import requests
infile = "https://landsat-stac.s3.amazonaws.com/landsat-8-l1/026/038/2014-10-30/LC80260382014303LGN00.json"
# Opening item from url (this works)
item_from_url = Item.open(infile)
assert not item_from_url._collection
assert item_from_url.filename == infile
assert type(item_from_url.collection()) == Collection
print(item_from_url.eobands)
# Opening item from dict (this doesn't work)
data = requests.get(infile).json()
item_from_dict = Item(data)
# Both item._collection and item.filename are None which makes item.collection() always return None
assert not item_from_dict._collection
assert not item_from_dict.filename
assert not item_from_dict.collection()
# This prevents accessing collection-level information
print(item_from_dict.eobands)
This prints:
> [{'name': 'B1', 'common_name': 'coastal', 'gsd': 30, 'center_wavelength': 0.44 .........
> []
My personal use-case with this is for stac-updater. Stac-updater uses the STAC Item itself as the message between AWS resources and I'd like to be able to read that message with sat-stac and parse out collection-level information (in this instance eobands is stored at the collection level as commons). For example:
from satstac import Item
def handler(event, context):
    item = Item(event)
    print(item.eobands)
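One way the collection could still be located for an Item built from a dictionary is to read the Item's own rel="collection" link instead of relying on filename. This is a sketch on plain dictionaries, assuming the Item carries such a link; collection_href and the example URLs are hypothetical.

```python
from typing import Dict, Optional
from urllib.parse import urljoin


def collection_href(item: Dict, base_url: Optional[str] = None) -> Optional[str]:
    """Return the href of the item's collection link, resolved against base_url if given."""
    for link in item.get("links", []):
        if link.get("rel") == "collection":
            href = link["href"]
            return urljoin(base_url, href) if base_url else href
    return None  # the item does not reference a collection


# Hypothetical item dictionaries for illustration.
absolute = {"links": [{"rel": "collection", "href": "https://example.com/c/catalog.json"}]}
relative = {"links": [{"rel": "collection", "href": "../catalog.json"}]}

href_a = collection_href(absolute)
href_b = collection_href(relative, base_url="https://example.com/c/2014/item.json")
```

An absolute link works without any extra information; a relative one still needs a base URL, which is the piece that is lost when only the dictionary is passed in.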
In a "self-contained" catalog with relative links, the links are interpreted correctly as relative, but the assets aren't. It appears the assets need the same check that the links get.
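A sketch of what such a check could look like for assets, assuming relative hrefs should be resolved against the URL or path of the Item file itself. absolute_assets is a hypothetical helper working on plain dictionaries, and the example URLs are made up.

```python
import copy
from typing import Dict
from urllib.parse import urljoin


def absolute_assets(assets: Dict[str, Dict], item_url: str) -> Dict[str, Dict]:
    """Return a copy of assets with every href resolved against the item's own location."""
    resolved = copy.deepcopy(assets)
    for asset in resolved.values():
        # urljoin leaves hrefs that are already absolute unchanged.
        asset["href"] = urljoin(item_url, asset["href"])
    return resolved


assets = {
    "B1": {"href": "./B1.tif"},
    "thumbnail": {"href": "https://cdn.example.com/thumb.jpg"},
}
resolved = absolute_assets(assets, "https://example.com/cat/item.json")
```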
Signed URLs work for accessing requestor pays files on AWS over https. However, the signed URL does not work for PUTing files, such as when updating a STAC catalog file on s3 over https.
Initializing an Items instance takes collections as an optional array:
https://github.com/sat-utils/sat-stac/blob/master/satstac/items.py#L11
However, the last lines in __init__ assume that all of the member items 1) belong to a collection and 2) have that collection provided.
This should be optional. In some cases (such as derived Items that are generated locally) Items may not have a collection. A user may generate the Items and then add them to a collection later to be published.
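A sketch of a more tolerant lookup, using plain dictionaries instead of the sat-stac classes. It assumes an Item names its collection either in a top-level collection field or in properties; attach_collections is a hypothetical helper.

```python
from typing import Dict, List, Optional


def attach_collections(
    items: List[Dict], collections: Optional[List[Dict]] = None
) -> Dict[str, Optional[Dict]]:
    """Map each item id to its collection, or None when it has none or it was not provided."""
    by_id = {coll["id"]: coll for coll in (collections or [])}
    result = {}
    for item in items:
        coll_id = item.get("collection") or item.get("properties", {}).get("collection")
        result[item["id"]] = by_id.get(coll_id)  # None instead of a KeyError
    return result


items = [
    {"id": "scene-1", "collection": "landsat-8-l1"},
    {"id": "derived-1", "properties": {}},  # locally generated, no collection yet
]
mapping = attach_collections(items, [{"id": "landsat-8-l1"}])
```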
Sometimes it's useful to store metadata for a collection of Items.
The Items class stores the FeatureCollection of Items, a list of Collections, and an optional dictionary containing search terms.
Suggest adding a dictionary called metadata so users can add other info as needed.
sat-stac version '0.4.0'
from satstac import ItemCollection
limit=500
url = f'https://cmr.earthdata.nasa.gov/cmr-stac/NSIDC_ECS/collections/C1262010979-NSIDC_ECS/items?limit={limit}'
items = ItemCollection.open(url)
print(len(items))
print(items[0], items[1])
~/miniconda3/envs/intake-stac-dev/lib/python3.7/site-packages/satstac/itemcollection.py in open(cls, filename)
---> 60 collections = [Collection(col) for col in data['collections']]
61 items = [Item(feature) for feature in data['features']]
62 collections = [item.collection() for item in items]
KeyError: 'collections'
This can be fixed by commenting out these lines:
Lines 31 to 33 in 42b6743
And changing this:
sat-stac/satstac/itemcollection.py
Lines 58 to 59 in 42b6743
collections = [item.collection() for item in items]
But maybe this is an issue with the catalog rather than sat-stac? cc @matthewhanson
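Independently of where the fault lies, the parser could tolerate a FeatureCollection that has no collections key. A minimal sketch on plain dictionaries; parse_item_collection is a hypothetical name and does not reproduce the code in itemcollection.py.

```python
from typing import Dict, List, Tuple


def parse_item_collection(data: Dict) -> Tuple[List[Dict], List[Dict]]:
    """Split a FeatureCollection into features and collections, treating both as optional."""
    features = data.get("features", [])
    collections = data.get("collections", [])  # absent in plain GeoJSON responses
    return features, collections


features, collections = parse_item_collection(
    {"type": "FeatureCollection", "features": [{"id": "a"}]}
)
```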
Hello @matthewhanson,
Thank you for this amazing library! When I do 'pip install sat-stac' (or even 'sat-search'), I encounter dependency issues with 'python-dateutil'. I noticed that in one of the latest commits, you changed '~=' to '>=' which would indeed fix my problem.
However, this change has not yet been updated in the release, and I don't get this update during the pip-install.
My question is, do you have plans to release a new version with this modification? If so, when do you expect to do it?
Thank you very much for your response,
Floriane
It would be great to have type hints for this library, to help developers make sense of inputs and outputs quickly and easily.
The sat-stac Items class is used for saving searches. sat-search needs to be updated to use the Python classes here, which may require changes to items.py.
@matthewhanson - not sure if this is the right repo for the element84 earth search server, but see radiantearth/stac-api-spec#47
It looks like it expects /search?bbox=[4,4,5,5] format, while all the rest are /search?bbox=4,4,5,5
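A server or client that wanted to accept both spellings could normalise them as sketched below. parse_bbox is a hypothetical helper; the 4-or-6 length check follows the usual GeoJSON bounding box sizes.

```python
import json
from typing import List


def parse_bbox(value: str) -> List[float]:
    """Parse a bbox query value given as '[4,4,5,5]' or as '4,4,5,5'."""
    text = value.strip()
    if text.startswith("["):
        parts = json.loads(text)   # JSON array form
    else:
        parts = text.split(",")    # comma-separated form
    bbox = [float(part) for part in parts]
    if len(bbox) not in (4, 6):
        raise ValueError(f"bbox must have 4 or 6 numbers, got {len(bbox)}")
    return bbox


bracketed = parse_bbox("[4,4,5,5]")
plain = parse_bbox("4,4,5,5")
```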
When I tried the following cell (#15) in tutorial1 notebook:
path = '${landsat:path}/${landsat:row}'
filename = '${date}/${id}.json'
collection.add_item(item, path_template=path, filename_template=filename)
print('Item filename: ', item.filename)
print('\nItem links')
pp.pprint(item._data['links'])
I got:
TypeError: add_item() got an unexpected keyword argument 'path_template'
What went wrong and how can I fix it?
The file extension is hardcoded to .json; it should be configurable.
@scisco and I had talked about the general workflow, which Joe Flasher and I talked about more today.
Initial Ingestion: The initial ingestion will work by getting the bucket contents and for each "directory" of files read in metadata file(s) and transform that data into a STAC record and write it as a STAC node in a catalog.
Updates via SNS: Regular updates will be done by subscribing to the SNS messages that get generated when new files are added (for Landsat this is generated when the final index.html file is added, for Sentinel I believe it is the tileInfo.json file). This will then read in the metadata file(s), transform the metadata and write it as a STAC node in the catalog.
Daily reconciliation: We will use AWS's s3 inventory management feature, which generates a list of a bucket's contents on a daily basis (this is more efficient than crawling the bucket ourselves). A reconciliation process will ensure any missing records get added, as well as delete any records that no longer exist (e.g., real-time scenes that get deleted after the normal scene is available).
Generate SNS STAC record: When a new STAC node is created in the catalog and written to disk, this will create an SNS message that contains the entire STAC item. This is a better SNS topic to subscribe to for most cases (such as by sat-api), as it already contains a STAC record.
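The daily reconciliation step boils down to two set differences. A sketch, assuming both the inventory listing and the catalog can be reduced to sets of scene IDs; reconcile and the sample IDs are hypothetical.

```python
from typing import Set, Tuple


def reconcile(inventory_ids: Set[str], catalog_ids: Set[str]) -> Tuple[Set[str], Set[str]]:
    """Return (ids to add to the catalog, ids to delete from the catalog)."""
    to_add = inventory_ids - catalog_ids      # in the bucket but not yet catalogued
    to_delete = catalog_ids - inventory_ids   # catalogued but no longer in the bucket
    return to_add, to_delete


# Hypothetical IDs: a real-time scene was replaced by the final one.
to_add, to_delete = reconcile(
    inventory_ids={"scene-A", "scene-B-final"},
    catalog_ids={"scene-A", "scene-B-realtime"},
)
```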
Line 8 in 8963f37
When adding an Item the user can specify a pattern for path and filename, however when adding a catalog or collection it simply uses the ID of the new entity as the directory. This should be able to be specified.
I use sat-stac to parse and discover data from the TROPOMI sensor on S3. To get a particular child from a Catalog or Collection, the current workflow is to iterate through the results of the children() generator. This takes a lot of time for my dataset. If there were a Catalog.get_child(id='value') API (similar to the one in pystac), I could get to a known child much more quickly. See below for some time profiling:
>>> from satstac import Catalog, Collection, Item
>>> coll = Collection.open('https://meeo-s5p.s3.amazonaws.com/catalog.json')
>>> %time coll_children = list(coll.children())
>>> print(coll_children)
# CPU times: user 70.8 ms, sys: 5.5 ms, total: 76.3 ms
# Wall time: 2.97 s
# [meeo-s5p-cog, NRTI, OFFL, RPRO]
>>> offl = coll_children[2]
>>> offl.links()
['https://meeo-s5p.s3.amazonaws.com/OFFL/catalog.json',
'https://meeo-s5p.s3.amazonaws.com/catalog.json',
'https://meeo-s5p.s3.amazonaws.com/catalog.json',
'https://meeo-s5p.s3.amazonaws.com/OFFL/L2__AER_AI/catalog.json',
'https://meeo-s5p.s3.amazonaws.com/OFFL/L2__CH4___/catalog.json',
'https://meeo-s5p.s3.amazonaws.com/OFFL/L2__CLOUD_/catalog.json',
'https://meeo-s5p.s3.amazonaws.com/OFFL/L2__CO____/catalog.json',
'https://meeo-s5p.s3.amazonaws.com/OFFL/L2__HCHO__/catalog.json',
'https://meeo-s5p.s3.amazonaws.com/OFFL/L2__NO2___/catalog.json',
'https://meeo-s5p.s3.amazonaws.com/OFFL/L2__O3____/catalog.json',
'https://meeo-s5p.s3.amazonaws.com/OFFL/L2__SO2___/catalog.json']
>>> %time offl_no2 = list(offl.children())[-3]
>>> offl_no2
# CPU times: user 144 ms, sys: 12.7 ms, total: 157 ms
# Wall time: 6.72 s
# L2__NO2___
>>> %time offl_no2_direct = Catalog.open(offl.links()[-3])
# CPU times: user 19.6 ms, sys: 3.33 ms, total: 22.9 ms
# Wall time: 831 ms
If I could open a child directly with get_child(), I could hypothetically get it in under a second, compared to 6.7 s in the current workflow.
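Until something like get_child() exists, the lookup could be approximated by picking the right href from the link list and opening only that one. The sketch assumes each child lives in a directory named after its ID, as in the listing above; child_href is a hypothetical helper.

```python
import posixpath
from typing import List, Optional
from urllib.parse import urlparse


def child_href(link_hrefs: List[str], child_id: str) -> Optional[str]:
    """Return the href whose parent directory is named child_id, or None."""
    for href in link_hrefs:
        parent_dir = posixpath.basename(posixpath.dirname(urlparse(href).path))
        if parent_dir == child_id:
            return href
    return None


# A few of the hrefs returned by offl.links() above.
hrefs = [
    "https://meeo-s5p.s3.amazonaws.com/catalog.json",
    "https://meeo-s5p.s3.amazonaws.com/OFFL/L2__NO2___/catalog.json",
    "https://meeo-s5p.s3.amazonaws.com/OFFL/L2__O3____/catalog.json",
]
no2_href = child_href(hrefs, "L2__NO2___")
```

The result can then be passed to Catalog.open(), as in the last timing above, so only one remote file is fetched.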
Finish tutorial-2 notebook to illustrate how to use the STAC Python classes: Catalog, Collection, Item
I would like to handle writing Catalog and Item JSON representations. 2 (or more?) things would facilitate this:
(json.dumps(self.data))
Is there a reason that catalog hierarchies can't be assembled in memory before being written out?
Requestor pays is supported in the creation of the signed URL, but there's no way to specify it when downloading.
Explicitly requiring the user to set request_pays=True ensures they know they are paying egress costs.