pypsa-meets-earth / earth-osm
Python tool to extract large amounts of OpenStreetMap data
Home Page: https://pypsa-meets-earth.github.io/earth-osm/
License: MIT License
A detailed description can be found here. Below I am posting the content.
Conda-forge is by far my favorite place to publish Python packages. Before a package can be published on conda-forge it must go through a brief review process, and all of its dependencies must already be installable via conda-forge. Using conda-forge for all of your packages can save you some serious headaches by avoiding the dependency issues that arise from mixing channels or installers.
Steps to Take Before Submitting to conda-forge
Ideally, you have written some unit tests for your package so that they can be run when updating your conda-forge feedstock. You should also have your code documented so users have some idea of how to use it.
Adding Your conda-forge Recipe
Out of all of these steps, the only part that can be a bit confusing is creating your recipe. The file hierarchy you should create in the forked “staged-recipes” repo looks like this:
/recipes
  /<packagename>
    meta.yaml
Below is the recipe I used for one of my libraries, SWEpy. The recipe is a meta.yaml file, much like the one used for a basic conda package, with a little more detail required.
{% set name = "swepy" %}
{% set version = "1.9.4" %}
package:
  name: {{ name|lower }}
  version: {{ version }}

source:
  url: https://pypi.io/packages/source/{{ name[0] }}/{{ name }}/{{ name }}-{{ version }}.tar.gz
  sha256: a0462e693c4ed689be2e9050c1c7a932a5b015817d0ba6709a0231f090bc9b02

build:
  noarch: python
  number: 1
  script: {{ PYTHON }} -m pip install --no-deps --ignore-installed .

requirements:
  host:
    - python
    - pip
  run:
    - matplotlib-base
    - python >=3.6
    - pytest
    - pandas
    - requests
    - tqdm
    - pynco
    - gdal
    - scipy
    - affine
    - netcdf4
    - fsspec
    - xarray
    - mapboxgl
    - jenkspy
    - zarr

test:
  requires:
    - pytest
  files:
    - MEASURES/NSIDC-0630.001/2010.01.01/NSIDC-0630-EASE2_N3.125km-F17_SSMIS-2010001-37H-M-SIR-CSU-v1.3.nc
    - MEASURES/NSIDC-0630.001/2010.01.01/NSIDC-0630-EASE2_N6.25km-F17_SSMIS-2010001-19H-M-SIR-CSU-v1.3.nc
    - mock_server.py
  imports:
    - swepy
    - swepy.pipeline
    - swepy.nsidcDownloader
    - swepy.process
    - swepy.analysis

about:
  home: http://github.com/wino6687/swepy
  license: MIT
  license_family: MIT
  license_file: LICENSE
  summary: 'A python package for obtaining and manipulating Tb files from the MEaSUREs database'
  description: |
    SWEpy is a Python library designed to simplify access to a passive microwave brightness
    temperature dataset available at the National Snow and Ice Data Center (NSIDC).
    This dataset contains Northern and Southern hemisphere imagery along with Equatorial
    imagery, and can be quite useful in analyzing snow water equivalent (SWE) over large spatial
    extents. SWEpy contains tools to web scrape, geographically subset, and concatenate files into
    time cubes. There is an automated workflow to scrape long time series while periodically stopping
    to geographically subset and concatenate files in order to reduce disk impact.
  dev_url: https://github.com/wino6687/SWEpy

extra:
  recipe-maintainers:
    - wino6687
Once you submit your pull request, someone from conda-forge will comment if there are any changes that need to be made before it is merged into its own feedstock. After it has been merged into its own feedstock, it will be built and deployed so that it can be downloaded via the conda-forge channel!
You now know how to publish your Python library on conda and conda-forge! But there is nothing more annoying than manually maintaining your packages in multiple places.
I solved this issue by setting up continuous integration with Travis-CI that automatically deploys any new versions of my code to PyPI, and then deploys new versions from PyPI to my conda-forge feedstock. [Max: we use github actions instead]
You can read my other post about how I set up my workflow to make Python libraries more robust. Continuous integration is your best friend when it comes to updating your libraries; Travis will run your unit tests and, if they pass, deploy the new version to keep your packages up to date. This way you never have to bother with manually uploading new versions again.
Describe the bug
eo.save_osm_data() always creates .csv (default) output even if only "geojson" is selected/needed. This creates overhead, especially if only one particular file type is needed.
To Reproduce
Steps to reproduce the behavior:
Run:
from earth_osm import eo

eo.save_osm_data(
    primary_name = 'power',
    region_list = ['benin', 'monaco'],
    feature_list = ['substation', 'line'],
    update = False,
    mp = True,
    data_dir = './earth_data',
    out_format = ['geojson'],
    out_aggregate = False,
)
Expected behavior
Only output .geojson files
Currently, everything works smoothly when extracting data from Geofabrik.
Do we also want to support filtering custom pbf extracts, or suggest other tools, e.g. pyrosm?
Feature request came from: #25 (comment)
The purpose and motivation of the package should be described at the beginning of the readme.
I like that the documentation is in markdown. At the moment the documentation is not deployed yet. We could host it on readthedocs for free, as described here: https://www.mkdocs.org/user-guide/deploying-your-docs/
Further, we could
Partially downloaded or corrupt pbf files cause decoding errors when parsing.
It also seems like the update flag does not re-download the file.
This could be solved by implementing:
It would be nice to have the ability to download all buildings regardless of the feature.
Building is implemented with the given features, albeit the 'all' wildcard is still WIP.
Originally posted by @mnm-matin in #26 (comment)
Is your feature request related to a problem? Please describe.
It would be nice to have the option to specify a download folder to keep the data (raw and intermediate), while the output files may be saved into another folder.
This would also make simple parallelization setups easier.
In the pypsa-earth case, we could specify the data folder as the intermediate folder and the resources folder as the output.
When the data are already downloaded, this would ease the processing quite a lot: the different processes would not conflict.
To add to this, it would be nice to have some parallel-safe operation when two or more processes share the same data dir.
Describe the solution you'd like
An option output_dir, in addition to data_dir, would be nice to have (a sketch is given below).
The default value may be None, in which case the data dir is used.
This helps parallelization, but it is not completely parallel-safe.
Alternatives are welcome obviously :)
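A minimal sketch of what the proposed call could look like; output_dir is the hypothetical new parameter and does not exist yet, while the other arguments follow the save_osm_data usage shown elsewhere in this repo:

from earth_osm import eo

eo.save_osm_data(
    primary_name = 'power',
    region_list = ['benin', 'monaco'],
    feature_list = ['substation', 'line'],
    data_dir = './earth_data',     # raw and intermediate files stay here
    output_dir = './resources',    # hypothetical: final csv/geojson outputs; None -> fall back to data_dir
)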
@mnm-matin thanks for the amazing package! Usually it works like a charm :)
There is a little usability suggestion. It appears that if data loading has been incomplete, an error message is thrown which is not very meaningful and can easily terrify a user.
Such an issue has been described here. I have also encountered it when running the PyPSA-Earth workflow for Japan. It looked like loading had started, but after a while this enigmatic 'OSMPBF.Blob'
error message appeared (the full listing is below).
It was fixed by downloading the pbf
file manually, although the download wasn't very fast. So, I assume that the primary reason for the trouble was some issue with the network connection. However, the message made me think first about environment issues rather than data issues. I wonder if it would be possible to add a check for loading completeness and a meaningful error message in case something went wrong. What do you think?
INFO:snakemake.logging:
multiprocessing.pool.RemoteTraceback:
"""
Traceback (most recent call last):
File "~/miniconda3/envs/pypsa-earth-upd10/lib/python3.8/multiprocessing/pool.py", line 125, in worker
result = (True, func(*args, **kwds))
File "~/miniconda3/envs/pypsa-earth-upd10/lib/python3.8/multiprocessing/pool.py", line 51, in starmapstar
return list(itertools.starmap(args[0], args[1]))
File "~/miniconda3/envs/pypsa-earth-upd10/lib/python3.8/site-packages/earth_osm/extract.py", line 50, in filter_file_block
entries.ParseFromString(read_blob(file, ofs, header))
File "~/miniconda3/envs/pypsa-earth-upd10/lib/python3.8/site-packages/earth_osm/osmpbf/file.py", line 54, in read_blob
blob.ParseFromString(file.read(header.datasize))
google.protobuf.message.DecodeError: Error parsing message with type 'OSMPBF.Blob'
"""
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "~/pypsa-earth/.snakemake/scripts/tmp4ub5jw9s.download_osm_data.py", line 112, in <module>
eo.get_osm_data(
File "~/miniconda3/envs/pypsa-earth-upd10/lib/python3.8/site-packages/earth_osm/eo.py", line 101, in get_osm_data
df_feature = process_country(region, primary_name, feature_name, mp, update, data_dir)
File "~/miniconda3/envs/pypsa-earth-upd10/lib/python3.8/site-packages/earth_osm/eo.py", line 38, in process_country
primary_dict, feature_dict = get_filtered_data(region, primary_name, feature_name, mp, update, data_dir)
File "~/miniconda3/envs/pypsa-earth-upd10/lib/python3.8/site-packages/earth_osm/filter.py", line 111, in get_filtered_data
primary_dict = run_primary_filter(PBF_inputfile, primary_file, primary_name, mp)
File "~/miniconda3/envs/pypsa-earth-upd10/lib/python3.8/site-packages/earth_osm/filter.py", line 70, in run_primary_filter
primary_data = filter_pbf(PBF_inputfile, pre_filter, multiprocess)
File "~/miniconda3/envs/pypsa-earth-upd10/lib/python3.8/site-packages/earth_osm/extract.py", line 93, in filter_pbf
primary_entries = list(file_query(primary_entry_filter, pre_filter)) #list of named tuples eg. Node(id,tags, lonlat)
File "~/miniconda3/envs/pypsa-earth-upd10/lib/python3.8/site-packages/earth_osm/extract.py", line 66, in query_func
entry_lists = pool.starmap(
File "~/miniconda3/envs/pypsa-earth-upd10/lib/python3.8/multiprocessing/pool.py", line 372, in starmap
return self._map_async(func, iterable, starmapstar, chunksize).get()
File "~/miniconda3/envs/pypsa-earth-upd10/lib/python3.8/multiprocessing/pool.py", line 771, in get
raise self._value
google.protobuf.message.DecodeError: Error parsing message with type 'OSMPBF.Blob'
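A meaningful error message could be preceded by a completeness check before parsing. A minimal sketch (not part of earth-osm; the function and parameter names are assumptions) that compares the local file size against the size reported by the server:

import os
import requests

def pbf_looks_complete(url, local_path):
    """Return True if the local .pbf file matches the server-reported size."""
    if not os.path.exists(local_path):
        return False
    head = requests.head(url, allow_redirects=True, timeout=30)
    expected = int(head.headers.get("Content-Length", -1))
    actual = os.path.getsize(local_path)
    # If the server does not report a size, completeness cannot be verified.
    return expected > 0 and expected == actual

If the check fails, the file could be re-downloaded (or a clear "incomplete download, please retry" error raised) instead of surfacing the protobuf DecodeError.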
Earth-OSM is currently limited to power infrastructure features. This is done on purpose, to get a stable v0 release.
This will remain so until v1, which is scheduled around February 2023. If you require any other features, please list them in the comments. Cheers!
Describe the bug
Value error, see https://github.com/pypsa-meets-earth/pypsa-earth/actions/runs/7306231974/job/19910723923
When running the pypsa-earth workflow in https://github.com/pypsa-meets-earth/pypsa-earth/actions/runs/7306231974/job/19910723923
the following error is triggered:
ValueError: Must have equal len keys and value when setting with an iterable
To Reproduce
Run pypsa-earth tutorial in pypsa-meets-earth/pypsa-earth#943
Expected behavior
The workflow should run.
Hi all,
Does anyone have thoughts on how to programmatically break up large files to run them through earth-osm? I am constrained by memory. I am attempting the (almost) impossible: running the planet pbf for lines, generators, and substations. It would be ideal if the planet.pbf file could be processed in chunks, but I am not sure that's possible.
Thoughts?
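One hedged, memory-friendlier workaround (not a planet.pbf chunker, just the save_osm_data call shown elsewhere in this repo): process regional Geofabrik extracts one at a time, so only one region's data is held in memory at once.

from earth_osm import eo

# any list of Geofabrik regions covering the area of interest
for region in ['benin', 'monaco']:
    eo.save_osm_data(
        primary_name = 'power',
        region_list = [region],
        feature_list = ['substation', 'line', 'generator'],
        data_dir = './earth_data',
    )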
Just saw a comment about download accelerators. I think raising this as an enhancement/feature request will help track this performance opportunity. Checking out the mentioned package pySmartDL, it looks like the package is not very active anymore -- maybe there are newer alternatives?
The pySmartDL package and its author also refer to a couple of HTTP websites (not HTTPS). Why HTTP is not secure is explained in this article. This can be dangerous and opens the user to a number of attacks:
Anyway, the concept and application of download accelerators could enhance the package. It would be an awesome feature.
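As an illustration of the idea only (a minimal sketch, not a proposed implementation and not tied to pySmartDL): split the file into byte ranges and fetch them concurrently with HTTP Range requests, assuming the server reports Content-Length and supports range requests.

import concurrent.futures
import requests

def accelerated_download(url, dest, n_parts=4):
    size = int(requests.head(url, allow_redirects=True, timeout=30).headers["Content-Length"])
    # split [0, size) into n_parts contiguous byte ranges
    bounds = [(i * size // n_parts, (i + 1) * size // n_parts - 1) for i in range(n_parts)]

    def fetch(part):
        start, end = part
        r = requests.get(url, headers={"Range": f"bytes={start}-{end}"}, timeout=60)
        r.raise_for_status()
        return start, r.content

    with open(dest, "wb") as f:
        f.truncate(size)  # pre-allocate the output file
        with concurrent.futures.ThreadPoolExecutor(max_workers=n_parts) as pool:
            for start, chunk in pool.map(fetch, bounds):
                f.seek(start)
                f.write(chunk)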
Documentation needs to be improved:
Some examples of docs made using MkDocs Material:
Describe the bug
In Geofabrik, as known, some countries are merged under a non-conventional code.
Some, such as Senegal and Gambia, are already included, but some are missing.
An example is Malaysia, Singapore and Brunei: only Malaysia can be found.
This list may not be exhaustive.
With the previous implementation of PyPSA-Earth, we cleaned the config_osm_data and verified it for Asia, Africa and South America.
That may hopefully be of help.
I saw that the json is loaded from the website, but there are issues.
For example, Saudi Arabia, which is merged together with the GCC, is missing; some manual fixing may be needed. I will be doing that locally.
When running earth-osm for Panama, the following error occurs.
The error message should be improved, as view_region cannot be used directly by the user and the function is actually called view_regions.
Minor issue.
PA not found. check view_region()
I think, Matin, you mentioned something about the general cleaning part. We need to clarify where this is happening, whether there is an interface for more detailed cleaning, or whether you expect the user to build their own cleaning interface.
When running earth_osm view regions, "panama" is available in central-america with flag "panama" but a NaN country code.
When using earth_osm with PA, it does not work. When country_code is NaN, it could be filled using country_converter, which would help a lot (a sketch is given after the JSON entry below).
I fear this may occur for other countries as well.
{
  "type": "Feature",
  "properties": {
    "id" : "panama",
    "parent" : "central-america",
    "name" : "Panama",
    "urls" : {
      "pbf" : "https://download.geofabrik.de/central-america/panama-latest.osm.pbf",
      "bz2" : "https://download.geofabrik.de/central-america/panama-latest.osm.bz2",
      "shp" : "https://download.geofabrik.de/central-america/panama-latest-free.shp.zip",
      "pbf-internal" : "https://osm-internal.download.geofabrik.de/central-america/panama-latest-internal.osm.pbf",
      "history" : "https://osm-internal.download.geofabrik.de/central-america/panama-internal.osh.pbf",
      "taginfo" : "https://taginfo.geofabrik.de/central-america/panama/",
      "updates" : "https://download.geofabrik.de/central-america/panama-updates"
    }
  }
},
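A hedged sketch of the suggested fallback: when the Geofabrik entry has a NaN country code, derive it from the region name with country_converter.

import country_converter as coco

# "Panama" comes from the "name" field of the Geofabrik entry above
iso2 = coco.convert(names="Panama", to="ISO2")  # -> "PA"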
Describe the bug
When extracting data for Kazakhstan, a lonlat conversion problem arises.
With the new version of shapely, lon-lat data should be passed as tuples, not lists. Forcing the entries to be tuples should be enough (see the sketch after the traceback below).
INFO:osm_data_extractor:Writing GeoJSON file
WARNING:osm_data_extractor:Node dataframe not empty for line in KZ
Traceback (most recent call last):
File "/data/davidef/git_world/pypsa-earth/.snakemake/scripts/tmpiy9bx29f.download_osm_data.py", line 110, in <module>
eo.get_osm_data(
File "/home/davidef/miniconda3/envs/pypsa-earth/lib/python3.8/site-packages/earth_osm/eo.py", line 103, in get_osm_data
output_creation(df_feature, primary_name, feature_name, [region], data_dir, out_format, out_aggregate)
File "/home/davidef/miniconda3/envs/pypsa-earth/lib/python3.8/site-packages/earth_osm/utils.py", line 204, in output_creation
gdf_feature = convert_pd_to_gdf_lines(df_feature)
File "/home/davidef/miniconda3/envs/pypsa-earth/lib/python3.8/site-packages/earth_osm/utils.py", line 132, in convert_pd_to_gdf_lines
df_way, geometry=[LineString(x) for x in df_way.lonlat], crs="EPSG:4326"
File "/home/davidef/miniconda3/envs/pypsa-earth/lib/python3.8/site-packages/earth_osm/utils.py", line 132, in <listcomp>
df_way, geometry=[LineString(x) for x in df_way.lonlat], crs="EPSG:4326"
File "/home/davidef/miniconda3/envs/pypsa-earth/lib/python3.8/site-packages/shapely/geometry/linestring.py", line 73, in __new__
geom = shapely.linestrings(coordinates)
File "/home/davidef/miniconda3/envs/pypsa-earth/lib/python3.8/site-packages/shapely/decorators.py", line 77, in wrapped
return func(*args, **kwargs)
File "/home/davidef/miniconda3/envs/pypsa-earth/lib/python3.8/site-packages/shapely/creation.py", line 120, in linestrings
return lib.linestrings(coords, out=out, **kwargs)
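A hedged sketch of the tuple fix suggested above, applied to the convert_pd_to_gdf_lines line that appears in the traceback (df_way is the ways dataframe built in earth_osm/utils.py; the surrounding function body is assumed):

import geopandas as gpd
from shapely.geometry import LineString

def convert_pd_to_gdf_lines(df_way):
    # force each [lon, lat] pair to a tuple before building the LineString
    geometry = [LineString([tuple(p) for p in coords]) for coords in df_way.lonlat]
    return gpd.GeoDataFrame(df_way, geometry=geometry, crs="EPSG:4326")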
When debugging for Russia (another one), it appeared that in some cases the OSM data contain unexpected values.
In particular, in the lines dataset, a node was improperly stored.
That problem was solved using:
df_way = df_way.drop(df_way[df_way.Type != "Way"].index).reset_index(drop=True)
Describe the bug
README steps are not working.
To Reproduce
Steps to reproduce the behavior:
conda create --name earth-osm -y
conda activate earth-osm
pip install git+https://github.com/pypsa-meets-earth/earth-osm.git
earth_osm extract power --regions benin monaco --features substation line
Screenshots
earth_osm extract power --regions benin --features line
Traceback (most recent call last):
File "/home/max/.local/bin/earth_osm", line 5, in <module>
from earth_osm.__main__ import main
File "/home/max/.local/lib/python3.10/site-packages/earth_osm/__main__.py", line 3, in <module>
from .args import main # pragma: no cover
File "/home/max/.local/lib/python3.10/site-packages/earth_osm/args.py", line 15, in <module>
from earth_osm.eo import get_osm_data
File "/home/max/.local/lib/python3.10/site-packages/earth_osm/eo.py", line 15, in <module>
from earth_osm.filter import get_filtered_data
File "/home/max/.local/lib/python3.10/site-packages/earth_osm/filter.py", line 13, in <module>
from earth_osm.extract import filter_pbf
File "/home/max/.local/lib/python3.10/site-packages/earth_osm/extract.py", line 13, in <module>
from earth_osm.osmpbf import Node, Way, Relation, osmformat_pb2
File "/home/max/.local/lib/python3.10/site-packages/earth_osm/osmpbf/osmformat_pb2.py", line 5, in <module>
from google.protobuf.internal import builder as _builder
ImportError: cannot import name 'builder' from 'google.protobuf.internal' (/usr/lib/python3/dist-packages/google/protobuf/internal/__init__.py)
It is at the moment unclear how to extend the package to other OSM tags. E.g. Hazem wanted to use this in the near future with earth-osm: https://wiki.openstreetmap.org/wiki/Oil_and_Gas_Infrastructure
Suggestion:
Let's add some documentation on that?
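A hedged sketch of what extraction for another OSM primary tag might look like once supported; 'man_made' and 'pipeline' below are illustrative guesses based on the OSM wiki page above, not parameters the package currently accepts:

from earth_osm import eo

eo.save_osm_data(
    primary_name = 'man_made',     # hypothetical: only 'power' is supported today
    region_list = ['benin'],
    feature_list = ['pipeline'],   # hypothetical sub-feature
    data_dir = './earth_data',
)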
earth-osm creates 3 outputs and files for each country: out, pbf, power.
For usability, it would be good if out were a concat of tables of the same type, e.g. all_lines.
The sitemap is downloaded at each execution. We could check whether the sitemap already exists and update it only at a fixed frequency, otherwise falling back to the existing file. Thanks!
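A minimal sketch of the suggested behaviour (the URL and file name are assumptions, not the package's actual internals): reuse a cached sitemap if it is younger than a fixed maximum age, refresh it otherwise, and fall back to the cached copy if the refresh fails.

import os
import time
import requests

SITEMAP_URL = "https://download.geofabrik.de/index-v1.json"  # assumed source
MAX_AGE = 7 * 24 * 3600                                       # refresh weekly

def get_sitemap(path="sitemap.json"):
    fresh = os.path.exists(path) and (time.time() - os.path.getmtime(path)) < MAX_AGE
    if not fresh:
        try:
            r = requests.get(SITEMAP_URL, timeout=30)
            r.raise_for_status()
            with open(path, "wb") as f:
                f.write(r.content)
        except requests.RequestException:
            if not os.path.exists(path):
                raise  # no cached copy to fall back to
    with open(path, encoding="utf-8") as f:
        return f.read()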
Add GitHub Actions to auto-deploy when the version tag changes. Examples:
Is your feature request related to a problem? Please describe.
It is related to the consistency between the data downloaded from OpenStreetMap using earth-osm and the dataset of power lines in PyPSA-Eur. Here is a visualization of the data for Denmark:
Some of the LineStrings significantly differ from one database to another (the blue line shows the PyPSA-Eur data). I know that the PyPSA-Eur data comes from the ENTSO-E database, which is an "illustration" of the power lines, but I can still see some major differences between the two (see e.g. the connectivity of lines located at the north-west of Denmark).
Describe the solution you'd like
Since some of these databases are available, it is useful to have them here for comparison.
Describe alternatives you've considered
Having other databases in the package.
Traceback (most recent call last):
File "D:\a\pypsa-earth\pypsa-earth\.snakemake\scripts\tmpycprlere.download_osm_data.py", line 14, in <module>
from earth_osm import eo
File "C:\Miniconda3\envs\pypsa-earth\lib\site-packages\earth_osm\eo.py", line 16, in <module>
from earth_osm.gfk_data import get_region_tuple
File "C:\Miniconda3\envs\pypsa-earth\lib\site-packages\earth_osm\gfk_data.py", line 30, in <module>
d = json.load(f)
File "C:\Miniconda3\envs\pypsa-earth\lib\json\__init__.py", line 293, in load
return loads(fp.read(),
File "C:\Miniconda3\envs\pypsa-earth\lib\encodings\cp1252.py", line 23, in decode
return codecs.charmap_decode(input,self.errors,decoding_table)[0]
UnicodeDecodeError: 'charmap' codec can't decode byte 0x8d in position 157545: character maps to <undefined>
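The traceback suggests the Geofabrik JSON is opened with the Windows default cp1252 codec. A hedged sketch of the likely fix (the file name below is illustrative) is to open the file explicitly as UTF-8:

import json

with open("geofabrik_sitemap.json", encoding="utf-8") as f:
    d = json.load(f)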
Describe the bug
Currently it is not possible to download everything contained inside a primary feature, without specifying the subfeatures.
To Reproduce
Steps to reproduce the described behavior:
Expected behavior
You will run into the following error:
Traceback (most recent call last):
File "../bin/earth_osm", line 33, in
sys.exit(load_entry_point('earth-osm', 'console_scripts', 'earth_osm')())
File "..earth-osm/earth_osm/args.py", line 119, in main
f'Sub Features: {" - ".join(feature_list)}',
TypeError: sequence item 0: expected str instance, bool found
The geojson file misses a lot of tags e.g. 'name'. It would be great to have more control over the tags that get written into geojson.
The advanced usage reads like this:
import earth_osm as eo

eo.get_osm_data(
    primary_name = 'power',
    region_list = ['benin', 'monaco'],
    feature_list = ['substation', 'line'],
    update = False,
    mp = True,
    data_dir = './earth_data',
)
By default, the output will be csv and geojson.
Making this an option, e.g. csv only, geojson only, or both ['csv', 'geojson'], could be helpful.
Badges in the README can be really helpful, e.g. indicating important information such as CI status. Some examples of good badges are given here (some of them can be copied and some of them need a registration, e.g. pre-commit):
See here how PyPSA is doing this with Github Actions.
I've been working on that commit for a while now, and have only picked it up again a few days ago. In particular, the data pipeline needs to be modified so that it can handle: Node (Towers), Way (Lines and Cables), Area (Substations and Generators). Some detection needs to occur to determine whether something is a way or an area (a sketch of this distinction is given below). More info on ways from OSM. Ways are converted to LineStrings while Areas are converted to Polygons. Currently all Nodes are converted to Points, which is wrong and would cause problems with buildings etc.; relevant code from osmnx here.
As a checklist:
Originally posted by @mnm-matin in #14 (comment)
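A hedged sketch of the way-vs-area distinction described above (not the actual earth-osm implementation): a closed way, i.e. first point equal to last with enough points, becomes a Polygon, an open way becomes a LineString, and a single node stays a Point.

from shapely.geometry import LineString, Point, Polygon

def to_geometry(lonlat):
    """lonlat: list of (lon, lat) tuples for one OSM element."""
    if len(lonlat) == 1:
        return Point(lonlat[0])
    if len(lonlat) >= 4 and lonlat[0] == lonlat[-1]:
        return Polygon(lonlat)
    return LineString(lonlat)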
Dear developer, hi,
I think there may be a bug. When I run the code, I find an error in eo.py:
'IndentationError: expected an indented block after 'if' statement on line 53'
I originally installed earth-osm via pypsa-earth, and the earth-osm version is 0.0.9.
Describe the bug
codecov is deprecated
https://docs.codecov.com/docs/codecov-uploader
To Reproduce
make test
CI is failing
Expected behavior
code coverage should be produced using github action codecov uploader
We discussed recently in a PyPSA-Earth workflow meeting that, considering the package has global scope, using a 2-letter ISO code is probably not the best choice. Just imagine you extract data for 100 countries: the 2-letter codes quickly become quite challenging to distinguish.
I think that, to provide maximal flexibility, we should offer an option to use 2-letter (ISO), 3-letter (ISO), or full country (not ISO) naming. The user could pass only one of the options or all of them.