opengeos / open-buildings
Tools for working with open building datasets
Home Page: https://opengeos.github.io/open-buildings
License: Other
Add the splitting of multipolygons to the ogr process. I'm not sure if it's possible to do this with a pure CLI call, so it may need to use Fiona, but that may lose the speed of the column-oriented API. If it ends up being about the same speed as pandas (which uses Fiona under the hood), then perhaps we just don't implement it.
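For what it's worth, ogr2ogr has an -explodecollections option that writes one output feature per part of a multi-part geometry, so a pure CLI route may exist and is worth benchmarking first. If Fiona does turn out to be needed, the per-record logic is small. Below is a minimal sketch that works on GeoJSON-like feature dicts (the shape Fiona records map to); `split_multipolygons` is a hypothetical helper, not part of open-buildings.

```python
from typing import Any, Dict, Iterable, Iterator

Feature = Dict[str, Any]


def split_multipolygons(features: Iterable[Feature]) -> Iterator[Feature]:
    """Yield one Polygon feature per part of each MultiPolygon feature.

    Features with any other geometry type are passed through unchanged.
    Properties are copied so every output part keeps the original attributes.
    """
    for feature in features:
        geometry = feature.get("geometry") or {}
        if geometry.get("type") != "MultiPolygon":
            yield feature
            continue
        # A MultiPolygon's coordinates are a list of Polygon coordinate arrays.
        for polygon_coords in geometry["coordinates"]:
            yield {
                "type": "Feature",
                "properties": dict(feature.get("properties") or {}),
                "geometry": {"type": "Polygon", "coordinates": polygon_coords},
            }
```

With Fiona this would sit between reading the source and writing the sink, with the output schema's geometry type switched to Polygon; that wiring is untested here.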
Right now if you do a get_buildings call on a big area it can take a long time, but you have no idea whether it's working away or something has gone wrong. It'd be much better if it reported on how things are going. The ideal would be to report on what it has scanned remotely (maybe just in --verbose), and then to show some streaming progress as it downloads buildings.
This may not be possible, since DuckDB is doing all the querying, but perhaps there's a way to hook into what it's doing and report on it.
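Even without hooks into DuckDB, a heartbeat can at least show that the command is still alive. A minimal stdlib sketch, assuming the long-running DuckDB call can be wrapped in a zero-argument function; `run_with_heartbeat` is a hypothetical helper. DuckDB's own enable_progress_bar setting may also be worth testing for the remote scan, but it hasn't been checked against this workflow.

```python
import threading
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def run_with_heartbeat(
    task: Callable[[], T],
    message: str = "Still querying...",
    interval: float = 5.0,
    log: Callable[[str], None] = print,
) -> T:
    """Run task() and log an elapsed-time message every `interval` seconds until it finishes."""
    done = threading.Event()
    start = time.monotonic()

    def heartbeat() -> None:
        # Event.wait returns False on timeout and True once `done` is set.
        while not done.wait(interval):
            log(f"{message} ({time.monotonic() - start:.0f}s elapsed)")

    thread = threading.Thread(target=heartbeat, daemon=True)
    thread.start()
    try:
        return task()
    finally:
        # Stop the heartbeat whether the task succeeded or raised.
        done.set()
        thread.join()
```

Usage would be along the lines of wrapping the existing DuckDB execute call in a lambda inside download_buildings; the heartbeat thread only logs, so DuckDB still runs on the calling thread.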
Trying to save the output as a shapefile fails, see command and traceback below.
$ echo '{ "type": "Feature", "properties": {}, "geometry": {"coordinates": [[[-0.13085471468215815, 51.50945096318702], [-0.13085471468215815, 51.50612362847875], [-0.12508113856225123, 51.50612362847875], [-0.12508113856225123, 51.50945096318702], [-0.13085471468215815, 51.50945096318702]]], "type": "Polygon"}}' | ob get_buildings - buildings.shp --country_iso GB
[2023-10-14 10:12:02] Querying and downloading data for quadkey 0313131311 in country GB...
[2023-10-14 10:12:02] Expect query times of at least 5-10 seconds
[2023-10-14 10:12:02] Installing DuckDB spatial extension...
[2023-10-14 10:12:56] Downloaded 65 features into DuckDB.
[2023-10-14 10:12:56] Writing to buildings.shp...
terminate called after throwing an instance of 'duckdb::IOException'
what(): IO Error: Could not write file "buildings.dbf": Bad file descriptor
Aborted (core dumped)
Right now the get-buildings call just requests all attributes. It'd be nice if we had --include and --exclude flags like tippecanoe does, to give users more control over the attributes they want.
To tackle this issue, the get_buildings command in cli.py should have two more flags added, and download_buildings should accept them. Then the select_values variable should be tweaked with the right logic. We can leverage DuckDB's EXCLUDE modifier (SELECT * EXCLUDE (...)) to form the SQL for exclude, while include would just pass the column names in. It'd likely make sense to not allow a user to use both include and exclude, as that'd be funky logic to get right.
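The select logic could look roughly like this. It's a sketch, assuming the geometry column is named `geometry` and that the returned string goes where select_values is used today; `build_select_clause` and `quote_ident` are hypothetical names. The `* EXCLUDE (...)` form is DuckDB's star-expression syntax.

```python
from typing import Optional, Sequence


def quote_ident(name: str) -> str:
    """Double-quote a column name for SQL, escaping embedded quotes."""
    return '"' + name.replace('"', '""') + '"'


def build_select_clause(
    include: Optional[Sequence[str]] = None,
    exclude: Optional[Sequence[str]] = None,
    geometry_column: str = "geometry",  # assumed column name
) -> str:
    """Return the column list for the SELECT used to download buildings."""
    if include and exclude:
        raise ValueError("Use either --include or --exclude, not both.")
    if include:
        # Always keep the geometry so the output is still a spatial file.
        columns = [geometry_column] + [c for c in include if c != geometry_column]
        return ", ".join(quote_ident(c) for c in columns)
    if exclude:
        # Never allow the geometry itself to be excluded.
        dropped = [c for c in exclude if c != geometry_column]
        if dropped:
            return "* EXCLUDE (" + ", ".join(quote_ident(c) for c in dropped) + ")"
    return "*"
```

In cli.py the two flags could be click options declared with multiple=True, passed through download_buildings into this function.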
When I request a location with --location and it doesn't have results, I get a barfed stack trace:
Traceback (most recent call last):
File "/opt/homebrew/Caskroom/miniforge/base/envs/qgis/bin/ob", line 8, in <module>
sys.exit(main())
^^^^^^
File "/opt/homebrew/Caskroom/miniforge/base/envs/qgis/lib/python3.11/site-packages/click/core.py", line 1157, in __call__
return self.main(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/opt/homebrew/Caskroom/miniforge/base/envs/qgis/lib/python3.11/site-packages/click/core.py", line 1078, in main
rv = self.invoke(ctx)
^^^^^^^^^^^^^^^^
File "/opt/homebrew/Caskroom/miniforge/base/envs/qgis/lib/python3.11/site-packages/click/core.py", line 1688, in invoke
return _process_result(sub_ctx.command.invoke(sub_ctx))
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/opt/homebrew/Caskroom/miniforge/base/envs/qgis/lib/python3.11/site-packages/click/core.py", line 1434, in invoke
return ctx.invoke(self.callback, **ctx.params)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/opt/homebrew/Caskroom/miniforge/base/envs/qgis/lib/python3.11/site-packages/click/core.py", line 783, in invoke
return __callback(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/opt/homebrew/Caskroom/miniforge/base/envs/qgis/lib/python3.11/site-packages/open_buildings/cli.py", line 84, in get_buildings
geojson_data = geocode(location)
^^^^^^^^^^^^^^^^^
File "/opt/homebrew/Caskroom/miniforge/base/envs/qgis/lib/python3.11/site-packages/open_buildings/cli.py", line 40, in geocode
location = osmnx.geocode_to_gdf(data)
^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/opt/homebrew/Caskroom/miniforge/base/envs/qgis/lib/python3.11/site-packages/osmnx/geocoder.py", line 137, in geocode_to_gdf
gdf = pd.concat([gdf, _geocode_query_to_gdf(q, wr, by_osmid)])
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/opt/homebrew/Caskroom/miniforge/base/envs/qgis/lib/python3.11/site-packages/osmnx/geocoder.py", line 187, in _geocode_query_to_gdf
raise InsufficientResponseError(msg)
osmnx._errors.InsufficientResponseError: Nominatim geocoder returned 0 results for query 'adsfsad'
It'd be better to catch the error and just inform users that their location string didn't work - they can get the geojson on their own or try a more common string.
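A minimal sketch of that, assuming the geocoding function (the geocode helper in cli.py shown in the traceback) is passed in; `geocode_or_exit` is a hypothetical name. It catches the InsufficientResponseError class named in the traceback and exits with a readable message instead of a stack trace. The import falls back to a stand-in class so the sketch still loads if that private osmnx module moves.

```python
import sys
from typing import Any, Callable

try:
    # Error class shown in the traceback (osmnx's internal _errors module).
    from osmnx._errors import InsufficientResponseError
except ImportError:  # osmnx missing or reorganised; keep the sketch importable
    class InsufficientResponseError(Exception):
        pass


def geocode_or_exit(location: str, geocoder: Callable[[str], Any]) -> Any:
    """Geocode `location`, or exit with a friendly message if nothing matches."""
    try:
        return geocoder(location)
    except InsufficientResponseError:
        # sys.exit with a string prints it to stderr and exits with status 1.
        sys.exit(
            f"Could not find a location matching '{location}'. "
            "Try a more common place name, or pass your own GeoJSON instead."
        )
```

Inside a click command, raising click.UsageError with the same message would be the more idiomatic choice.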
The Microsoft building footprints are hosted on Planetary Computer as GeoParquet, partitioned by quadkey, so they're set up right for the get-buildings tool. The main challenge to sort out is getting the auth right. I believe auth is required to get at the data, but anyone can sign up for Planetary Computer, and it might work with any Azure account (I'm not sure). The new Azure extension for DuckDB may also help with this.
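For reference, here's how a point maps onto a quadkey partition: a sketch of the standard Bing Maps tile-system calculation, which reproduces quadkey 0313131311 for the London area in the shapefile log earlier in this list. In practice mercantile (already an open-buildings dependency) does the same with mercantile.quadkey(mercantile.tile(lon, lat, level)); which level the Microsoft partitions use isn't covered here, and `lonlat_to_quadkey` is a hypothetical name.

```python
import math


def lonlat_to_quadkey(lon: float, lat: float, level: int) -> str:
    """Return the Bing-style quadkey of the tile containing (lon, lat) at `level`."""
    # Web Mercator cannot represent the poles; clamp to its latitude limit.
    lat = max(min(lat, 85.05112878), -85.05112878)
    n = 1 << level  # number of tiles along each axis
    x = (lon + 180.0) / 360.0
    sin_lat = math.sin(math.radians(lat))
    y = 0.5 - math.log((1 + sin_lat) / (1 - sin_lat)) / (4 * math.pi)
    tile_x = min(max(int(x * n), 0), n - 1)
    tile_y = min(max(int(y * n), 0), n - 1)
    # Interleave the bits of x and y, most significant first: digit = x_bit + 2 * y_bit.
    digits = []
    for i in range(level, 0, -1):
        mask = 1 << (i - 1)
        digit = (1 if tile_x & mask else 0) + (2 if tile_y & mask else 0)
        digits.append(str(digit))
    return "".join(digits)
```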
Just getting started with a fresh DuckDB install, get_buildings fails because we try to load the spatial extension even if it has not been previously installed.
pip install -e .
ob tools get_buildings 1.json my-buildings.geojson --country_iso RW
~/duckdb/extensions/...
get_buildings
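One fix is to install before loading. A minimal sketch, assuming a DuckDB connection object; as far as I know INSTALL is a no-op when the extension is already present (FORCE INSTALL is the variant that re-downloads), so running it every time should only hit the network on a fresh setup. `ensure_spatial_extension` is a hypothetical name.

```python
from typing import Any


def ensure_spatial_extension(con: Any) -> None:
    """Install (if needed) and load DuckDB's spatial extension on `con`.

    `con` is expected to be a duckdb connection; any object with an
    execute(sql) method works, which keeps the sketch easy to test.
    """
    # Should be a no-op if spatial is already in the local extension directory.
    con.execute("INSTALL spatial;")
    con.execute("LOAD spatial;")
```

The DuckDB Python client also exposes con.install_extension("spatial") and con.load_extension("spatial"), which do the same thing.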
This is technically outside the scope of this project, as a QGIS plugin should have its own repo. But I think this cloud-native geo querying of source.coop datasets could be much more accessible for users if there was a QGIS plugin that used the same technique.
I've never written one, but it would likely be a very cool first one to do, so if I have a chance I'll try it. But anyone else is welcome to try, and to reuse as much or as little code from here as desired.
Right now if you try to run --skip-split-multis on the ogr process in the benchmark command, it will return a table of super fast responses. This is because the operation isn't actually running: with 'convert' it just informs the user that the process doesn't yet work. This is fine for convert, but for benchmark we should probably at least print a WARNING saying the times aren't valid, and maybe even leave it off the results. Or we could run it with skip-multis, since the timing is likely representative (the difference for duckdb and pandas isn't significant). We could also just try to implement it, but I'm not sure if it's possible with pure command line calls.
With #44 we lose the ability to specify .geojson as the output, which I think some people like to do, and there's no official recommendation. I don't think it's a big need, but just making an issue to track.
Right now the get-buildings command just has some hard-coded attributes for testing, and only ones that the Overture buildings have. It should default to getting all attributes, and also have a flag where a user can list the attributes they want. Ideally there would be both include and exclude options.
It'd be a nice helper method to grab a CSV with a one-liner and use it locally. It could also be interesting to investigate being able to use a CSV id in the CLI to download and convert in one step, in the convert function.
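A possible shape for that helper, as a stdlib sketch: it assumes you already have the CSV's URL (mapping a CSV id to a URL isn't covered here) and decompresses on the fly if the file name ends in .gz. `fetch_csv` is a hypothetical name.

```python
import gzip
import shutil
import urllib.request
from pathlib import Path


def fetch_csv(url: str, dest_dir: str = ".") -> Path:
    """Download a CSV (optionally .csv.gz) to dest_dir and return the local .csv path."""
    name = url.rstrip("/").rsplit("/", 1)[-1]
    is_gz = name.endswith(".gz")
    dest_dir_path = Path(dest_dir)
    dest_dir_path.mkdir(parents=True, exist_ok=True)
    dest = dest_dir_path / (name[:-3] if is_gz else name)
    with urllib.request.urlopen(url) as response, open(dest, "wb") as out:
        if is_gz:
            # Stream-decompress instead of saving the .gz and unpacking later.
            with gzip.GzipFile(fileobj=response) as source:
                shutil.copyfileobj(source, out)
        else:
            shutil.copyfileobj(response, out)
    return dest
```

The returned path could then be handed straight to convert.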
Adding a wrong country_iso will result in 0 buildings coming down, since it'll just query the wrong country. It'd be good to warn users about that possibility.
I think this can be pretty simple - just check if there are 0 features that are downloaded and if the user provided a country_iso. If both of those are true then print out something like 'WARNING: You supplied country_iso BR and your geojson got 0 buildings. Check to be sure your GeoJSON is actually in the right country'. Note that if we do #29 then there will hopefully be no need for this warning.
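The check described above, as a small sketch; `zero_results_warning` is a hypothetical helper that returns the message so the caller can route it through the existing logging.

```python
from typing import Optional


def zero_results_warning(feature_count: int, country_iso: Optional[str]) -> Optional[str]:
    """Return a warning when a country_iso was supplied but no buildings came back."""
    if feature_count == 0 and country_iso:
        return (
            f"WARNING: You supplied country_iso {country_iso} and your GeoJSON got 0 buildings. "
            "Check to be sure your GeoJSON is actually in the right country."
        )
    return None
```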
There are a number of different Parquet compression options (snappy, gzip, zstd, brotli, uncompressed, etc.) that can make things faster or slower and files smaller or bigger. It'd be nice to be able to benchmark and compare them. Right now there's a global variable that can be used to set this. If implemented, it should raise appropriate errors for the process used, as each process supports a different set of compression options.
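A sketch of the per-process validation. The process names match the ones mentioned in these issues, but the codec sets are illustrative placeholders, not verified capabilities; they need checking against the DuckDB, pandas/pyarrow and GDAL versions in use. `check_compression` and `SUPPORTED_COMPRESSION` are hypothetical names.

```python
from typing import Dict, FrozenSet

# Illustrative placeholders: confirm each set against the docs for the
# DuckDB, pandas/pyarrow and GDAL versions actually in use.
SUPPORTED_COMPRESSION: Dict[str, FrozenSet[str]] = {
    "duckdb": frozenset({"uncompressed", "snappy", "gzip", "zstd"}),
    "pandas": frozenset({"uncompressed", "snappy", "gzip", "brotli", "lz4", "zstd"}),
    "ogr": frozenset({"uncompressed", "snappy", "gzip", "brotli", "lz4", "zstd"}),
}


def check_compression(process: str, compression: str) -> str:
    """Normalise `compression` and raise ValueError if `process` can't write it."""
    codec = compression.lower()
    supported = SUPPORTED_COMPRESSION.get(process)
    if supported is None:
        raise ValueError(f"Unknown process: {process}")
    if codec not in supported:
        raise ValueError(
            f"{process} does not support {codec} compression; "
            f"choose one of: {', '.join(sorted(supported))}"
        )
    return codec
```

A benchmark loop could catch the ValueError and skip that combination rather than report a bogus time.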
Two things that I'm unclear about:
PS: This is just an observation. The installation was successful (on a Windows PC).
After running the installation command pip install open-buildings, the list of dependencies that pip was installing in the background was suspiciously longer than what would have been expected from the requirements.txt file. Some of the packages would only have been expected when building the documentation, for example.
ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
sphinx 3.3.0 requires sphinxcontrib-applehelp, which is not installed.
sphinx 3.3.0 requires sphinxcontrib-devhelp, which is not installed.
sphinx 3.3.0 requires sphinxcontrib-htmlhelp, which is not installed.
sphinx 3.3.0 requires sphinxcontrib-jsmath, which is not installed.
sphinx 3.3.0 requires sphinxcontrib-qthelp, which is not installed.
sphinx 3.3.0 requires sphinxcontrib-serializinghtml, which is not installed.
pointpats 2.2.0 requires opencv-contrib-python>=4.2.0, which is not installed.
access 1.1.1 requires Sphinx==2.4.3, but you have sphinx 3.3.0 which is incompatible.
pysal 2.3.0 requires python-dateutil<=2.8.0, but you have python-dateutil 2.8.2 which is incompatible.
pysal 2.3.0 requires urllib3<1.25, but you have urllib3 1.25.11 which is incompatible.
pytest 6.1.2 requires pluggy<1.0,>=0.12, but you have pluggy 1.0.0 which is incompatible.
statsmodels 0.14.0 requires patsy>=0.5.2, but you have patsy 0.5.1 which is incompatible.
Based on this message log, I decided to check the dependency tree to see which packages require these packages.
List of all packages installed:
MarkupSafe-2.1.3
PySocks-1.7.1
attrs-23.1.0
boto3-1.28.80
botocore-1.31.80
bqplot-0.12.42
branca-0.7.0
charset-normalizer-3.3.2
click-8.1.7
cligj-0.7.2
colour-0.1.5
comm-0.2.0
duckdb-0.9.1
folium-0.15.0
gdown-4.7.1
geojson-3.1.0
ipyevents-2.0.2
ipyfilechooser-0.6.0
ipyleaflet-0.17.4
ipytree-0.2.2
ipywidgets-8.1.1
jmespath-1.0.1
jsonschema-4.19.2
jsonschema-specifications-2023.7.1
jupyterlab-widgets-3.0.9
leafmap-0.28.1
mercantile-1.2.1
open-buildings-0.10.0
openlocationcode-1.0.1
pyshp-2.3.1
pystac-1.9.0
pystac-client-0.7.5
python-box-7.1.1
python-dateutil-2.8.2
referencing-0.30.2
requests-2.31.0
rpds-py-0.12.0
s3transfer-0.7.0
scooby-0.9.2
tabulate-0.9.0
traittypes-0.2.1
whitebox-2.3.1
whiteboxgui-2.3.0
widgetsnbextension-4.0.9
xyzservices-2023.10.1
Used the pipdeptree tool to print out the dependency tree for the open-buildings package/tool.
Command
pipdeptree.exe --package open-buildings
Output
------------------------------------------------------------------------
open-buildings==0.10.0
├── boto3 [required: Any, installed: 1.28.80]
│ ├── botocore [required: >=1.31.80,<1.32.0, installed: 1.31.80]
│ │ ├── jmespath [required: >=0.7.1,<2.0.0, installed: 1.0.1]
│ │ ├── python-dateutil [required: >=2.1,<3.0.0, installed: 2.8.2]
│ │ │ └── six [required: >=1.5, installed: 1.15.0]
│ │ └── urllib3 [required: >=1.25.4,<1.27, installed: 1.25.11]
│ ├── jmespath [required: >=0.7.1,<2.0.0, installed: 1.0.1]
│ └── s3transfer [required: >=0.7.0,<0.8.0, installed: 0.7.0]
│ └── botocore [required: >=1.12.36,<2.0a.0, installed: 1.31.80]
│ ├── jmespath [required: >=0.7.1,<2.0.0, installed: 1.0.1]
│ ├── python-dateutil [required: >=2.1,<3.0.0, installed: 2.8.2]
│ │ └── six [required: >=1.5, installed: 1.15.0]
│ └── urllib3 [required: >=1.25.4,<1.27, installed: 1.25.11]
├── click [required: Any, installed: 8.1.7]
│ └── colorama [required: Any, installed: 0.4.4]
├── duckdb [required: Any, installed: 0.9.1]
├── geopandas [required: Any, installed: 0.13.2]
│ ├── Fiona [required: >=1.8.19, installed: 1.9.4.post1]
│ │ ├── attrs [required: >=19.2.0, installed: 23.1.0]
│ │ ├── certifi [required: Any, installed: 2020.6.20]
│ │ ├── click [required: ~=8.0, installed: 8.1.7]
│ │ │ └── colorama [required: Any, installed: 0.4.4]
│ │ ├── click-plugins [required: >=1.0, installed: 1.1.1]
│ │ │ └── click [required: >=4.0, installed: 8.1.7]
│ │ │ └── colorama [required: Any, installed: 0.4.4]
│ │ ├── cligj [required: >=0.5, installed: 0.7.2]
│ │ │ └── click [required: >=4.0, installed: 8.1.7]
│ │ │ └── colorama [required: Any, installed: 0.4.4]
│ │ ├── importlib-metadata [required: Any, installed: 2.0.0]
│ │ │ └── zipp [required: >=0.5, installed: 3.4.0]
│ │ └── six [required: Any, installed: 1.15.0]
│ ├── packaging [required: Any, installed: 23.0]
│ ├── pandas [required: >=1.1.0, installed: 2.0.2]
│ │ ├── numpy [required: >=1.20.3, installed: 1.24.1]
│ │ ├── python-dateutil [required: >=2.8.2, installed: 2.8.2]
│ │ │ └── six [required: >=1.5, installed: 1.15.0]
│ │ ├── pytz [required: >=2020.1, installed: 2023.3]
│ │ └── tzdata [required: >=2022.1, installed: 2023.3]
│ ├── pyproj [required: >=3.0.1, installed: 3.6.0]
│ │ └── certifi [required: Any, installed: 2020.6.20]
│ └── shapely [required: >=1.7.1, installed: 2.0.1]
│ └── numpy [required: >=1.14, installed: 1.24.1]
├── leafmap [required: Any, installed: 0.28.1]
│ ├── bqplot [required: Any, installed: 0.12.42]
│ │ ├── ipywidgets [required: >=7.5.0,<9, installed: 8.1.1]
│ │ │ ├── comm [required: >=0.1.3, installed: 0.2.0]
│ │ │ │ └── traitlets [required: >=4, installed: 5.0.5]
│ │ │ │ └── ipython-genutils [required: Any, installed: 0.2.0]
│ │ │ ├── ipython [required: >=6.1.0, installed: 7.18.1]
│ │ │ │ ├── backcall [required: Any, installed: 0.2.0]
│ │ │ │ ├── colorama [required: Any, installed: 0.4.4]
│ │ │ │ ├── decorator [required: Any, installed: 4.4.2]
│ │ │ │ ├── jedi [required: >=0.10, installed: 0.17.2]
│ │ │ │ │ └── parso [required: >=0.7.0,<0.8.0, installed: 0.7.1]
│ │ │ │ ├── pickleshare [required: Any, installed: 0.7.5]
│ │ │ │ ├── prompt-toolkit [required: >=2.0.0,<3.1.0,!=3.0.1,!=3.0.0, installed: 3.0.8]
│ │ │ │ │ └── wcwidth [required: Any, installed: 0.2.5]
│ │ │ │ ├── Pygments [required: Any, installed: 2.7.2]
│ │ │ │ ├── setuptools [required: >=18.5, installed: 67.6.0]
│ │ │ │ └── traitlets [required: >=4.2, installed: 5.0.5]
│ │ │ │ └── ipython-genutils [required: Any, installed: 0.2.0]
│ │ │ ├── jupyterlab-widgets [required: ~=3.0.9, installed: 3.0.9]
│ │ │ ├── traitlets [required: >=4.3.1, installed: 5.0.5]
│ │ │ │ └── ipython-genutils [required: Any, installed: 0.2.0]
│ │ │ └── widgetsnbextension [required: ~=4.0.9, installed: 4.0.9]
│ │ ├── numpy [required: >=1.10.4, installed: 1.24.1]
│ │ ├── pandas [required: >=1.0.0,<3.0.0, installed: 2.0.2]
│ │ │ ├── numpy [required: >=1.20.3, installed: 1.24.1]
│ │ │ ├── python-dateutil [required: >=2.8.2, installed: 2.8.2]
│ │ │ │ └── six [required: >=1.5, installed: 1.15.0]
│ │ │ ├── pytz [required: >=2020.1, installed: 2023.3]
│ │ │ └── tzdata [required: >=2022.1, installed: 2023.3]
│ │ ├── traitlets [required: >=4.3.0, installed: 5.0.5]
│ │ │ └── ipython-genutils [required: Any, installed: 0.2.0]
│ │ └── traittypes [required: >=0.0.6, installed: 0.2.1]
│ │ └── traitlets [required: >=4.2.2, installed: 5.0.5]
│ │ └── ipython-genutils [required: Any, installed: 0.2.0]
│ ├── colour [required: Any, installed: 0.1.5]
│ ├── folium [required: Any, installed: 0.15.0]
│ │ ├── branca [required: >=0.6.0, installed: 0.7.0]
│ │ │ └── Jinja2 [required: Any, installed: 3.1.2]
│ │ │ └── MarkupSafe [required: >=2.0, installed: 2.1.3]
│ │ ├── Jinja2 [required: >=2.9, installed: 3.1.2]
│ │ │ └── MarkupSafe [required: >=2.0, installed: 2.1.3]
│ │ ├── numpy [required: Any, installed: 1.24.1]
│ │ └── requests [required: Any, installed: 2.31.0]
│ │ ├── certifi [required: >=2017.4.17, installed: 2020.6.20]
│ │ ├── charset-normalizer [required: >=2,<4, installed: 3.3.2]
│ │ ├── idna [required: >=2.5,<4, installed: 2.10]
│ │ └── urllib3 [required: >=1.21.1,<3, installed: 1.25.11]
│ ├── gdown [required: Any, installed: 4.7.1]
│ │ ├── beautifulsoup4 [required: Any, installed: 4.9.3]
│ │ │ └── soupsieve [required: >1.2, installed: 2.0.1]
│ │ ├── filelock [required: Any, installed: 3.0.12]
│ │ ├── requests [required: Any, installed: 2.31.0]
│ │ │ ├── certifi [required: >=2017.4.17, installed: 2020.6.20]
│ │ │ ├── charset-normalizer [required: >=2,<4, installed: 3.3.2]
│ │ │ ├── idna [required: >=2.5,<4, installed: 2.10]
│ │ │ └── urllib3 [required: >=1.21.1,<3, installed: 1.25.11]
│ │ ├── six [required: Any, installed: 1.15.0]
│ │ └── tqdm [required: Any, installed: 4.51.0]
│ ├── geojson [required: Any, installed: 3.1.0]
│ ├── ipyevents [required: Any, installed: 2.0.2]
│ │ └── ipywidgets [required: >=7.6.0, installed: 8.1.1]
│ │ ├── comm [required: >=0.1.3, installed: 0.2.0]
│ │ │ └── traitlets [required: >=4, installed: 5.0.5]
│ │ │ └── ipython-genutils [required: Any, installed: 0.2.0]
│ │ ├── ipython [required: >=6.1.0, installed: 7.18.1]
│ │ │ ├── backcall [required: Any, installed: 0.2.0]
│ │ │ ├── colorama [required: Any, installed: 0.4.4]
│ │ │ ├── decorator [required: Any, installed: 4.4.2]
│ │ │ ├── jedi [required: >=0.10, installed: 0.17.2]
│ │ │ │ └── parso [required: >=0.7.0,<0.8.0, installed: 0.7.1]
│ │ │ ├── pickleshare [required: Any, installed: 0.7.5]
│ │ │ ├── prompt-toolkit [required: >=2.0.0,<3.1.0,!=3.0.1,!=3.0.0, installed: 3.0.8]
│ │ │ │ └── wcwidth [required: Any, installed: 0.2.5]
│ │ │ ├── Pygments [required: Any, installed: 2.7.2]
│ │ │ ├── setuptools [required: >=18.5, installed: 67.6.0]
│ │ │ └── traitlets [required: >=4.2, installed: 5.0.5]
│ │ │ └── ipython-genutils [required: Any, installed: 0.2.0]
│ │ ├── jupyterlab-widgets [required: ~=3.0.9, installed: 3.0.9]
│ │ ├── traitlets [required: >=4.3.1, installed: 5.0.5]
│ │ │ └── ipython-genutils [required: Any, installed: 0.2.0]
│ │ └── widgetsnbextension [required: ~=4.0.9, installed: 4.0.9]
│ ├── ipyfilechooser [required: Any, installed: 0.6.0]
│ │ └── ipywidgets [required: Any, installed: 8.1.1]
│ │ ├── comm [required: >=0.1.3, installed: 0.2.0]
│ │ │ └── traitlets [required: >=4, installed: 5.0.5]
│ │ │ └── ipython-genutils [required: Any, installed: 0.2.0]
│ │ ├── ipython [required: >=6.1.0, installed: 7.18.1]
│ │ │ ├── backcall [required: Any, installed: 0.2.0]
│ │ │ ├── colorama [required: Any, installed: 0.4.4]
│ │ │ ├── decorator [required: Any, installed: 4.4.2]
│ │ │ ├── jedi [required: >=0.10, installed: 0.17.2]
│ │ │ │ └── parso [required: >=0.7.0,<0.8.0, installed: 0.7.1]
│ │ │ ├── pickleshare [required: Any, installed: 0.7.5]
│ │ │ ├── prompt-toolkit [required: >=2.0.0,<3.1.0,!=3.0.1,!=3.0.0, installed: 3.0.8]
│ │ │ │ └── wcwidth [required: Any, installed: 0.2.5]
│ │ │ ├── Pygments [required: Any, installed: 2.7.2]
│ │ │ ├── setuptools [required: >=18.5, installed: 67.6.0]
│ │ │ └── traitlets [required: >=4.2, installed: 5.0.5]
│ │ │ └── ipython-genutils [required: Any, installed: 0.2.0]
│ │ ├── jupyterlab-widgets [required: ~=3.0.9, installed: 3.0.9]
│ │ ├── traitlets [required: >=4.3.1, installed: 5.0.5]
│ │ │ └── ipython-genutils [required: Any, installed: 0.2.0]
│ │ └── widgetsnbextension [required: ~=4.0.9, installed: 4.0.9]
│ ├── ipyleaflet [required: Any, installed: 0.17.4]
│ │ ├── branca [required: >=0.5.0, installed: 0.7.0]
│ │ │ └── Jinja2 [required: Any, installed: 3.1.2]
│ │ │ └── MarkupSafe [required: >=2.0, installed: 2.1.3]
│ │ ├── ipywidgets [required: >=7.6.0,<9, installed: 8.1.1]
│ │ │ ├── comm [required: >=0.1.3, installed: 0.2.0]
│ │ │ │ └── traitlets [required: >=4, installed: 5.0.5]
│ │ │ │ └── ipython-genutils [required: Any, installed: 0.2.0]
│ │ │ ├── ipython [required: >=6.1.0, installed: 7.18.1]
│ │ │ │ ├── backcall [required: Any, installed: 0.2.0]
│ │ │ │ ├── colorama [required: Any, installed: 0.4.4]
│ │ │ │ ├── decorator [required: Any, installed: 4.4.2]
│ │ │ │ ├── jedi [required: >=0.10, installed: 0.17.2]
│ │ │ │ │ └── parso [required: >=0.7.0,<0.8.0, installed: 0.7.1]
│ │ │ │ ├── pickleshare [required: Any, installed: 0.7.5]
│ │ │ │ ├── prompt-toolkit [required: >=2.0.0,<3.1.0,!=3.0.1,!=3.0.0, installed: 3.0.8]
│ │ │ │ │ └── wcwidth [required: Any, installed: 0.2.5]
│ │ │ │ ├── Pygments [required: Any, installed: 2.7.2]
│ │ │ │ ├── setuptools [required: >=18.5, installed: 67.6.0]
│ │ │ │ └── traitlets [required: >=4.2, installed: 5.0.5]
│ │ │ │ └── ipython-genutils [required: Any, installed: 0.2.0]
│ │ │ ├── jupyterlab-widgets [required: ~=3.0.9, installed: 3.0.9]
│ │ │ ├── traitlets [required: >=4.3.1, installed: 5.0.5]
│ │ │ │ └── ipython-genutils [required: Any, installed: 0.2.0]
│ │ │ └── widgetsnbextension [required: ~=4.0.9, installed: 4.0.9]
│ │ ├── traittypes [required: >=0.2.1,<3, installed: 0.2.1]
│ │ │ └── traitlets [required: >=4.2.2, installed: 5.0.5]
│ │ │ └── ipython-genutils [required: Any, installed: 0.2.0]
│ │ └── xyzservices [required: >=2021.8.1, installed: 2023.10.1]
│ ├── ipywidgets [required: Any, installed: 8.1.1]
│ │ ├── comm [required: >=0.1.3, installed: 0.2.0]
│ │ │ └── traitlets [required: >=4, installed: 5.0.5]
│ │ │ └── ipython-genutils [required: Any, installed: 0.2.0]
│ │ ├── ipython [required: >=6.1.0, installed: 7.18.1]
│ │ │ ├── backcall [required: Any, installed: 0.2.0]
│ │ │ ├── colorama [required: Any, installed: 0.4.4]
│ │ │ ├── decorator [required: Any, installed: 4.4.2]
│ │ │ ├── jedi [required: >=0.10, installed: 0.17.2]
│ │ │ │ └── parso [required: >=0.7.0,<0.8.0, installed: 0.7.1]
│ │ │ ├── pickleshare [required: Any, installed: 0.7.5]
│ │ │ ├── prompt-toolkit [required: >=2.0.0,<3.1.0,!=3.0.1,!=3.0.0, installed: 3.0.8]
│ │ │ │ └── wcwidth [required: Any, installed: 0.2.5]
│ │ │ ├── Pygments [required: Any, installed: 2.7.2]
│ │ │ ├── setuptools [required: >=18.5, installed: 67.6.0]
│ │ │ └── traitlets [required: >=4.2, installed: 5.0.5]
│ │ │ └── ipython-genutils [required: Any, installed: 0.2.0]
│ │ ├── jupyterlab-widgets [required: ~=3.0.9, installed: 3.0.9]
│ │ ├── traitlets [required: >=4.3.1, installed: 5.0.5]
│ │ │ └── ipython-genutils [required: Any, installed: 0.2.0]
│ │ └── widgetsnbextension [required: ~=4.0.9, installed: 4.0.9]
│ ├── matplotlib [required: Any, installed: 3.5.1]
│ │ ├── cycler [required: >=0.10, installed: 0.10.0]
│ │ │ └── six [required: Any, installed: 1.15.0]
│ │ ├── fonttools [required: >=4.22.0, installed: 4.28.5]
│ │ ├── kiwisolver [required: >=1.0.1, installed: 1.2.0]
│ │ ├── numpy [required: >=1.17, installed: 1.24.1]
│ │ ├── packaging [required: >=20.0, installed: 23.0]
│ │ ├── Pillow [required: >=6.2.0, installed: 9.2.0]
│ │ ├── pyparsing [required: >=2.2.1, installed: 2.4.7]
│ │ └── python-dateutil [required: >=2.7, installed: 2.8.2]
│ │ └── six [required: >=1.5, installed: 1.15.0]
│ ├── numpy [required: Any, installed: 1.24.1]
│ ├── pandas [required: Any, installed: 2.0.2]
│ │ ├── numpy [required: >=1.20.3, installed: 1.24.1]
│ │ ├── python-dateutil [required: >=2.8.2, installed: 2.8.2]
│ │ │ └── six [required: >=1.5, installed: 1.15.0]
│ │ ├── pytz [required: >=2020.1, installed: 2023.3]
│ │ └── tzdata [required: >=2022.1, installed: 2023.3]
│ ├── pyshp [required: Any, installed: 2.3.1]
│ ├── pystac-client [required: Any, installed: 0.7.5]
│ │ ├── pystac [required: >=1.8.2, installed: 1.9.0]
│ │ │ └── python-dateutil [required: >=2.7.0, installed: 2.8.2]
│ │ │ └── six [required: >=1.5, installed: 1.15.0]
│ │ ├── python-dateutil [required: >=2.8.2, installed: 2.8.2]
│ │ │ └── six [required: >=1.5, installed: 1.15.0]
│ │ └── requests [required: >=2.28.2, installed: 2.31.0]
│ │ ├── certifi [required: >=2017.4.17, installed: 2020.6.20]
│ │ ├── charset-normalizer [required: >=2,<4, installed: 3.3.2]
│ │ ├── idna [required: >=2.5,<4, installed: 2.10]
│ │ └── urllib3 [required: >=1.21.1,<3, installed: 1.25.11]
│ ├── python-box [required: Any, installed: 7.1.1]
│ ├── scooby [required: Any, installed: 0.9.2]
│ ├── whiteboxgui [required: Any, installed: 2.3.0]
│ │ ├── ipyfilechooser [required: Any, installed: 0.6.0]
│ │ │ └── ipywidgets [required: Any, installed: 8.1.1]
│ │ │ ├── comm [required: >=0.1.3, installed: 0.2.0]
│ │ │ │ └── traitlets [required: >=4, installed: 5.0.5]
│ │ │ │ └── ipython-genutils [required: Any, installed: 0.2.0]
│ │ │ ├── ipython [required: >=6.1.0, installed: 7.18.1]
│ │ │ │ ├── backcall [required: Any, installed: 0.2.0]
│ │ │ │ ├── colorama [required: Any, installed: 0.4.4]
│ │ │ │ ├── decorator [required: Any, installed: 4.4.2]
│ │ │ │ ├── jedi [required: >=0.10, installed: 0.17.2]
│ │ │ │ │ └── parso [required: >=0.7.0,<0.8.0, installed: 0.7.1]
│ │ │ │ ├── pickleshare [required: Any, installed: 0.7.5]
│ │ │ │ ├── prompt-toolkit [required: >=2.0.0,<3.1.0,!=3.0.1,!=3.0.0, installed: 3.0.8]
│ │ │ │ │ └── wcwidth [required: Any, installed: 0.2.5]
│ │ │ │ ├── Pygments [required: Any, installed: 2.7.2]
│ │ │ │ ├── setuptools [required: >=18.5, installed: 67.6.0]
│ │ │ │ └── traitlets [required: >=4.2, installed: 5.0.5]
│ │ │ │ └── ipython-genutils [required: Any, installed: 0.2.0]
│ │ │ ├── jupyterlab-widgets [required: ~=3.0.9, installed: 3.0.9]
│ │ │ ├── traitlets [required: >=4.3.1, installed: 5.0.5]
│ │ │ │ └── ipython-genutils [required: Any, installed: 0.2.0]
│ │ │ └── widgetsnbextension [required: ~=4.0.9, installed: 4.0.9]
│ │ ├── ipytree [required: Any, installed: 0.2.2]
│ │ │ └── ipywidgets [required: >=7.5.0,<9, installed: 8.1.1]
│ │ │ ├── comm [required: >=0.1.3, installed: 0.2.0]
│ │ │ │ └── traitlets [required: >=4, installed: 5.0.5]
│ │ │ │ └── ipython-genutils [required: Any, installed: 0.2.0]
│ │ │ ├── ipython [required: >=6.1.0, installed: 7.18.1]
│ │ │ │ ├── backcall [required: Any, installed: 0.2.0]
│ │ │ │ ├── colorama [required: Any, installed: 0.4.4]
│ │ │ │ ├── decorator [required: Any, installed: 4.4.2]
│ │ │ │ ├── jedi [required: >=0.10, installed: 0.17.2]
│ │ │ │ │ └── parso [required: >=0.7.0,<0.8.0, installed: 0.7.1]
│ │ │ │ ├── pickleshare [required: Any, installed: 0.7.5]
│ │ │ │ ├── prompt-toolkit [required: >=2.0.0,<3.1.0,!=3.0.1,!=3.0.0, installed: 3.0.8]
│ │ │ │ │ └── wcwidth [required: Any, installed: 0.2.5]
│ │ │ │ ├── Pygments [required: Any, installed: 2.7.2]
│ │ │ │ ├── setuptools [required: >=18.5, installed: 67.6.0]
│ │ │ │ └── traitlets [required: >=4.2, installed: 5.0.5]
│ │ │ │ └── ipython-genutils [required: Any, installed: 0.2.0]
│ │ │ ├── jupyterlab-widgets [required: ~=3.0.9, installed: 3.0.9]
│ │ │ ├── traitlets [required: >=4.3.1, installed: 5.0.5]
│ │ │ │ └── ipython-genutils [required: Any, installed: 0.2.0]
│ │ │ └── widgetsnbextension [required: ~=4.0.9, installed: 4.0.9]
│ │ ├── ipywidgets [required: Any, installed: 8.1.1]
│ │ │ ├── comm [required: >=0.1.3, installed: 0.2.0]
│ │ │ │ └── traitlets [required: >=4, installed: 5.0.5]
│ │ │ │ └── ipython-genutils [required: Any, installed: 0.2.0]
│ │ │ ├── ipython [required: >=6.1.0, installed: 7.18.1]
│ │ │ │ ├── backcall [required: Any, installed: 0.2.0]
│ │ │ │ ├── colorama [required: Any, installed: 0.4.4]
│ │ │ │ ├── decorator [required: Any, installed: 4.4.2]
│ │ │ │ ├── jedi [required: >=0.10, installed: 0.17.2]
│ │ │ │ │ └── parso [required: >=0.7.0,<0.8.0, installed: 0.7.1]
│ │ │ │ ├── pickleshare [required: Any, installed: 0.7.5]
│ │ │ │ ├── prompt-toolkit [required: >=2.0.0,<3.1.0,!=3.0.1,!=3.0.0, installed: 3.0.8]
│ │ │ │ │ └── wcwidth [required: Any, installed: 0.2.5]
│ │ │ │ ├── Pygments [required: Any, installed: 2.7.2]
│ │ │ │ ├── setuptools [required: >=18.5, installed: 67.6.0]
│ │ │ │ └── traitlets [required: >=4.2, installed: 5.0.5]
│ │ │ │ └── ipython-genutils [required: Any, installed: 0.2.0]
│ │ │ ├── jupyterlab-widgets [required: ~=3.0.9, installed: 3.0.9]
│ │ │ ├── traitlets [required: >=4.3.1, installed: 5.0.5]
│ │ │ │ └── ipython-genutils [required: Any, installed: 0.2.0]
│ │ │ └── widgetsnbextension [required: ~=4.0.9, installed: 4.0.9]
│ │ └── whitebox [required: Any, installed: 2.3.1]
│ │ └── click [required: >=6.0, installed: 8.1.7]
│ │ └── colorama [required: Any, installed: 0.4.4]
│ └── xyzservices [required: Any, installed: 2023.10.1]
├── mercantile [required: Any, installed: 1.2.1]
│ └── click [required: >=3.0, installed: 8.1.7]
│ └── colorama [required: Any, installed: 0.4.4]
├── openlocationcode [required: Any, installed: 1.0.1]
├── pandas [required: Any, installed: 2.0.2]
│ ├── numpy [required: >=1.20.3, installed: 1.24.1]
│ ├── python-dateutil [required: >=2.8.2, installed: 2.8.2]
│ │ └── six [required: >=1.5, installed: 1.15.0]
│ ├── pytz [required: >=2020.1, installed: 2023.3]
│ └── tzdata [required: >=2022.1, installed: 2023.3]
├── shapely [required: Any, installed: 2.0.1]
│ └── numpy [required: >=1.14, installed: 1.24.1]
└── tabulate [required: Any, installed: 0.9.0]
From the tree output, leafmap had the most dependencies, so I did a general inspection of the code base to find where the package is used, using the search functionality in the Visual Studio Code editor (Ctrl + Shift + F).
It turns out to be used only once, in the example notebook download_buildings.ipynb, and not at all in the main package source code. Hence the question: why is it included in the main requirements.txt file instead of just the docs requirements?
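The same check can be scripted. This stdlib sketch walks a source tree and uses Python's ast module to find .py files and notebook code cells that import a given package, falling back to a plain text match for cells that don't parse (e.g. ones with %pip magics). `find_package_usage` is a hypothetical helper.

```python
import ast
import json
from pathlib import Path
from typing import Iterator, List


def _imports_package(source: str, package: str) -> bool:
    """True if the Python source imports `package` (or a submodule of it)."""
    try:
        tree = ast.parse(source)
    except SyntaxError:
        # e.g. notebook cells using IPython magics; fall back to a rough text check.
        return f"import {package}" in source or f"from {package}" in source
    for node in ast.walk(tree):
        if isinstance(node, ast.Import):
            names = [alias.name for alias in node.names]
        elif isinstance(node, ast.ImportFrom) and node.module and node.level == 0:
            names = [node.module]
        else:
            continue
        if any(n == package or n.startswith(package + ".") for n in names):
            return True
    return False


def _sources(path: Path) -> Iterator[str]:
    """Yield the Python source of a .py file, or of each notebook code cell."""
    if path.suffix == ".py":
        yield path.read_text(encoding="utf-8")
    elif path.suffix == ".ipynb":
        notebook = json.loads(path.read_text(encoding="utf-8"))
        for cell in notebook.get("cells", []):
            if cell.get("cell_type") == "code":
                source = cell.get("source", "")
                yield "".join(source) if isinstance(source, list) else source


def find_package_usage(root: str, package: str) -> List[Path]:
    """List .py and .ipynb files under root that import `package`."""
    hits = []
    for path in sorted(Path(root).rglob("*")):
        if path.suffix in {".py", ".ipynb"} and any(
            _imports_package(src, package) for src in _sources(path)
        ):
            hits.append(path)
    return hits
```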
Side note: any instructions on how to build the docs locally would be appreciated. Thanks for the awesome tool!
With the 0.8 release we'll have functions to format overture buildings, and to download any buildings.
I don't really understand how python packages work, but I'd like to get it so the cli.py has 3 subcommands:
open_buildings google, with convert and benchmark under it (the existing main commands).
open_buildings overture, with commands to work with Overture data.
open_buildings tools, with the 'get-buildings' command.
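One way to get that layout with click is nested groups. This is a sketch: the group and command names follow the list above, while the argument names and echo bodies are placeholders standing in for the real convert, benchmark and get-buildings implementations.

```python
import click


@click.group()
def main():
    """Tools for working with open building datasets."""


@main.group()
def google():
    """Commands for the Google Open Buildings data."""


@main.group()
def overture():
    """Commands for working with Overture data."""


@main.group()
def tools():
    """General-purpose tools."""


# Placeholder commands: the real ones would call into the existing modules.
@google.command()
@click.argument("input_path")
def convert(input_path):
    click.echo(f"convert {input_path}")


@google.command()
@click.argument("input_path")
def benchmark(input_path):
    click.echo(f"benchmark {input_path}")


@tools.command(name="get-buildings")
@click.argument("geojson_input")
def get_buildings(geojson_input):
    click.echo(f"get-buildings {geojson_input}")


if __name__ == "__main__":
    main()
```

The ob console script would keep pointing at main; debugging commands such as quad2json or wkt could live in a separate group that simply isn't registered on main.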
We should likely combine the two Overture Python files into one. Right now they all have click interfaces in them; the intent was to move all the click interfaces to cli.py. But once I got into it, I realized it would be nice to have some of the commands that were made more for debugging / figuring out what's going on (like quad2json, wkt, etc.) available in a CLI, but not in the 'main' CLI. So maybe it makes sense to have click interfaces in both?
I'm also more than open to other ideas on how to organize things. I do think it'll make sense to evolve more of the functions to be 'generic', but I'll make a separate ticket for that.
It seems like it should be possible to automatically include a country_iso to substantially speed up the query. The current method of having the user supply it is potentially error prone, and annoying.
The idea would be to calculate the list of country_iso values for every single quadkey. This would have to be a list, since quadkeys can cross countries, and big ones could have a hundred or more countries in them. But most should have one or a handful of countries, which will almost always speed up the query.
There are 16 million quadkeys at level 12, but many are likely in the ocean. We could likely use a quadkey at level 10 or even 8, since even if a coarser quadkey adds a couple more hive partitions to the query, it would still be worth it.
So I think the main thing would be to make a script that generates a list of country iso codes for every quadkey. Then store that as a parquet file, and if it's not too big we could likely just include it in the open_buildings package.
If we had this then we could remove the country_iso flag, as we'd be able to always use a hive partition.
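For scale: 4^12 = 16,777,216 tiles at level 12, versus 4^10 = 1,048,576 at level 10 and 4^8 = 65,536 at level 8. Because a quadkey's first n digits are its ancestor tile at level n, the lookup itself can be simple. A sketch, assuming the precomputed table has been loaded into a dict from quadkey (at one fixed level) to a list of country_iso codes; building that table from country boundaries isn't shown, and `countries_for_quadkeys` is a hypothetical name.

```python
from typing import Dict, Iterable, List


def countries_for_quadkeys(
    quadkeys: Iterable[str], table: Dict[str, List[str]], table_level: int
) -> List[str]:
    """Union of country_iso codes for the given quadkeys.

    Quadkeys finer than table_level are truncated to their ancestor at
    table_level (a quadkey's prefix is its parent tile). Coarser quadkeys
    collect every table entry that starts with them. Tiles missing from the
    table (e.g. open ocean) contribute nothing.
    """
    found = set()
    for qk in quadkeys:
        if len(qk) >= table_level:
            found.update(table.get(qk[:table_level], []))
        else:
            for key, isos in table.items():
                if key.startswith(qk):
                    found.update(isos)
    return sorted(found)
```

The resulting list could then be used to build the country_iso filter for the hive partitions, in place of the user-supplied flag.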
In making the get-buildings
command I went through a couple of iterations of trying out different formatting - definitely realizing that more row groups than gpq makes by default is better. And with the latest scripts I have a way to set the 'max number of rows' per file and also the number of row groups. But I have no idea if things could be lots faster if we increased or decreased row group size, and/or increased / decreased number of files. The 'defaults' I used were max 10 million rows per file and 20000 rows per group. It'd be great to try out some variations on that. And ideally experiment on the tradeoffs between 'legibility for download' (like use country then admin level 1 like the google buildings data does) vs 'balance of spatial size' (like use the quadkey max size algorithm entirely, instead of country then quadkey, so we'd have much fewer files over all, but each file would be meaningless to users - they'd need to use the 'tool' to download).
I was getting 20-30 seconds to download a small number of buildings, but that was from just a handful of tests.
Ideally we'd have a command that would run a 'benchmark' that would have 20-30 locations globally and get the performance for each of them and report that out, so we can easily compare how tweaks to the data work.
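A minimal sketch of such a benchmark harness. It assumes download_fn is a hypothetical wrapper around the real download function that takes an AOI and returns the number of buildings; the location names and AOIs are whatever list we settle on. Taking the median over a few repeats smooths out network noise.

```python
import statistics
import time


def run_benchmark(locations, download_fn, repeats=1):
    """Time download_fn(aoi) for each named location and return a summary.

    locations: {name: aoi}  (aoi is whatever download_fn accepts, e.g. GeoJSON)
    download_fn: hypothetical callable returning the number of buildings downloaded
    """
    results = []
    for name, aoi in locations.items():
        timings = []
        count = None
        for _ in range(repeats):
            start = time.perf_counter()
            count = download_fn(aoi)
            timings.append(time.perf_counter() - start)
        results.append({
            "location": name,
            "buildings": count,
            "median_seconds": statistics.median(timings),
        })
    return results


def print_report(results):
    """Print slowest locations first so regressions stand out."""
    for row in sorted(results, key=lambda r: r["median_seconds"], reverse=True):
        print(f"{row['location']:<20} {str(row['buildings']):>10} {row['median_seconds']:8.2f}s")
```

Running the same location list against each data layout (different rows per file and rows per row group) and diffing the reports would give a like-for-like comparison of the tweaks.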
It'd be nice to be able to quickly see how many buildings a request would result in, instead of downloading all the buildings. This can be done by doing a select count(*) (instead of select *) in DuckDB, then printing that out and not downloading anything.
To add this, start with --verbose to see what type of queries DuckDB will issue, then try out a similar query that just gets the count and make sure it works. Then try changing the core 'download' command to do a count and print that out. Once that is working, you just need to add the flag to cli.py and pass the count flag into the download function in download_buildings.
If you want to take this on and have more questions feel free to comment here and I can explain more.
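A minimal sketch of the query switch, assuming the download code builds a SQL string over a hive-partitioned GeoParquet source with a WKB geometry column. The function name, the geometry and quadkey column names, and the prefix-match filter are hypothetical; check the real queries with --verbose. read_parquet, ST_Intersects, ST_GeomFromWKB and ST_GeomFromText are DuckDB (spatial extension) functions.

```python
def build_query(parquet_glob, aoi_wkt, quadkey=None, count_only=False):
    """Build DuckDB SQL for a buildings request.

    With count_only=True only the number of matching rows is selected. DuckDB
    still has to read the geometry column to evaluate the filter, but no
    result set is materialized and no output file is written.
    """
    select = "SELECT COUNT(*)" if count_only else "SELECT *"
    filters = [
        f"ST_Intersects(ST_GeomFromWKB(geometry), ST_GeomFromText('{aoi_wkt}'))"
    ]
    if quadkey is not None:
        # Cheap partition filter first (hypothetical column name and match rule).
        filters.insert(0, f"quadkey LIKE '{quadkey}%'")
    return (
        f"{select} FROM read_parquet('{parquet_glob}', hive_partitioning=1) "
        f"WHERE {' AND '.join(filters)}"
    )
```

With the duckdb Python package, after INSTALL spatial; LOAD spatial;, the count would come from con.execute(build_query(..., count_only=True)).fetchone()[0]. A real implementation would pass the WKT as a prepared-statement parameter rather than interpolating it.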
The main Overture commands could likely be made fairly generic for any large geospatial file. It'd be great to evolve them to at least be 'tools', and perhaps even their own package that open_buildings would call / depend on. The overall flow of how the data is formatted is:
A more generic version of this would likely take input from more than parquet files (or at least have a command to convert to parquet files). And it would not be tied to the 'buildings' name.
We have a nice CI system that checks multiple operating systems to be sure that everything still works. But there's one big problem: there are no tests for it to run. It's been a one-man project, but if the community grows it's essential to be able to automatically check whether changes broke something unexpected (which is also good even for a one-man project).
ChatGPT can likely help in the creation of tests. Would be good to have unit tests, and also some more integration tests that use the source.coop files to ensure it's all working.
Right now you have to input a GeoJSON; it'd be much nicer for many users to just enter a city, state, or county name and get buildings for it.
Ideally we'd find a geocoder that returns polygons and doesn't cost too much. I could likely pay for it for a bit, but we should probably evolve to making it a config option where people put in their own geocoder API key.
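One free option worth trying is OpenStreetMap's Nominatim search API, which can return a result's outline as GeoJSON when polygon_geojson=1 is set. This is a sketch under that assumption, with hypothetical function names; it only builds the request URL and picks the first polygonal result, leaving the HTTP call out. Nominatim's public usage policy asks for an identifying User-Agent and at most one request per second, so heavier use would mean self-hosting or a paid provider behind the API-key config option.

```python
from urllib.parse import urlencode

NOMINATIM_SEARCH = "https://nominatim.openstreetmap.org/search"


def geocode_url(place, limit=5):
    """Build a Nominatim search URL that asks for result outlines as GeoJSON."""
    params = {"q": place, "format": "json", "polygon_geojson": 1, "limit": limit}
    return f"{NOMINATIM_SEARCH}?{urlencode(params)}"


def first_polygon(results):
    """Return the first Polygon/MultiPolygon geometry from a Nominatim JSON response.

    Some matches (e.g. a city's center node) come back as Points, so skip those.
    """
    for result in results:
        geom = result.get("geojson")
        if geom and geom.get("type") in ("Polygon", "MultiPolygon"):
            return geom
    return None
```

The returned geometry is plain GeoJSON, so it could be handed to the existing get_buildings path unchanged.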
The Google buildings CNG distro on source is done quite similarly to the Overture one (indeed, it was the start of those experiments), and ideally the get-buildings tool would work with it just as easily. Only small tweaks may be needed to get it 'working'; the likely thing that's needed is to optimize the Google buildings dataset with the learnings from Overture, the biggest one being the row group size.
Even just a link to
open-buildings/open_buildings/cli.py (line 238 in 276386b)
The VIDA dataset on source combines google and microsoft buildings, and should get the most buildings of the different options. It should be relatively easy to add, but it doesn't use 'quadkey' for spatial partitioning, it's s2 instead. The one to add is https://beta.source.coop/vida/google-microsoft-open-buildings/geoparquet/by_country_s2 - as it's more partitioned and likely will perform much better (though it's worth trying both).
The main task for this is to support a different 'spatial' column: the current setup assumes quadkey, as that's what the first two datasets were done with. Ideally the download_buildings function would take an argument that could be 'quadkey' or 's2', and we could later add h3, geohash, etc. The get_buildings CLI should just have an option to use this dataset, and then it can pass the right arguments into download_buildings.
The quadkey is computed client side, and it's likely similarly easy to compute the s2 key, and then use that in the query.
Right now the geojson_to_quadkey function in download_buildings keeps zooming out until it hits a quadkey that completely encompasses the area. This can lead to some very big quadkeys if the area to query straddles a big quadkey boundary; I hit one area in Italy that was getting something like a level 4 quadkey.
It'd be much better if we didn't make such big scans. One route to this would be to allow for more than one quadkey. In the function, it seems you could just adjust the cutoff for the number of tiles to be more than one:
```python
for zoom in range(12, -1, -1):
    tiles = list(mercantile.tiles(min_lon, min_lat, max_lon, max_lat, zooms=zoom))
    if len(tiles) == 1:
        return mercantile.quadkey(tiles[0])
```
So this might be simple: just try to get a bigger list. We probably don't want to return a huge list, so maybe stop once the area is covered by 4 or 10 quadkeys. It's probably worth some experimenting on query times with different combinations, since the parquet partitioning might add overhead when querying many different values, so maybe just looking for 2 makes sense. But even just 2 seems likely to help in cases where the area straddles a huge quadkey.
There's perhaps some other technique that could be used here, like getting the biggest one and then scaling down.
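A minimal sketch of the "allow more than one quadkey" idea: walk down from zoom 12 as the existing loop does, but accept the first zoom at which at most max_tiles tiles cover the bbox. The function names and the max_tiles default are hypothetical. The tile math mirrors what mercantile.tiles and mercantile.quadkey compute, inlined so the example runs on its own; in the package, the mercantile calls from the snippet above would do the same job.

```python
import math

MAX_LAT = 85.0511287798  # Web Mercator latitude limit


def lonlat_to_tile(lon, lat, zoom):
    """Return the (x, y) Web Mercator tile containing lon/lat at the given zoom."""
    lat = max(min(lat, MAX_LAT), -MAX_LAT)
    n = 2 ** zoom
    x = int((lon + 180.0) / 360.0 * n)
    y = int((1.0 - math.asinh(math.tan(math.radians(lat))) / math.pi) / 2.0 * n)
    return min(max(x, 0), n - 1), min(max(y, 0), n - 1)


def tile_to_quadkey(x, y, zoom):
    """Interleave the bits of x and y into a quadkey string."""
    return "".join(
        str((1 if x & (1 << (level - 1)) else 0) + (2 if y & (1 << (level - 1)) else 0))
        for level in range(zoom, 0, -1)
    )


def bbox_to_quadkeys(min_lon, min_lat, max_lon, max_lat, max_tiles=4, max_zoom=12):
    """Return quadkeys at the deepest zoom where at most max_tiles tiles cover the bbox."""
    if max_tiles < 1:
        raise ValueError("max_tiles must be at least 1")
    for zoom in range(max_zoom, 0, -1):
        # Tile y grows southward, so the top-left tile comes from max_lat.
        x0, y0 = lonlat_to_tile(min_lon, max_lat, zoom)
        x1, y1 = lonlat_to_tile(max_lon, min_lat, zoom)
        if (x1 - x0 + 1) * (y1 - y0 + 1) <= max_tiles:
            return [
                tile_to_quadkey(x, y, zoom)
                for x in range(x0, x1 + 1)
                for y in range(y0, y1 + 1)
            ]
    # Zoom 0 is a single tile covering the world; its quadkey is the empty string.
    return [""]
```

With max_tiles=1 this reproduces the current single-quadkey behavior; raising it trades one large scan for a few smaller partitions, which the query would then filter on with something like an IN list. Which value wins is exactly what the timing experiments above would need to show.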
Right now if you put in a GeoJSON of an area that has no buildings (like the middle of the ocean or the middle of the desert), the get_buildings command will write out a geospatial file with 0 rows. This is not ideal; instead it should warn the user and not actually call the DuckDB command to write it out.
This should be pretty simple to do - somewhere in here just check if the count is 0, and if it is then print out to the user that 0 buildings were found and that no file was written, and then just return / skip the rest.
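A minimal sketch of that check, assuming the count is already available once the features are loaded into DuckDB (the log line "Downloaded 65 features into DuckDB" suggests it is). write_if_nonempty and write_fn are hypothetical names; write_fn stands in for whatever currently issues the DuckDB write.

```python
import logging

logger = logging.getLogger("open_buildings")


def write_if_nonempty(count, write_fn, output_path):
    """Call write_fn(output_path) only when the query matched at least one building.

    Returns True if a file was written, False otherwise.
    """
    if count == 0:
        logger.warning(
            "0 buildings found in the requested area; no file was written to %s",
            output_path,
        )
        return False
    write_fn(output_path)
    return True
```

Returning a boolean lets the CLI decide whether to print a "Wrote ..." message or exit quietly after the warning.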
Trying to use the CLI for open buildings. It seems like something is sorta there - it installs open_buildings on the path (which is further than I got on my own), but then it says to replace this message by 'putting your code in open_buildings.cli.main'.
It seems like it'd be nice to align pip install open-buildings with the CLI, e.g. have both be open-buildings or both be open_buildings.
Originally reported in cholmes/google-buildings-tools#1
% open-buildings
zsh: command not found: open-buildings
% open_buildings
Replace this message by putting your code into open_buildings.cli.main
See click documentation at https://click.palletsprojects.com/
Right now there is a --no-gpq flag, but it was poorly implemented and doesn't actually work. You can modify the Python code to set a global variable, but setting the flag from the CLI doesn't change anything.
I'm less into this idea, as it seems like crap for this goal of working with huge files, but could be interesting to show performance and size characteristics. I do love GeoJSON, it's one of the best formats, but this is not the use case for it.
The google building data on source.coop only has a complete partitioned one based on google's v2. The v3 version is started, but it hasn't actually been partitioned yet.
The idea was to try a few different tools to partition and write up comparisons, just like I did for convert: https://cholmes.medium.com/performance-explorations-of-geoparquet-and-duckdb-84c0185ed399
But Overture came into the mix and bumped the priority. There are lots of learnings from there, and many of the tools built for Overture could be made more generic (#14 is the ticket for that). It'd be nice to just 'finish' v3 of Google buildings in at least one partitioning, with row groups, so it can be used in get-buildings (ticket #15), and that command doesn't only work with v2, which covers fewer countries.
The get-buildings command seems to work decently well, but it does very little to catch the ways a user might input things wrong. It'd be good to test out common ways it could go wrong and give better warnings, etc.
It also appears to hang if you don't supply the file, I think since it's expecting stdin. I definitely want to keep the ability to do stdin, but ideally can warn users. In planet CLI we had '-' mean 'read from stdin' so that could be a good pattern to follow.
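A minimal sketch of the '-' convention, with a hypothetical read_aoi helper: '-' explicitly means stdin, and if stdin is an interactive terminal (nothing piped in) it fails fast with a message instead of blocking. In the click-based CLI, click.File already treats '-' as stdin, so the TTY check could be layered on top of that.

```python
import sys


def read_aoi(path, stdin=None):
    """Read the AOI text from a file path, or from stdin when path is '-'.

    Raises a clear error instead of hanging when '-' is given but stdin is an
    interactive terminal, i.e. nothing is being piped in.
    """
    if path == "-":
        stream = stdin if stdin is not None else sys.stdin
        if stream.isatty():
            raise SystemExit(
                "Reading the AOI from stdin ('-'), but nothing was piped in. "
                "Pipe in GeoJSON or pass a file path."
            )
        return stream.read()
    with open(path, encoding="utf-8") as f:
        return f.read()
```

Requiring the explicit '-' also means a missing positional argument becomes a normal usage error rather than a silent wait on stdin.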
Right now the CLI informs a user that a query takes at least 5-10 seconds if they have a country_iso and 30-60 if they do not. But those are really just minimum times, for small areas. It'd be better if we could provide more guidance, like telling someone who is trying to query a huge area that it can take minutes or hours, or even longer on a slow connection. I just did a decent-sized area around Sao Paulo and it took 18 minutes to download 5.36 million buildings (716.9 MB), and my connection is pretty fast.
This would likely need #32 to be sure that the user is actually requesting a large area when the quadkey is large, since right now a GeoJSON that straddles a quadkey boundary can look big but would still go pretty fast.
And ideally we'd do a good bit of testing to be able to give guidance: test really sparse areas in Australia and dense areas in places like India, and also on different connections.
This is related to #31 - though this is probably a bit easier, as it's just guidance based on the size of the request, not trying to actually report what's happening. Though if that one is easier then this one may not be needed.
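As a starting point, the one Sao Paulo measurement above gives a rough rate: 5,360,000 buildings / 1,080 s ≈ 4,963 buildings per second (716.9 MB / 1,080 s ≈ 0.66 MB/s). A minimal sketch of turning a building count (e.g. from the select count(*) idea above) into guidance follows. The constants come from that single run on one fast connection, the 10-second overhead is taken from the existing "at least 5-10 seconds" message, and the function names and wording thresholds are hypothetical; real guidance would need the broader testing described above.

```python
# One data point from the Sao Paulo test: 5.36 million buildings
# (716.9 MB) in 18 minutes on a fast connection.
OBSERVED_BUILDINGS = 5_360_000
OBSERVED_SECONDS = 18 * 60
BUILDINGS_PER_SECOND = OBSERVED_BUILDINGS / OBSERVED_SECONDS  # ~4963/s


def estimate_download_seconds(n_buildings, rate=BUILDINGS_PER_SECOND, overhead_seconds=10):
    """Very rough estimate: fixed query overhead plus a per-building cost."""
    return overhead_seconds + n_buildings / rate


def describe_estimate(seconds):
    """Turn an estimate into coarse, user-facing wording."""
    if seconds < 60:
        return "under a minute"
    if seconds < 3600:
        return f"about {round(seconds / 60)} minutes"
    return f"about {seconds / 3600:.1f} hours"
```

A linear per-building rate ignores building density and connection speed, so the message should say the estimate is a rough lower bound.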
I think a proper python interface would be nice to have, in addition to the CLI.
The following things should probably be changed, just gathering my thoughts here:
download()
silent
, as this is all handled by the built-in Python logger. Using the Python logger allows users to register their own handlers more easily. The CLI can simply translate flags into logger settings.
I might make a PR when I have time, just wondering if you have any thoughts on this.
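A minimal sketch of that split, using only the standard logging module: library code logs to a named logger (with a NullHandler, as the logging docs recommend for libraries) and never configures output, while the CLI maps its flags to a level and handler. download_example and configure_cli_logging are hypothetical names; the formatter just mimics the existing "[2023-10-14 10:12:02] ..." log lines.

```python
import logging

# Library side: log to a named logger and leave output configuration to the app.
logger = logging.getLogger("open_buildings")
logger.addHandler(logging.NullHandler())


def download_example(aoi):
    """Hypothetical stand-in for download(): logs instead of printing."""
    logger.info("Querying and downloading data for %s", aoi)
    logger.debug("Full DuckDB query would be logged here")


def configure_cli_logging(verbose=False, silent=False):
    """CLI side: translate --verbose / --silent style flags into logger settings."""
    if silent:
        level = logging.WARNING
    elif verbose:
        level = logging.DEBUG
    else:
        level = logging.INFO
    # Add a single console handler, even if this is called more than once.
    if not any(type(h) is logging.StreamHandler for h in logger.handlers):
        handler = logging.StreamHandler()
        handler.setFormatter(
            logging.Formatter("[%(asctime)s] %(message)s", "%Y-%m-%d %H:%M:%S")
        )
        logger.addHandler(handler)
    logger.setLevel(level)
    return level
```

Python users who call the library directly then get no output by default and can attach their own handlers to the "open_buildings" logger.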
It'd be cool to be able to not just see the times of the resulting files, but also the size of them, to compare the formats. (This may make more sense in a dedicated benchmarking tool).
The requirements.txt / requirements_dev.txt files seem to need some tidying up. I see leafmap in there but don't think it's used anywhere in the code. In addition, all packages should ideally be pinned to specific versions; otherwise there is a risk that pip downloads whatever version is latest at a given point in time, which can break versions of the package that used to work for users.
Currently the ob get_buildings utility and the underlying download_buildings.download() function accept a GeoJSON AOI, either as a file or piped from stdin. As WKT is already used under the hood, it makes sense to also accept WKT as an alternative to GeoJSON. It's easy to distinguish between the two even when piped in.
Other formats could also be supported, potentially the full range of formats that is supported as output formats.
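A minimal sketch of telling the two apart, with a hypothetical parse_aoi helper: GeoJSON text always starts with "{", while WKT starts with a geometry-type keyword, so the first non-whitespace character plus the first word is enough. WKT is returned as-is, since that is what the query side already uses.

```python
import json

WKT_TYPES = {
    "POINT", "LINESTRING", "POLYGON", "MULTIPOINT",
    "MULTILINESTRING", "MULTIPOLYGON", "GEOMETRYCOLLECTION",
}


def parse_aoi(text):
    """Classify AOI text as GeoJSON or WKT.

    Returns ("geojson", dict) or ("wkt", str); raises ValueError otherwise.
    """
    stripped = text.strip()
    if stripped.startswith("{"):
        return "geojson", json.loads(stripped)
    # The first word before any "(" is the WKT type, e.g. "POLYGON" or
    # "MULTIPOLYGON" in "MULTIPOLYGON Z ((...))" or "MULTIPOLYGON EMPTY".
    head = stripped.split("(", 1)[0].strip().upper()
    first_word = head.split()[0] if head else ""
    if first_word in WKT_TYPES:
        return "wkt", stripped
    raise ValueError(f"AOI is neither GeoJSON nor WKT: {stripped[:40]!r}")
```

The same dispatch point would be where other input formats could later be added.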