
dahnj / h3-pandas


Integration of H3 with GeoPandas and Pandas

Home Page: http://h3-pandas.readthedocs.io/

License: MIT License

Python 2.06% Shell 0.01% Jupyter Notebook 97.93% Makefile 0.01%
geopandas geospatial h3 h3-pandas hexagons-are-bestagons pandas pyhon


h3-pandas's People

Contributors

alpha-beta-soup, dahnj, florianneukirchen, richardscottoz


h3-pandas's Issues

ValueError using `polyfill` + `h3_to_geo_boundary`

Intro

I want to fill a polygon with hexagons of a given resolution and then get the boundaries for those hexagons.
My code is the following:

import geopandas as gpd
import h3
import h3pandas
import pandas as pd

def generate_grid(region_bounds: gpd.GeoDataFrame, resolution=9) -> pd.DataFrame:
    grid = region_bounds.h3.polyfill(resolution, explode=True)
    grid = grid.drop('geometry', axis=1)
    grid = grid.rename(columns={'h3_polyfill': 'h3'})
    # grid = grid.set_index('h3')
    # Convert H3 indexes to their geometric boundaries
    grid['geometry'] = grid.h3.h3_to_geo_boundary()
    return grid

pp_cs_gdf = gpd.read_file('sample_error.geojson')
grid_9 = generate_grid(pp_cs_gdf, 9)

Error

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
File /anaconda/envs/azureml_py310_sdkv2/lib/python3.10/site-packages/h3pandas/util/decorator.py:27, in catch_invalid_h3_address.<locals>.safe_f(*args, **kwargs)
     26 try:
---> 27     return f(*args, **kwargs)
     28 except (TypeError, ValueError, H3CellError) as e:

File /anaconda/envs/azureml_py310_sdkv2/lib/python3.10/site-packages/h3/api/_api_template.py:294, in _API_FUNCTIONS.h3_to_geo_boundary(self, h, geo_json)
    277 """
    278 Return tuple of lat/lng pairs describing the cell boundary.
    279 
   (...)
    292 tuple of (float, float) tuples
    293 """
--> 294 return _cy.cell_boundary(self._in_scalar(h), geo_json)

TypeError: Argument 'h' has incorrect type (expected str, got int)

During handling of the above exception, another exception occurred:

ValueError                                Traceback (most recent call last)
Cell In[120], line 13
     10     grid['geometry'] = grid.h3.h3_to_geo_boundary()
     11     return grid
---> 13 grid_9 = generate_grid(pp_cs_gdf, 9)

Cell In[120], line 10, in generate_grid(region_bounds, resolution)
      7 grid = grid.rename(columns={'h3_polyfill': 'h3'})
      8 # grid = grid.set_index('h3')
      9 # Convert H3 indexes to their geometric boundaries
---> 10 grid['geometry'] = grid.h3.h3_to_geo_boundary()
     11 return grid

File /anaconda/envs/azureml_py310_sdkv2/lib/python3.10/site-packages/h3pandas/h3pandas.py:160, in H3Accessor.h3_to_geo_boundary(self)
    139 def h3_to_geo_boundary(self) -> GeoDataFrame:
    140     """Add `geometry` with H3 hexagons to the DataFrame. Assumes H3 index.
    141 
    142     Returns
   (...)
    158     881e2659c3fffff    1  POLYGON ((14.99201 51.00565, 14.98973 51.00133...
    159     """
--> 160     return self._apply_index_assign(
    161         wrapped_partial(h3.h3_to_geo_boundary, geo_json=True),
    162         "geometry",
    163         lambda x: shapely.geometry.Polygon(x),
    164         lambda x: gpd.GeoDataFrame(x, crs="epsg:4326"),
    165     )

File /anaconda/envs/azureml_py310_sdkv2/lib/python3.10/site-packages/h3pandas/h3pandas.py:836, in H3Accessor._apply_index_assign(self, func, column_name, processor, finalizer)
    817 """Helper method. Applies `func` to index and assigns the result to `column`.
    818 
    819 Parameters
   (...)
    833 If using `finalizer`, can return anything the `finalizer` returns.
    834 """
    835 func = catch_invalid_h3_address(func)
--> 836 result = [processor(func(h3address)) for h3address in self._df.index]
    837 assign_args = {column_name: result}
    838 return finalizer(self._df.assign(**assign_args))

File /anaconda/envs/azureml_py310_sdkv2/lib/python3.10/site-packages/h3pandas/h3pandas.py:836, in <listcomp>(.0)
    817 """Helper method. Applies `func` to index and assigns the result to `column`.
    818 
    819 Parameters
   (...)
    833 If using `finalizer`, can return anything the `finalizer` returns.
    834 """
    835 func = catch_invalid_h3_address(func)
--> 836 result = [processor(func(h3address)) for h3address in self._df.index]
    837 assign_args = {column_name: result}
    838 return finalizer(self._df.assign(**assign_args))

File /anaconda/envs/azureml_py310_sdkv2/lib/python3.10/site-packages/h3pandas/util/decorator.py:32, in catch_invalid_h3_address.<locals>.safe_f(*args, **kwargs)
     30 message += f"\nCaller: {f.__name__}({_print_signature(*args, **kwargs)})"
     31 message += f"\nOriginal error: {repr(e)}"
---> 32 raise ValueError(message)

ValueError: H3 method raised an error. Is the H3 address correct?
Caller: h3_to_geo_boundary(17894)
Original error: TypeError("Argument 'h' has incorrect type (expected str, got int)")

Reproduce

  • h3==3.7.7
  • h3pandas==0.2.6
  • I attach a sample of the dataframe I'm using:
    sample_error.zip
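The traceback shows that h3_to_geo_boundary iterates over self._df.index (its docstring says it "Assumes H3 index"), while the snippet keeps the H3 addresses in the h3 column and leaves the original integer row labels, such as 17894, in the index. A minimal sketch of the likely fix, assuming h3pandas 0.2.x with h3 3.x, is to move the addresses into the index before converting them. h3pandas is imported inside the function so the .h3 accessor is registered wherever the function runs.

```python
def generate_grid(region_bounds, resolution=9):
    """Fill each polygon with H3 cells and return one hexagon polygon per cell."""
    # Importing h3pandas registers the `.h3` accessor on (Geo)DataFrames.
    import h3pandas  # noqa: F401

    grid = region_bounds.h3.polyfill(resolution, explode=True)
    grid = grid.drop(columns="geometry").rename(columns={"h3_polyfill": "h3"})
    # DataFrame.explode turns an empty cell list (polygon smaller than a cell) into NaN.
    grid = grid.dropna(subset=["h3"])
    # h3pandas methods read H3 addresses from the index, not from a column.
    grid = grid.set_index("h3")
    # Returns a GeoDataFrame with a `geometry` column of hexagon polygons.
    return grid.h3.h3_to_geo_boundary()
```

Usage stays the same: grid_9 = generate_grid(pp_cs_gdf, 9).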

h3_to_parent at resolution 0 labels it as direct parent

h3_to_parent at resolution 0 calculates correctly, but returns a column called h3_parent, rather than h3_00 (or perhaps h3_0?) as expected based on the pattern for other resolutions.

I don't think there's anything unusual about calculating the level-0 parent (I use it for partitioning; a higher resolution holds my actual information). The problem is just that this line tests resolution for truthiness instead of checking explicitly for None, and 0 is falsy:

column = self._format_resolution(resolution) if resolution else "h3_parent"

Should be:

column = self._format_resolution(resolution) if resolution is not None else "h3_parent"

>>> 'a' if 0 is not None else 'b'
'a'
>>> 'a' if 0 else 'b'
'b'

The computed parent itself is correct: it is a level-0 cell.
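The corrected check can be isolated in a small function to confirm the resolution-0 case. parent_column_name is hypothetical, and the two-digit zero padding is an assumption based on column names such as h3_00 and h3_13 that appear in these issues.

```python
def parent_column_name(resolution=None):
    """Hypothetical stand-in for the column-naming logic of h3_to_parent."""
    # Compare against None explicitly: resolution 0 is falsy but valid.
    if resolution is not None:
        return f"h3_{resolution:02d}"
    return "h3_parent"
```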

ENH: Implement a "finer `polyfill`"

A common use-case is to generate all hexagons that intersect with a given polygon. That's not straightforward to do using H3.

One approximate solution is to first generate a finer resolution grid using polyfill and then return to the desired resolution using h3_to_parent.

The H3-Pandas API could thus contain a convenience function that performs both of these operations in a single pass.

A quick demonstration of the idea:

import geopandas as gpd
import h3pandas

gdf = gpd.read_file(gpd.datasets.get_path('naturalearth_lowres'))

# Resample to H3 cells
gdf = gdf.h3.polyfill_resample(4)
gdf = gdf.h3.h3_to_parent_aggregate(2)

Sidenote

The above code yields a "strange" result:

[Image: plot of the result]

This is because a number of the hexagons cross the antimeridian and are rendered incorrectly. Since this case may be fairly common, H3-Pandas could provide a fix, something like:

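import numpy as np
from shapely.geometry import Polygon

def fix(cell):
    """Shift a cell that straddles the antimeridian onto a continuous longitude range."""
    cell_coords = np.array(cell.boundary.coords)
    lngs = cell_coords[:, 0]  # a view: edits to lngs also change cell_coords
    # Treat a cell as crossing the antimeridian only if it has vertices on both
    # sides of 0° and at least one beyond ±170°; return all other cells unchanged.
    if not (np.any(lngs < 0) and np.any(lngs > 0)) or not (np.any(np.abs(lngs) > 170)):
        return cell

    # Move the western vertices east by 360° so the polygon no longer wraps around.
    negative = lngs < 0
    lngs[negative] = lngs[negative] + 360
    return Polygon(cell_coords)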

Applying the fix to the dataframe's geometries

gdf['geometry'] = gdf['geometry'].apply(fix)

fixes the visual output:

[Image: plot after applying the fix]

GeoDataFrame has no attribute 'h3' only when multiprocessing using JobLib

I'm using JobLib to try and multiprocess a polyfill operation on a large dataset of polygons.

This line runs perfectly fine:
gdf.h3.polyfill(11)

But when I split the gdf into chunks and run them with joblib, I get an error:

import geopandas as gpd
import h3pandas
from joblib import Parallel, delayed

def polyfill_parallel(i, gdf_chunk):
    gdf_chunk = gpd.GeoDataFrame(gdf_chunk)

    # perform polyfill on the chunk
    return gdf_chunk.h3.polyfill(11)

Parallel(n_jobs=-1, verbose=1)(delayed(polyfill_parallel)(i, gdf_chunk) for i, gdf_chunk in gdf_chunks)

I get the error:
AttributeError: 'GeoDataFrame' object has no attribute 'h3'

I tried chunking because I was initially looping over the rows of the main gdf, which passed GeoSeries objects to the function. I thought that was the cause of the error, but it looks like it wasn't.
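A likely cause (an assumption; the issue does not confirm it): h3pandas adds the .h3 accessor as a side effect of import h3pandas, and joblib's default process-based workers start without the parent's imports. The worker unpickles a plain GeoDataFrame, never runs that import, and so finds no h3 attribute. A minimal sketch that imports h3pandas inside the worker function; split_rows and polyfill_in_parallel are hypothetical helpers.

```python
def polyfill_parallel(gdf_chunk, resolution=11):
    """Polyfill one chunk of polygons; meant to run inside a joblib worker."""
    # Import here so the `.h3` accessor is registered in the worker process.
    import h3pandas  # noqa: F401

    return gdf_chunk.h3.polyfill(resolution)


def split_rows(n_rows, n_chunks):
    """Return (start, stop) row ranges covering n_rows in at most n_chunks slices."""
    if n_rows == 0:
        return []
    step = -(-n_rows // n_chunks)  # ceiling division
    return [(start, min(start + step, n_rows)) for start in range(0, n_rows, step)]


def polyfill_in_parallel(gdf, resolution=11, n_chunks=8, n_jobs=-1):
    """Split `gdf` by rows, polyfill the chunks in parallel and concatenate them."""
    import pandas as pd
    from joblib import Parallel, delayed

    chunks = [gdf.iloc[start:stop] for start, stop in split_rows(len(gdf), n_chunks)]
    results = Parallel(n_jobs=n_jobs, verbose=1)(
        delayed(polyfill_parallel)(chunk, resolution) for chunk in chunks
    )
    return pd.concat(results)
```

Usage: filled = polyfill_in_parallel(gdf, 11).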

geopandas integration, support arbitrary geometry column name

Awesome project, makes life much easier.

It seems like this could use tighter geopandas integration.
I noticed that h3pandas assumes the geometry column is named "geometry", which is not always the case.

You can get the active geometry column's name with gdf.geometry.name, since the gdf.geometry attribute always refers to the active geometry. This works in geopandas 0.14.2.

I peeked at the source code: the gdf.geometry attribute is used in some places, but elsewhere the code seems to look for a column named "geometry", perhaps because it assumes a pandas DataFrame. I'm not sure.

Reproduction steps

I noticed this when running code like this:

gdf = gdf.set_geometry("points_geometry")  # The only geometry column, contains several points
h3agg = gdf.h3.geo_to_h3_aggregate(7, return_geometry=True)

I got this error:

TypeError: 'GeometryArray' with dtype geometry does not support reduction 'sum'

The above exception was the direct cause of the following exception:

TypeError: agg function failed [how->sum,dtype->geometry]

If I copy the geometry to a column named "geometry" and drop the original column, it works:

gdf["geometry"] = gdf.points_geometry
gdf = gdf.set_geometry("geometry").drop(columns="points_geometry")

h3agg = gdf.h3.geo_to_h3_aggregate(7, return_geometry=True)
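Until h3pandas reads the active geometry column itself, the workaround can be wrapped in a small helper built on GeoPandas' GeoDataFrame.rename_geometry, which renames the active geometry column and refuses if a column with the target name already exists. The helper name is hypothetical; a minimal sketch:

```python
def with_default_geometry_name(gdf, name="geometry"):
    """Return `gdf` with its active geometry column renamed to `name`."""
    # `gdf.geometry` is always the active geometry column, whatever it is called.
    if gdf.geometry.name == name:
        return gdf
    # rename_geometry raises if `name` already exists, so drop any
    # non-active column that uses that name first.
    if name in gdf.columns:
        gdf = gdf.drop(columns=name)
    return gdf.rename_geometry(name)
```

Usage: h3agg = with_default_geometry_name(gdf).h3.geo_to_h3_aggregate(7, return_geometry=True).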

ENH: Support other H3 APIs

h3-py has multiple index APIs. Currently, H3-Pandas is based on the basic_str API.
For performance, it would make the most sense to work with the numpy_int or memview_int APIs.

I see two options:

Method 1: Provide the user with an option

The user could then choose which API they want, similar to how h3-py does it. They could be informed of the potential speedups with the integer representations.

Method 2: Work always with integers, but show string representations

Most H3 users are arguably familiar with the H3 string representation. A possibility, originally suggested by @ajfriend, might be to utilize Pandas' extension types to provide a class that uses the int representation under the hood, but has a str representation (repr). This would allow the user to stay with the familiar string representation, but use all the performance improvements stemming from the int representation.

I do not know whether this can be done while still keeping the speedups. This should be investigated.
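Method 2 relies on the fact that an H3 string address is the hexadecimal form of the same 64-bit integer. The class below is a hypothetical, minimal illustration of "int under the hood, string repr"; it is not a pandas extension type (that would also need an ExtensionDtype and ExtensionArray pair) and says nothing about performance.

```python
class H3Cell(int):
    """Hypothetical H3 cell: stored as an int, displayed as the usual hex string."""

    @classmethod
    def from_str(cls, address):
        # H3 string addresses are the base-16 form of the 64-bit index.
        return cls(int(address, 16))

    def __str__(self):
        return format(self, "x")

    def __repr__(self):
        return f"H3Cell({str(self)!r})"
```

For example, H3Cell.from_str("881e2659c3fffff") compares equal to the plain integer index but prints as the familiar string.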

13 and higher H3-resolution

Hi!

I was testing different H3 resolutions for my study area. When I tried resolutions 13 and 15, the output did not display well: there were blank spaces between the cells.

My code is:

import geopandas as gpd
import h3pandas

# Opening a regular grid of sample points spaced 10 m apart
centroides = gpd.read_file(output_directory + "sample_points10m_sr.geojson").to_crs(4326)

# Getting their longitude and latitude
centroides["Lng"] = centroides.geometry.x
centroides["Lat"] = centroides.geometry.y

# Generating the H3 index at resolution 13
centroides = centroides.h3.geo_to_h3(13)

# Resetting the index
centroides = centroides.reset_index()

# Selecting the columns of interest and grouping by H3 cell
centroides = centroides.loc[:, ["h3_13", "id", "Lng", "Lat"]]
centroides = centroides.drop(columns=["Lng", "Lat"]).groupby("h3_13").sum()

# Converting the H3 index to hexagons and saving as GeoJSON
centroides = centroides.h3.h3_to_geo_boundary()
centroides.to_file(output_directory + "h3_13.geojson")

For example, at resolution 13 it looks like this when opened in QGIS:

[Screenshot 2024-02-01 133633: resolution 13]

When I run the same process at resolutions 12 and 10, the result looks fine. For example, here is the same area at resolution 12:

[Screenshot 2024-02-01 133642: resolution 12]

Why is this happening? Is it caused by my computer or by the h3-pandas library?
Let me know.

Thank you so much in advance for the H3-Pandas library; I am really enjoying it.
Bryan
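One plausible explanation (an assumption, not confirmed in the issue): geo_to_h3 only produces cells that contain at least one sample point. With points 10 m apart, each point stands for about 100 m². Using H3's published average hexagon areas, a resolution-12 cell holds about three points on average, but a resolution-13 cell holds fewer than one, so many resolution-13 cells receive no point and appear as gaps. The arithmetic:

```python
# Average hexagon area per resolution in m², from the H3 documentation's
# cell-statistics table (rounded to one decimal).
AVG_CELL_AREA_M2 = {10: 15047.5, 11: 2149.6, 12: 307.1, 13: 43.9, 14: 6.3, 15: 0.9}


def points_per_cell(resolution, spacing_m=10.0):
    """Expected number of points per cell for a regular square grid of points."""
    return AVG_CELL_AREA_M2[resolution] / spacing_m**2


for res in sorted(AVG_CELL_AREA_M2):
    print(res, round(points_per_cell(res), 2))
```

If that is the cause, a denser sample grid, or polyfill on the study-area polygon instead of point indexing, would give gap-free coverage.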

handle nulls in `k_ring_smoothing`

If you apply k_ring_smoothing on a dataframe that contains nulls (created by e.g. geo_to_h3_aggregate), then the nulls are mapped to 0. This is probably unwanted behaviour.

Could nulls be handled in this function by doing something like this?

  • returning an aggregated value where the k-ring is completely non-null (current method)
  • returning an aggregated value where there are a mix of nulls and non-nulls in the k-ring
  • returning null where all k-ring vals are null
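The three cases in the list could look like this for a single cell, using an unweighted mean. smooth_cell is a hypothetical helper working on plain Python values, not h3pandas code; cells missing from values are treated as null.

```python
def smooth_cell(values, ring_cells):
    """Mean of the non-null values in a k-ring; None if every value is null.

    `values` maps cell -> number or None; cells absent from `values` count as null.
    """
    present = [values[c] for c in ring_cells if values.get(c) is not None]
    if not present:
        return None  # all k-ring values are null: keep the result null
    # Fully non-null rings give the ordinary mean; mixed rings ignore the nulls.
    return sum(present) / len(present)
```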

ENH: Support LineStrings

A great feature of the hexagonal shape of H3 is how naturally it can delineate LineStrings, such as roads and rivers, in a connected and visually pleasing way.

It would thus make sense to provide a "linestring polyfill" method that generates a continuous string of H3 cells covering the linestring. This could be done in two steps:

  • Index the coordinates of the linestring using geo_to_h3
  • Fill-in the gaps using h3_line

The interface and implementation could follow those of existing methods, e.g. polyfill.
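The two steps could be sketched as follows, assuming the h3-py 3.x functions geo_to_h3(lat, lng, res) and h3_line(start, end); cells_along_line and join_segments are hypothetical names. According to the H3 documentation, the line function can fail for cells that are very far apart or on opposite sides of a pentagon, so long segments may need densifying first.

```python
def cells_along_line(coords, resolution):
    """Cover a LineString's vertices with a connected path of H3 cells.

    `coords` is a sequence of (lng, lat) pairs, the order shapely uses.
    """
    import h3  # h3-py 3.x

    # Step 1: index each vertex.
    vertex_cells = [h3.geo_to_h3(lat, lng, resolution) for lng, lat in coords]
    # Step 2: fill the gaps between consecutive vertex cells.
    segments = [h3.h3_line(a, b) for a, b in zip(vertex_cells, vertex_cells[1:])]
    return join_segments(segments) if segments else vertex_cells[:1]


def join_segments(segments):
    """Concatenate cell paths, dropping repeated consecutive cells (shared joints)."""
    path = []
    for segment in segments:
        for cell in segment:
            if not path or path[-1] != cell:
                path.append(cell)
    return path
```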

ENH: `polyfill_resample` distributes numeric values appropriately

Currently, polyfill_resample simply leaves any numeric values untouched.

This makes sense for "relative" values, e.g. a percentage of unemployed, but makes less sense for absolute values, e.g. total population.

For absolute values, it would make sense to simply divide the amount by the number of generated H3 cells. If the polygon's population is 100 and it generated 20 H3 cells, then I'd expect each cell to be assigned 5 people.

There are more advanced strategies, but those are, in my opinion, best left to packages that focus on spatial resampling, such as Tobler.

The challenge will be to provide this functionality in a clean and configurable way.
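A minimal sketch of the even split, using plain Python rows. It assumes each resampled row still records which polygon it came from (the polygon key and the function name are hypothetical), since the division needs the number of cells per source polygon.

```python
from collections import Counter


def distribute_absolute(rows, column):
    """Split an absolute quantity evenly over the cells generated for each polygon.

    `rows` has one dict per (polygon, cell) pair, each with a "polygon" key.
    """
    cells_per_polygon = Counter(row["polygon"] for row in rows)
    return [
        {**row, column: row[column] / cells_per_polygon[row["polygon"]]}
        for row in rows
    ]
```

With the example from the text, a polygon with population 100 that produced 20 cells yields 5 per cell. Relative columns, such as a percentage, would simply not be passed in.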

Consider integrating different geospatial indexing systems

Super nice work!

I was wondering if the project can be expanded by supporting different geospatial indexing systems such as S2 and geotiles (bing).

I developed a common API to use them that can help support all those indexing systems.

The API more or less follows the H3 python API, so it shouldn't be that complicated to add support to other indexes.

If you feel like it is a good idea, I can open a PR with an experiment.

Q: Multicore support?

When performing polyfill operations, does H3-Pandas currently utilize multiple cores?

how do the weights for k_ring_smoothing work

I tried to understand the source code but struggled, sorry. Let us say my (geo)pandas df has two stats, S1 and S2. Does the application of the weights depend on the order of these stats? For example, if the weights are [0.3, 0.7], is 0.3 applied to S1 and 0.7 to S2? Ultimately, do the weights apply a linear combination to the stats in the vicinity (defined by k) of the hexagon?
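As far as can be told from the h3pandas docstring for k_ring_smoothing (an assumption worth checking against the installed version), weights are indexed by distance from the origin cell, with the first weight applying to the origin itself, rather than by column; every numeric column is smoothed the same way, so the order of S1 and S2 would not matter. The sketch shows one normalization consistent with that reading: a hexagon's ring at distance i ≥ 1 has 6·i cells, and the weights are scaled so that one cell's value, spread over its neighbourhood, keeps its total. The function name is hypothetical and this is not the library's code.

```python
def normalize_ring_weights(weights):
    """Scale per-distance weights so the spread of one cell's value sums to 1.

    weights[0] applies to the origin cell, weights[i] to each of the 6*i cells
    in the ring at distance i (ignoring pentagon distortion).
    """
    ring_sizes = [1] + [6 * i for i in range(1, len(weights))]
    total = sum(w * n for w, n in zip(weights, ring_sizes))
    if total == 0:
        raise ValueError("weights must not all be zero")
    return [w / total for w in weights]
```

With equal weights for distances 0 to k this reduces to dividing by the k-ring size, 1 + 3k(k + 1) cells (19 for k = 2).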

support h3 v4.0.0

The upcoming h3-py v4.0.0 release is making some breaking changes to the api.

Currently, if the h3 v4.0 prerelease beta is installed with pip install 'h3==4.0.0b2', it fails to integrate with h3pandas:

First issue: h3.h3 has been removed:

venv/lib/python3.10/site-packages/h3pandas/h3pandas.py:15: in <module>
    from h3 import h3
E   ImportError: cannot import name 'h3' from 'h3' (/home/worker/projects/feasibility_workflow/venv/lib/python3.10/site-packages/h3/__init__.py)

https://github.com/uber/h3-py/blob/f8958ac788ea04e2b383c8b859f75a05c3fcb815/src/h3/__init__.py#L6-L7

It looks like this first issue could be fixed by replacing from h3 import h3 with import h3.api.basic_str as h3 in the h3pandas code and tests.
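Beyond the import, h3-py 4.x also renamed most functions (for example geo_to_h3 became latlng_to_cell and k_ring became grid_disk), so changing the import alone is unlikely to be enough. A minimal sketch of one way to cope, assuming only names need translating: a hypothetical resolve helper that prefers the 3.x name and falls back to the 4.x one. Some signatures changed as well (polyfill was replaced by polygon_to_cells, which takes an H3 shape object), so a real port needs more than a name map.

```python
# A few h3-py 3.x function names and their 4.x replacements.
V3_TO_V4 = {
    "geo_to_h3": "latlng_to_cell",
    "h3_to_geo_boundary": "cell_to_boundary",
    "h3_to_parent": "cell_to_parent",
    "h3_to_children": "cell_to_children",
    "k_ring": "grid_disk",
    "hex_ring": "grid_ring",
    "h3_is_valid": "is_valid_cell",
    "h3_line": "grid_path_cells",
}


def resolve(h3_module, v3_name):
    """Return the function for `v3_name`, falling back to its 4.x name."""
    if hasattr(h3_module, v3_name):
        return getattr(h3_module, v3_name)
    v4_name = V3_TO_V4.get(v3_name)
    if v4_name is not None and hasattr(h3_module, v4_name):
        return getattr(h3_module, v4_name)
    raise AttributeError(f"h3 module has neither {v3_name!r} nor its 4.x name")
```

Usage: import h3.api.basic_str as h3, then to_parent = resolve(h3, "h3_to_parent").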

ENH: Implement `h3_to_children`

H3-Pandas has no implementation of h3_to_children. It should be straightforward to implement it using essentially the same logic as other collection-creating methods, such as k_ring. Similarly to other methods, it should provide an explode option.
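A minimal sketch of the idea, following the same pattern as k_ring: read each cell from the index, store its children as a list, and optionally explode to one row per child. It assumes h3-py 3.x (h3.h3_to_children(h, res)); the function and the h3_children column name are hypothetical, not h3pandas API.

```python
def h3_to_children(df, resolution, explode=False):
    """Add an `h3_children` column with the children of each H3 cell in the index."""
    import h3  # h3-py 3.x

    # Sort each child set so the output order is deterministic.
    children = [sorted(h3.h3_to_children(cell, resolution)) for cell in df.index]
    result = df.assign(h3_children=children)
    # explode=True gives one row per child, repeating the parent's values.
    return result.explode("h3_children") if explode else result
```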

CRS Warning for k_ring_smoothing when using k param

What CRS should the GeoDataFrame being worked on have? I couldn't locate this in the docs or provided example notebooks.

When working with a GeoDataFrame with CRS 4326, a warning involving CRS is produced if k_ring_smoothing is applied using k instead of weight coefficients. The warning does not appear if weight coefficients are used.

test_smooth = test.h3.k_ring_smoothing(k=2)

FutureWarning: CRS mismatch between CRS of the passed geometries and 'crs'. Use 'GeoDataFrame.set_crs(crs, allow_override=True)' to overwrite CRS or 'GeoDataFrame.to_crs(crs)' to reproject geometries. CRS mismatch will raise an error in the future versions of GeoPandas.
lambda x: gpd.GeoDataFrame(x, crs="epsg:4326"),

test.crs

<Geographic 2D CRS: EPSG:4326>
Name: WGS 84
Axis Info [ellipsoidal]:
- Lat[north]: Geodetic latitude (degree)
- Lon[east]: Geodetic longitude (degree)
Area of Use:
- name: World.
- bounds: (-180.0, -90.0, 180.0, 90.0)
Datum: World Geodetic System 1984 ensemble
- Ellipsoid: WGS 84
- Prime Meridian: Greenwich

type(test)

geopandas.geodataframe.GeoDataFrame

test = test.h3.h3_is_valid()
test[test['h3_is_valid'] == False]

No rows are flagged as invalid.
