Comments (4)
A NumPy approach would likely be significantly faster, but it needs topologically correct input. By that I mean that the geometries must already share vertices and edges. shared_paths does not need any of that. Consider the following example:
from shapely.geometry import LineString
from shapely.ops import shared_paths
import geopandas as gpd
box1 = LineString([(0,0), (0,1), (1, 1), (1, 0), (0,0)])
box2 = LineString([(-1,0), (2,0), (2, -2), (-1, -2), (-1,0)])
gpd.GeoSeries([box1, box2]).plot(color=['r', 'b'], alpha=.5)
shared_paths(box1, box2).wkt
'GEOMETRYCOLLECTION (MULTILINESTRING EMPTY, MULTILINESTRING ((1 0, 0 0)))'
These two line strings are topological neighbours, so they should be recognised by TopoJSON as such, but they do not share even a single vertex.
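To make the point concrete, here is a minimal pure-Python sketch of what a purely vertex/edge-based comparison would see for the two boxes above. The helper names `shared_vertices` and `shared_edges` and the `box2_noded` variant are hypothetical, introduced only for illustration; they are not part of topojson, shapely or numpy.

```python
# Coordinates of the two boxes from the example above, as plain tuples.
box1 = [(0, 0), (0, 1), (1, 1), (1, 0), (0, 0)]
box2 = [(-1, 0), (2, 0), (2, -2), (-1, -2), (-1, 0)]

# Hypothetical variant of box2 with the extra vertices (0, 0) and (1, 0)
# inserted, i.e. what "topologically correct" input would look like.
box2_noded = [(-1, 0), (0, 0), (1, 0), (2, 0), (2, -2), (-1, -2), (-1, 0)]


def shared_vertices(a, b):
    """Return the coordinates that appear in both rings (exact match)."""
    return set(a) & set(b)


def shared_edges(a, b):
    """Return the edges present in both rings, ignoring direction."""
    def edges(ring):
        return {frozenset(pair) for pair in zip(ring, ring[1:])}
    return edges(a) & edges(b)


# The original boxes overlap along a line but share nothing vertex-wise.
print(shared_vertices(box1, box2))        # set()
print(shared_edges(box1, box2))           # set()

# Once box2 carries the matching vertices, the shared edge is found.
print(shared_vertices(box1, box2_noded))  # the two vertices (0, 0) and (1, 0)
print(shared_edges(box1, box2_noded))     # the single edge between them
```

A vectorised numpy version would have the same blind spot, since it also compares coordinates for exact equality.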
Exactly the same issue arises in the libpysal library during the creation of a spatial weights matrix. They offer both options: Queen or Rook weights based purely on shared vertices/edges, which is the recommended, performant option, and a fuzzy contiguity function based on shapely's intersects.
It sounds like a good model for TopoJSON as well: default to the fast numpy algorithm, but optionally offer shared_paths.
from topojson.
Yes, I'm aware of that; I forgot to include this info in the issue. I've considered a numpy route before and eventually stuck with the shapely shared_paths strategy because of the situation you mentioned.
I like your suggestion to include both a numpy and shapely route. Still not completely sure about performance of a numpy-based function though. Opened a PR for testing: #75.
I also don't really need the shared paths themselves. I need the two coordinates that mark both ends of the segment that is shared. Given that info, maybe someone can think of yet another strategy.
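One possible strategy, sketched here in plain Python under stated assumptions: compare two straight segments directly and return the two end coordinates of their collinear overlap. The function name `shared_segment` and the `tol` parameter are hypothetical, and this is not how topojson or shapely implement it. Comparing every pair of segments this way is quadratic, so a real implementation would need a spatial index on top.

```python
def shared_segment(seg_a, seg_b, tol=1e-12):
    """Return the two end coordinates of the collinear overlap of two
    segments, or None if they do not overlap along a line.

    Each segment is a pair of (x, y) tuples. `tol` is an absolute
    tolerance, so it is only meaningful for coordinates of moderate size.
    """
    (ax0, ay0), (ax1, ay1) = seg_a
    (bx0, by0), (bx1, by1) = seg_b
    dx, dy = ax1 - ax0, ay1 - ay0
    length_sq = dx * dx + dy * dy
    if length_sq == 0:
        return None  # seg_a is a single point

    # Both ends of seg_b must lie on the infinite line through seg_a.
    for px, py in seg_b:
        cross = dx * (py - ay0) - dy * (px - ax0)
        if abs(cross) > tol:
            return None

    # Project the ends of seg_b onto seg_a; t in [0, 1] spans seg_a.
    t0 = ((bx0 - ax0) * dx + (by0 - ay0) * dy) / length_sq
    t1 = ((bx1 - ax0) * dx + (by1 - ay0) * dy) / length_sq
    lo = max(0.0, min(t0, t1))
    hi = min(1.0, max(t0, t1))
    if hi - lo <= tol:
        return None  # disjoint, or touching in a single point only

    return ((ax0 + lo * dx, ay0 + lo * dy), (ax0 + hi * dx, ay0 + hi * dy))


# Bottom edge of box1 against the top edge of box2 from the earlier example.
print(shared_segment(((1, 0), (0, 0)), ((-1, 0), (2, 0))))
# ((1.0, 0.0), (0.0, 0.0))
```

For the two boxes discussed earlier this gives the same two coordinates that shared_paths reports, without requiring shared vertices.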
from topojson.
Hi, I did some tests with #76 and it seems that Join is no longer the bottleneck. I used a fairly complex (but still small) GeoDataFrame with ~4000 dense polygons, with the following timings:
topojson.Topology(gdf, shared_paths='dict')
extract 4.612934827804565
join 6.585726261138916
cut 16.98685908317566
dedup 14.114187955856323
hash 35.22837710380554
total 77.5300726890564
topo = topojson.Topology(gdf, shared_paths='shapely')
extract 4.71062707901001
join 97.8879702091217
cut 24.696655988693237
dedup 14.725230693817139
hash 47.4505250453949
total 189.47293210029602
I had to stop the numpy version as it took ages. If you want to try the data, they're here for some time.
from topojson.
I've released version 1.0rc10, which results in the following speedups for me:
release | naturalearth_lowres | nybb | sample.geojson |
---|---|---|---|
1.0rc10 (15 June 2020) | 8X (214 ms) | 20X (514 ms) | 20X (12.9 s) |
I'll close this issue for now.
from topojson.