Comments (4)
Hey @Kanahiro,
I think the read_routes function can be made much faster.
The current implementation builds GeoJSON features from the merged DataFrame by looping over each unique route_id. Every iteration filters the whole DataFrame and then sorts the result, which gets slow on large DataFrames.
To improve performance, I suggest using pandas' groupby and apply methods. The DataFrame is sorted once, grouped by both route_id and trip_id, and a function builds the GeoJSON feature for each group. This avoids the repeated filtering and sorting and relies on pandas' optimized group processing, which can significantly reduce execution time.
Locally, I replaced this code:
# parse routes
for route_id in merged["route_id"].unique():
    route = merged[merged["route_id"] == route_id]
    trip_id = route["trip_id"].unique()[0]
    route = route[route["trip_id"] == trip_id].sort_values("stop_sequence")
    features.append(
        {
            "type": "Feature",
            "geometry": {
                "type": "LineString",
                "coordinates": route[
                    ["stop_lon", "stop_lat"]
                ].values.tolist(),
            },
            "properties": {
                "route_id": str(route_id),
                "route_name": route.route_concat_name.values.tolist()[0],
            },
        }
    )
with this:
def create_feature(group):
    # Assuming the group is already sorted by stop_sequence, if not, uncomment the next line
    # group = group.sort_values("stop_sequence")
    route_id = group["route_id"].iloc[0]
    route_name = group["route_concat_name"].iloc[0]
    feature = {
        "type": "Feature",
        "geometry": {
            "type": "LineString",
            "coordinates": group[["stop_lon", "stop_lat"]].values.tolist(),
        },
        "properties": {
            "route_id": str(route_id),
            "route_name": route_name,
        },
    }
    return feature

# Ensure the DataFrame is sorted by stop_sequence before applying the function
merged_sorted = merged.sort_values(["route_id", "trip_id", "stop_sequence"])

# Apply the function to each group of route_id and trip_id
features = merged_sorted.groupby(["route_id", "trip_id"]).apply(create_feature).tolist()
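One caveat: grouping by both route_id and trip_id yields one feature per trip, while the original loop keeps only the first trip_id of each route and so yields one feature per route. If the one-feature-per-route behavior should be preserved, something like the sketch below might work. It is only a sketch under assumptions: build_route_features and the sample data are hypothetical names for illustration, not part of gtfs-parser, and the features come out ordered by route_id, which may differ from the order the loop produces.

```python
import pandas as pd


def build_route_features(merged: pd.DataFrame) -> list:
    """Build one LineString feature per route from its first trip (hypothetical helper)."""
    # First (route_id, trip_id) pair seen for each route, in row order;
    # this mirrors route["trip_id"].unique()[0] in the original loop.
    first_trip = merged.drop_duplicates("route_id")[["route_id", "trip_id"]]

    # Keep only the rows of that trip, then sort once instead of once per route.
    one_trip = merged.merge(first_trip, on=["route_id", "trip_id"])
    one_trip = one_trip.sort_values(["route_id", "stop_sequence"])

    features = []
    # Iterating the groups (rather than using apply) takes the route_id from
    # the group key, so it does not depend on how apply treats grouping columns.
    for route_id, group in one_trip.groupby("route_id"):
        features.append(
            {
                "type": "Feature",
                "geometry": {
                    "type": "LineString",
                    "coordinates": group[["stop_lon", "stop_lat"]].to_numpy().tolist(),
                },
                "properties": {
                    "route_id": str(route_id),
                    "route_name": group["route_concat_name"].iloc[0],
                },
            }
        )
    return features


# Made-up sample data: route A has two trips, route B has one.
sample = pd.DataFrame(
    {
        "route_id": ["A", "A", "A", "A", "B", "B"],
        "trip_id": ["t1", "t1", "t2", "t2", "t3", "t3"],
        "stop_sequence": [2, 1, 1, 2, 1, 2],
        "stop_lon": [1.0, 0.0, 5.0, 6.0, 10.0, 11.0],
        "stop_lat": [1.0, 0.0, 5.0, 6.0, 10.0, 11.0],
        "route_concat_name": ["Route A"] * 4 + ["Route B"] * 2,
    }
)
features = build_route_features(sample)
```

With the sample above, trip t2 is ignored and the result has two features, one for route A and one for route B.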
from gtfs-parser.
x500 faster...
time poetry run python -m gtfs_parser aggregate /Users/kanahiro/Downloads/GTFS_FP2021_2021-12-08_09-10 output
GTFS loaded.
real 0m36.841s
user 0m32.781s
sys 0m3.893s
close this :)
Hi @liorsteinberg, sorry for the late response.
The newest version of gtfs-parser is dramatically improved. Please try it if you are still interested :)
Amazing work! Thanks