Performance improvement about gtfs-parser (CLOSED)

Kanahiro commented on September 4, 2024
Performance improvement

from gtfs-parser.

Comments (4)

liorsteinberg commented on September 4, 2024

Hey @Kanahiro,

I think the read_routes function can be made much faster.

The current implementation for generating GeoJSON features from the merged DataFrame iterates over each unique route_id, which is slow for large DataFrames. Every iteration filters the whole DataFrame with a boolean mask and then sorts the result, so the cost grows with the number of routes times the number of rows.

To improve this, I suggest using pandas' groupby and apply methods. The DataFrame is sorted once, grouped by both route_id and trip_id, and a function is applied to each group to construct the GeoJSON feature. This avoids the repeated filtering and sorting and relies on pandas' optimized group processing, which can significantly reduce execution time.

Locally, I replaced this code:

# parse routes
for route_id in merged["route_id"].unique():
    route = merged[merged["route_id"] == route_id]
    trip_id = route["trip_id"].unique()[0]
    route = route[route["trip_id"] == trip_id].sort_values("stop_sequence")
    features.append(
        {
            "type": "Feature",
            "geometry": {
                "type": "LineString",
                "coordinates": route[
                    ["stop_lon", "stop_lat"]
                ].values.tolist(),
            },
            "properties": {
                "route_id": str(route_id),
                "route_name": route.route_concat_name.values.tolist()[0],
            },
        }
    )

with this:

def create_feature(group):
    # Assuming the group is already sorted by stop_sequence, if not, uncomment the next line
    # group = group.sort_values("stop_sequence")
    route_id = group["route_id"].iloc[0]
    route_name = group["route_concat_name"].iloc[0]

    feature = {
        "type": "Feature",
        "geometry": {
            "type": "LineString",
            "coordinates": group[["stop_lon", "stop_lat"]].values.tolist(),
        },
        "properties": {
            "route_id": str(route_id),
            "route_name": route_name,
        },
    }

    return feature
    
# Ensure the DataFrame is sorted by stop_sequence before applying the function
merged_sorted = merged.sort_values(["route_id", "trip_id", "stop_sequence"])

# Apply the function to each group of route_id and trip_id
features = merged_sorted.groupby(["route_id", "trip_id"]).apply(create_feature).tolist()
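
Note that grouping by both route_id and trip_id emits one feature per trip, while the original loop keeps only the first trip of each route and emits one feature per route. If the one-feature-per-route output is wanted, a minimal sketch could look like the following. It assumes `merged` has the same columns as above and that trip_id has no missing values; `routes_to_features` is a hypothetical name for illustration, not a function in gtfs-parser. It iterates over the groups instead of calling apply, because how apply treats the grouping columns has changed across pandas versions.

```python
import pandas as pd


def routes_to_features(merged: pd.DataFrame) -> list:
    """Build one LineString feature per route_id, using the first trip seen."""
    # For every row, look up the first trip_id (in row order) of its route.
    first_trip = merged.groupby("route_id")["trip_id"].transform("first")
    # Keep only the stops that belong to that first trip.
    one_trip = merged[merged["trip_id"] == first_trip]

    features = []
    # sort=False keeps routes in order of first appearance, like unique() did.
    for route_id, group in one_trip.groupby("route_id", sort=False):
        # Order the stops along the trip before building the line.
        group = group.sort_values("stop_sequence")
        features.append(
            {
                "type": "Feature",
                "geometry": {
                    "type": "LineString",
                    "coordinates": group[["stop_lon", "stop_lat"]].values.tolist(),
                },
                "properties": {
                    "route_id": str(route_id),
                    "route_name": group["route_concat_name"].iloc[0],
                },
            }
        )
    return features
```

Calling `features = routes_to_features(merged)` would then stand in for the loop; the whole-DataFrame mask is computed once rather than once per route.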

Kanahiro commented on September 4, 2024

500x faster...

time poetry run python -m gtfs_parser aggregate /Users/kanahiro/Downloads/GTFS_FP2021_2021-12-08_09-10 output
GTFS loaded.

real    0m36.841s
user    0m32.781s
sys     0m3.893s

Closing this :)

Kanahiro commented on September 4, 2024

Hi @liorsteinberg, sorry for the late response.
The newest version of gtfs-parser is dramatically improved; please try it if you are still interested :)

liorsteinberg commented on September 4, 2024

Amazing work! Thanks
