gtfs-parser's Issues

Skip unused files to avoid errors

If an error occurs while reading a file that the tool does not need, the program terminates abnormally.

Example

The GTFS feed of Nanto city contains a Shift_JIS (SJIS) encoded file named "result.csv".
I suspect it is the output of an email sanitization process that was archived by mistake.

"gtfs_parser\gtfs_parser\gtfs.py", line 12, in load_df
UnicodeDecodeError: 'utf-8' codec can't decode byte 0x90 in position 0: invalid start byte

Proposal

I propose that GTFSFactory only read files that are actually used by this tool.

  • feed_info, agency, routes, stops, trips, stop_times, shapes, calendar, calendar_dates

The GTFS specification has been extended and now defines many files that this tool does not use.
Skipping those files would also improve performance.
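
The proposal could be sketched roughly as follows. This is a minimal illustration, not the tool's actual loader: the function name load_used_tables, the USED_TABLES constant, and the use of the standard csv module instead of the existing load_df are all assumptions made for the example. It also assumes the feed is an unpacked directory of <table>.txt files.

```python
import csv
import os
import tempfile

# Tables listed in the proposal above; everything else in the feed is ignored.
USED_TABLES = (
    "feed_info", "agency", "routes", "stops", "trips",
    "stop_times", "shapes", "calendar", "calendar_dates",
)


def load_used_tables(gtfs_dir: str) -> dict:
    """Read only the whitelisted <table>.txt files found in gtfs_dir (hypothetical helper)."""
    tables = {}
    for name in USED_TABLES:
        path = os.path.join(gtfs_dir, name + ".txt")
        if not os.path.isfile(path):
            continue  # tables that are absent from the feed are skipped
        # utf-8-sig also strips a leading byte order mark if present
        with open(path, encoding="utf-8-sig", newline="") as f:
            tables[name] = list(csv.DictReader(f))
    return tables


# Demo feed: one valid table plus a stray file whose bytes are not valid UTF-8.
demo_dir = tempfile.mkdtemp()
with open(os.path.join(demo_dir, "stops.txt"), "w", encoding="utf-8", newline="") as f:
    f.write("stop_id,stop_name\nS1,Central\n")
with open(os.path.join(demo_dir, "result.csv"), "wb") as f:
    f.write(b"\x90\xac,1\n")

tables = load_used_tables(demo_dir)
print(sorted(tables))
```

Because the stray result.csv is never opened, its encoding cannot raise UnicodeDecodeError, and no time is spent parsing tables the tool would discard anyway.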

gtfs_dir type

In gtfs.py, the function is declared as GTFS(gtfs_dir: list) -> dict. Why is gtfs_dir annotated as list instead of str?

read_routes() should return MultiLineStrings even if shape is not used

Problem

Trips with the same route_id may have different stop patterns, for example trips in opposite directions or trips that cover only a section of the route.
However, read_routes() returns only one LineString per route when shapes are not used.

Cause

read_routes() uses only the first trip associated with each route:

trip_id = route["trip_id"].unique()[0]

for route_id in merged["route_id"].unique():
    route = merged[merged["route_id"] == route_id]
    trip_id = route["trip_id"].unique()[0]
    route = route[route["trip_id"] == trip_id].sort_values("stop_sequence")
    features.append(
        {
            "type": "Feature",
            "geometry": {
                "type": "LineString",
                "coordinates": route[
                    ["stop_lon", "stop_lat"]
                ].values.tolist(),
            },
            "properties": {
                "route_id": str(route_id),
                "route_name": route.route_concat_name.values.tolist()[0],
            },
        }
    )

Solution

read_routes() should return one MultiLineString per route, containing one LineString for each distinct stop pattern.
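
One possible shape for this, as a sketch only: the function routes_as_multilinestrings and the flat list-of-dicts input are hypothetical stand-ins for the merged data frame, and a stop pattern is assumed here to mean the ordered tuple of stop_id values of a trip.

```python
from collections import defaultdict


def routes_as_multilinestrings(rows):
    """Build one MultiLineString feature per route (hypothetical helper).

    Each row is a dict with route_id, trip_id, stop_sequence, stop_id,
    stop_lon and stop_lat, i.e. a flattened form of the merged table.
    """
    # Collect the stops of every trip, keyed by (route_id, trip_id).
    trips = defaultdict(list)
    for row in rows:
        trips[(row["route_id"], row["trip_id"])].append(row)

    # For each route, keep one coordinate list per distinct stop pattern.
    patterns = defaultdict(dict)
    for (route_id, _trip_id), stops in trips.items():
        stops.sort(key=lambda r: r["stop_sequence"])
        pattern = tuple(r["stop_id"] for r in stops)
        coords = [[r["stop_lon"], r["stop_lat"]] for r in stops]
        patterns[route_id].setdefault(pattern, coords)

    return [
        {
            "type": "Feature",
            "geometry": {
                "type": "MultiLineString",
                "coordinates": list(lines.values()),
            },
            "properties": {"route_id": str(route_id)},
        }
        for route_id, lines in patterns.items()
    ]


def _row(trip_id, seq, stop_id, lon, lat):
    """Helper that builds one demo row for the made-up route R1."""
    return {"route_id": "R1", "trip_id": trip_id, "stop_sequence": seq,
            "stop_id": stop_id, "stop_lon": lon, "stop_lat": lat}


# T1 and T3 share a pattern (A, B); T2 runs in the opposite direction (B, A).
rows = [
    _row("T1", 1, "A", 139.70, 35.68), _row("T1", 2, "B", 139.71, 35.69),
    _row("T2", 1, "B", 139.71, 35.69), _row("T2", 2, "A", 139.70, 35.68),
    _row("T3", 1, "A", 139.70, 35.68), _row("T3", 2, "B", 139.71, 35.69),
]
features = routes_as_multilinestrings(rows)
print(len(features[0]["geometry"]["coordinates"]))  # 2 distinct patterns
```

Trips that repeat an already seen pattern add no geometry, so the output grows with the number of patterns rather than the number of trips.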

Related Issue

The performance should also be improved. #1

Sample data

GTFS: Tokyo Toei Bus - ToeiBus-GTFS_20240421.zip

Improve performance of stop unification for frequency aggregation

Problem

Frequency aggregation is slow: it takes 10 seconds for sample data containing 1580 trips.
That is about 20 times longer than reading stops and routes (0.5 s).

Cause

In frequency aggregation, stop aggregation accounts for 94% of the total processing time.
Most of that time is spent in __get_similar_stop_tuple().
The function is called once per stop, via map() over the stops data frame.
Each call is slow because it searches and sorts all of the stops again.

Profiling results

Sun Apr 21 02:20:36 2024    chitetsu.prof
         18400825 function calls (17686952 primitive calls) in 17.311 seconds
   Ordered by: cumulative time
   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
    720/1    0.010    0.000   17.334   17.334 {built-in method builtins.exec}
        1    0.000    0.000   17.334   17.334 test_vary_gtfs.py:1(<module>)
        1    0.001    0.001   15.349   15.349 test_vary_gtfs.py:82(main)
        1    0.004    0.004   15.349   15.349 test_vary_gtfs.py:50(exec_test)
        1    0.001    0.001   14.435   14.435 aggregate.py:12(__init__)
        1    0.003    0.003   14.434   14.434 aggregate.py:34(__aggregate_similar_stops)
        7    0.017    0.002   14.391    2.056 {pandas._libs.lib.map_infer}
        6    0.000    0.000   14.127    2.354 series.py:3908(map)
        6    0.000    0.000   14.124    2.354 base.py:1078(_map_values)
     1226    0.071    0.000   13.863    0.011 aggregate.py:91(<lambda>)
     1226    0.082    0.000   13.792    0.011 aggregate.py:134(__get_similar_stop_tuple)
     1228    0.017    0.000    6.339    0.005 frame.py:3197(query)
     1228    0.019    0.000    5.668    0.005 frame.py:3359(eval)
     9877    0.071    0.000    4.191    0.000 frame.py:2869(__getitem__)
     1228    0.021    0.000    3.950    0.003 eval.py:161(eval)
     7376    0.076    0.000    2.737    0.000 managers.py:1436(take)
27257/27244    0.209    0.000    2.714    0.000 series.py:201(__init__)
       25    0.000    0.000    2.666    0.107 __init__.py:1(<module>)
     6149    0.019    0.000    2.654    0.000 generic.py:3355(_take_with_is_copy)
     9820    0.027    0.000    2.383    0.000 common.py:50(new_method)
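
A listing in this format can be produced with the standard cProfile and pstats modules. The sketch below profiles a stand-in workload; slow_lookup is a made-up function for illustration, not code from this project.

```python
import cProfile
import io
import pstats


def slow_lookup(n):
    """Stand-in workload (hypothetical): re-sorts the whole list on every lookup."""
    data = list(range(n, 0, -1))
    return [sorted(data)[i] for i in range(n)]


profiler = cProfile.Profile()
profiler.enable()
slow_lookup(300)
profiler.disable()

# Sort by cumulative time, as in the listing above, and keep the top 10 rows.
buffer = io.StringIO()
pstats.Stats(profiler, stream=buffer).sort_stats("cumulative").print_stats(10)
report = buffer.getvalue()
print(report)
```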

Solution

I expect the process to be much faster if it is applied to the entire stops data frame at once, rather than once per stop.
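
As a rough illustration of the idea, the sketch below groups all stops in a single pass and then assigns each group one shared result, instead of searching the full stop list for every stop. It rests on loud assumptions: the rule that stops are "similar" when they share stop_name, the mean coordinate as the group's representative, and the function name unify_stops_by_name are all invented for this example and may differ from what __get_similar_stop_tuple() actually does. Plain dictionaries are used to keep it self-contained; on a data frame the same grouping could be expressed with a groupby.

```python
from collections import defaultdict


def unify_stops_by_name(stops):
    """Map every stop_id to a (name, lon, lat) tuple shared by its group.

    Assumption for this sketch: stops are "similar" when they share stop_name,
    and a group is represented by the mean of its coordinates.
    """
    # Pass 1: bucket all stops by name, touching each stop once.
    groups = defaultdict(list)
    for stop in stops:
        groups[stop["stop_name"]].append(stop)

    # Pass 2: compute one representative per group and assign it to each member.
    similar = {}
    for name, members in groups.items():
        lon = sum(s["stop_lon"] for s in members) / len(members)
        lat = sum(s["stop_lat"] for s in members) / len(members)
        for s in members:
            similar[s["stop_id"]] = (name, lon, lat)
    return similar


# Made-up demo data: two platforms of one stop and one unrelated stop.
stops = [
    {"stop_id": "A1", "stop_name": "Central", "stop_lon": 10.0, "stop_lat": 20.0},
    {"stop_id": "A2", "stop_name": "Central", "stop_lon": 12.0, "stop_lat": 22.0},
    {"stop_id": "B1", "stop_name": "Harbor", "stop_lon": 30.0, "stop_lat": 40.0},
]
similar = unify_stops_by_name(stops)
print(similar["A1"])
```

Each stop is visited a fixed number of times, so the work grows linearly with the number of stops instead of repeating a full search and sort per stop.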

Sample data

GTFS: feed_chitetsu_chitetsubus_20240326_191913.zip
Results of cProfile: chitetsu_prof.txt
