mierune / gtfs-parser
parse and aggregate GTFS
Home Page: https://pypi.org/project/gtfs-parser/
License: MIT License
Large dataset for benchmark
https://opentransportdata.swiss/de/dataset/timetable-2021-gtfs2020
time python -m gtfs_parser parse gtfs_fp2021_2021-12-08_09-10.zip swiss
extracting zipfile...
GTFS loaded.
real 218m5.224s
user 56m11.213s
sys 0m11.147s
If an error occurs while reading a file other than a required table, the program terminates abnormally.
The GTFS feed of Nanto city contains a Shift_JIS (SJIS) encoded file named "result.csv".
I suspect that the output of an email sanitization process was included in the archive by mistake.
"gtfs_parser\gtfs_parser\gtfs.py", line 12, in load_df
UnicodeDecodeError: 'utf-8' codec can't decode byte 0x90 in position 0: invalid start byte
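One possible way to avoid the crash is to treat decoding failures in non-required files as non-fatal. The following is only a minimal sketch using the standard library, not the code in gtfs.py: the function name `load_table` and the contents of `REQUIRED_TABLES` are assumptions made for illustration.

```python
import csv
import io

# Assumption: the tables this sketch treats as required; the real tool may differ.
REQUIRED_TABLES = {"agency", "stops", "routes", "trips", "stop_times"}


def load_table(name, raw):
    """Decode one table from raw bytes.

    Returns a list of row dicts, or None when an optional file
    cannot be decoded as UTF-8 (for example a Shift_JIS "result.csv").
    """
    try:
        # utf-8-sig also accepts files that start with a byte order mark.
        text = raw.decode("utf-8-sig")
    except UnicodeDecodeError:
        if name in REQUIRED_TABLES:
            raise  # a broken required table is still an error
        return None  # skip stray or broken optional files
    return list(csv.DictReader(io.StringIO(text)))
```

With this approach, a Shift_JIS encoded stray file is skipped, while a required table that fails to decode still raises the original error.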
I propose that GTFSFactory read only the files that are actually used by this tool.
The GTFS specification has been extended and now defines many files that this tool does not use.
Skipping them would also improve performance.
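The proposal could be sketched roughly as follows. This is not the actual GTFSFactory implementation: the function name `read_used_tables` and the `USED_TABLES` tuple are hypothetical, and the real list of tables the tool needs may differ.

```python
import csv
import io
import zipfile

# Assumption: an illustrative list of the tables the tool consumes.
USED_TABLES = ("agency", "stops", "routes", "trips", "stop_times",
               "calendar", "calendar_dates", "shapes")


def read_used_tables(source, used=USED_TABLES):
    """Read only whitelisted <table>.txt members from a GTFS zip.

    `source` is a path or a binary file-like object. Members that are
    not in `used` (extension tables, stray files) are never opened.
    """
    tables = {}
    with zipfile.ZipFile(source) as zf:
        members = set(zf.namelist())
        for table in used:
            member = f"{table}.txt"
            if member not in members:
                continue  # table is absent from this feed
            with zf.open(member) as fh:
                text = io.TextIOWrapper(fh, encoding="utf-8-sig", newline="")
                tables[table] = list(csv.DictReader(text))
    return tables
```

Because unused members are never decoded, a file such as "result.csv" cannot cause a UnicodeDecodeError, and large unused tables cost no parsing time.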
In gtfs.py, the function is declared as GTFS(gtfs_dir: list) -> dict. Why is gtfs_dir annotated as list instead of str?
Trips with the same route_id may have different stopping patterns, for example the two directions of a round trip, or trips that cover only a section of the route.
However, read_routes() returns only one LineString per route if shapes are not used, because it refers only to the first trip related to a route:
trip_id = route["trip_id"].unique()[0]
gtfs-parser/gtfs_parser/parse.py
Lines 93 to 111 in 479af5c
read_routes() should return a MultiLineString for each route, containing one LineString per stop pattern.
The performance should also be improved. #1
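The idea can be sketched without pandas as follows. This is only an illustration of grouping trips by stop pattern, not a patch for parse.py: the function names, the plain-dict row format, and the GeoJSON-style output are assumptions.

```python
from collections import defaultdict


def stop_patterns_by_route(trips, stop_times):
    """Collect the distinct ordered stop sequences for each route.

    trips:      rows with "route_id" and "trip_id"
    stop_times: rows with "trip_id", "stop_sequence" and "stop_id"
    """
    stops_by_trip = defaultdict(list)
    for row in stop_times:
        stops_by_trip[row["trip_id"]].append(
            (int(row["stop_sequence"]), row["stop_id"])
        )

    patterns = defaultdict(list)
    for trip in trips:
        ordered = sorted(stops_by_trip[trip["trip_id"]])
        pattern = tuple(stop_id for _, stop_id in ordered)
        # Keep each pattern once per route, in first-seen order.
        if pattern and pattern not in patterns[trip["route_id"]]:
            patterns[trip["route_id"]].append(pattern)
    return dict(patterns)


def route_geometry(route_patterns, stop_coords):
    """Build a GeoJSON-style MultiLineString, one line per stop pattern.

    stop_coords maps stop_id to a (lon, lat) pair.
    """
    return {
        "type": "MultiLineString",
        "coordinates": [
            [list(stop_coords[stop_id]) for stop_id in pattern]
            for pattern in route_patterns
        ],
    }
```

A route served in both directions then yields two lines instead of one, and repeated trips with an identical stop sequence do not add duplicates.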
Frequency aggregation is slow: it takes 10 seconds for sample data containing 1580 trips.
This is about 20 times longer than the time needed to read stops and routes (0.5 s).
Within frequency aggregation, stop aggregation accounts for 94% of the total processing time.
Most of that time is spent in __get_similar_stop_tuple().
The cause is that __get_similar_stop_tuple() is called once per stop, via map() over the stops data frame.
Each call is slow because it searches and sorts all of the stops.
Sun Apr 21 02:20:36 2024 chitetsu.prof
18400825 function calls (17686952 primitive calls) in 17.311 seconds
Ordered by: cumulative time
ncalls tottime percall cumtime percall filename:lineno(function)
720/1 0.010 0.000 17.334 17.334 {built-in method builtins.exec}
1 0.000 0.000 17.334 17.334 test_vary_gtfs.py:1(<module>)
1 0.001 0.001 15.349 15.349 test_vary_gtfs.py:82(main)
1 0.004 0.004 15.349 15.349 test_vary_gtfs.py:50(exec_test)
1 0.001 0.001 14.435 14.435 aggregate.py:12(__init__)
1 0.003 0.003 14.434 14.434 aggregate.py:34(__aggregate_similar_stops)
7 0.017 0.002 14.391 2.056 {pandas._libs.lib.map_infer}
6 0.000 0.000 14.127 2.354 series.py:3908(map)
6 0.000 0.000 14.124 2.354 base.py:1078(_map_values)
1226 0.071 0.000 13.863 0.011 aggregate.py:91(<lambda>)
1226 0.082 0.000 13.792 0.011 aggregate.py:134(__get_similar_stop_tuple)
1228 0.017 0.000 6.339 0.005 frame.py:3197(query)
1228 0.019 0.000 5.668 0.005 frame.py:3359(eval)
9877 0.071 0.000 4.191 0.000 frame.py:2869(__getitem__)
1228 0.021 0.000 3.950 0.003 eval.py:161(eval)
7376 0.076 0.000 2.737 0.000 managers.py:1436(take)
27257/27244 0.209 0.000 2.714 0.000 series.py:201(__init__)
25 0.000 0.000 2.666 0.107 __init__.py:1(<module>)
6149 0.019 0.000 2.654 0.000 generic.py:3355(_take_with_is_copy)
9820 0.027 0.000 2.383 0.000 common.py:50(new_method)
I expect the aggregation to be much faster if it is performed on the entire stops data frame at once instead of once per stop.
GTFS: feed_chitetsu_chitetsubus_20240326_191913.zip
Results of cProfile: chitetsu_prof.txt
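As a rough sketch of what processing the whole data frame at once could look like, the per-stop lookup can be replaced by a single groupby. The grouping rule used here (stops sharing the same stop_name are treated as similar) is an assumption for illustration only and may not match what __get_similar_stop_tuple() actually does. The function name and the output column names are hypothetical as well.

```python
import pandas as pd


def aggregate_similar_stops(stops: pd.DataFrame) -> pd.DataFrame:
    """Attach a representative id and position to every stop in one pass.

    Assumption: stops with the same stop_name are "similar".
    """
    grouped = stops.groupby("stop_name", sort=False)
    result = stops.copy()
    # transform() returns one value per input row, aligned to the original index,
    # so no per-row query or sort is needed.
    result["similar_stop_id"] = grouped["stop_id"].transform("first")
    result["similar_stop_lat"] = grouped["stop_lat"].transform("mean")
    result["similar_stop_lon"] = grouped["stop_lon"].transform("mean")
    return result
```

The groups are built once for all stops, rather than running a query over the full stops table for each of the 1226 stops seen in the profile above.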