Comments (2)

dhersz commented on June 13, 2024

And that will probably speed read_gtfs() up significantly as well.

from gtfstools.

dhersz commented on June 13, 2024

Fixed that in d920ba4.

Now time columns are read as strings and converted to seconds after midnight (as int) only when needed.

This significantly speeds up GTFS reading and writing.

Here are some timings:

microbenchmark::microbenchmark(
    gtfs <- read_gtfs(data_path),
    gtfs_tidy <- tidytransit::read_gtfs(data_path),
    gtfs_2gps <- gtfs2gps::read_gtfs(data_path),
    times = 5L
)
#> Unit: milliseconds
#>                                           expr       min        lq      mean    median        uq       max neval
#>                   gtfs <- read_gtfs(data_path)  747.7385  787.6492  798.1753  807.7225  813.6713  834.0952     5
#> gtfs_tidy <- tidytransit::read_gtfs(data_path) 1680.2482 1913.6716 2174.9585 2017.0233 2036.0840 3227.7655     5
#>    gtfs_2gps <- gtfs2gps::read_gtfs(data_path) 1968.6259 1970.1558 2278.7080 1996.7162 2523.9983 2934.0436     5

tmp_dir <- tempdir()
tmp_file1 <- tempfile(pattern = "gtfs", tmpdir = tmp_dir, fileext = ".zip")
tmp_file2 <- tempfile(pattern = "gtfs", tmpdir = tmp_dir, fileext = ".zip")
tmp_file3 <- tempfile(pattern = "gtfs", tmpdir = tmp_dir, fileext = ".zip")

microbenchmark::microbenchmark(
    write_gtfs(gtfs, tmp_file1),
    tidytransit::write_gtfs(gtfs_tidy, tmp_file2),
    gtfs2gps::write_gtfs(gtfs_2gps, tmp_file3),
    times = 5L
)
#> Unit: seconds
#>                                          expr      min       lq     mean  median       uq      max neval
#>                   write_gtfs(gtfs, tmp_file1) 2.672873 2.678202 3.043181 2.70489 2.801637 4.358303     5
#> tidytransit::write_gtfs(gtfs_tidy, tmp_file2) 3.763208 3.803348 3.882299 3.87548 3.910184 4.059277     5
#>    gtfs2gps::write_gtfs(gtfs_2gps, tmp_file3) 2.386184 2.435723 2.536897 2.44019 2.541006 2.881380     5

It may be useful to implement a function such as set_times_to_seconds(), similar to tidytransit::set_hms_times(), in the future. Right now, every function that requires time calculations (set_trip_speed(), get_trip_duration()) has to convert times to seconds internally. If such a function existed, those functions would only need to check whether the seconds columns are already present and use them if so. That would probably save some time with big GTFS files.
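
A minimal sketch of that idea in base R, not the actual gtfstools implementation. It assumes the GTFS object is a plain list holding a `stop_times` data.frame. The helper `string_to_seconds()`, the consumer `get_departure_seconds()` and the `_secs` column suffix are hypothetical names chosen for illustration.

```r
# Hypothetical helper: convert "HH:MM:SS" strings to integer seconds after
# midnight. GTFS allows hours >= 24 for trips running past midnight, so the
# hour field is not capped. Empty or malformed entries become NA.
string_to_seconds <- function(x) {
  x <- as.character(x)
  x[!is.na(x) & x == ""] <- NA_character_
  parts <- strsplit(x, ":", fixed = TRUE)
  vapply(parts, function(p) {
    if (length(p) != 3L || anyNA(p)) return(NA_integer_)
    p <- as.integer(p)
    p[1] * 3600L + p[2] * 60L + p[3]
  }, integer(1))
}

# Sketch of set_times_to_seconds(): add "<column>_secs" columns to stop_times
# once, so later calls do not have to repeat the conversion.
set_times_to_seconds <- function(gtfs) {
  st <- gtfs$stop_times
  for (col in c("arrival_time", "departure_time")) {
    secs_col <- paste0(col, "_secs")
    # Convert only if the string column exists and the seconds column does not.
    if (!is.null(st[[col]]) && is.null(st[[secs_col]])) {
      st[[secs_col]] <- string_to_seconds(st[[col]])
    }
  }
  gtfs$stop_times <- st
  gtfs
}

# Hypothetical consumer: reuse the seconds column if present, otherwise
# fall back to converting on the fly.
get_departure_seconds <- function(gtfs) {
  st <- gtfs$stop_times
  if (!is.null(st[["departure_time_secs"]])) {
    return(st[["departure_time_secs"]])
  }
  string_to_seconds(st[["departure_time"]])
}

# Toy data, not taken from a real feed.
gtfs <- list(stop_times = data.frame(
  trip_id = c("a", "a"),
  arrival_time = c("08:00:00", "25:10:30"),
  departure_time = c("08:05:30", ""),
  stringsAsFactors = FALSE
))
gtfs <- set_times_to_seconds(gtfs)
```

With this layout, functions like set_trip_speed() or get_trip_duration() could check for the seconds columns first and skip the string parsing whenever a user has already called the setter.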
