Parallelize importer (abstreet, CLOSED)

a-b-street commented on May 3, 2024
Parallelize importer

from abstreet.

Comments (7)

RestitutorOrbis commented on May 3, 2024

This is the primary relevant file for this issue, right?

abstreet/importer/src/main.rs 

Are there any others I should be mindful of as well? @dabreegster

dabreegster commented on May 3, 2024

That's the main one. The other modules in importer are worth a look too, but hopefully they're pretty simple. I think the only easy opportunity for parallelization is the for name in names loop.

The only potential race conditions would be attempting to produce the same file at the same time. This could happen by two callers trying to do seattle::ensure_popdat_exists, since this first fully builds the huge_seattle map and then produces a file from it. I guess within each of the per-city modules, there's an input() function that downloads all input files needed -- we also don't want to duplicate that work at the same time. Ah, and the utility download always produces tmp_output before renaming; probably have to deal with that. Maybe this is a little more complex than I first estimated.
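
One way to make "produce a file" safe when two callers race is to give each writer its own temporary file and rename it into place at the end. The following is only a sketch under that assumption, not the importer's actual download utility: `write_atomically` and the temp-file naming scheme are hypothetical, and the rename is only expected to be atomic when the temp file and the destination are on the same filesystem.

```rust
use std::fs;
use std::io::{self, Write};
use std::path::Path;
use std::process;
use std::sync::atomic::{AtomicU64, Ordering};

// Counter so that concurrent writers in this process never share a temp file.
static NEXT_TMP_ID: AtomicU64 = AtomicU64::new(0);

/// Hypothetical helper: write `bytes` to a uniquely named temp file next to
/// `dest`, then rename it over `dest`. Readers see the old file or the new
/// one, never a half-written one (assuming the rename is atomic on the
/// target filesystem).
pub fn write_atomically(dest: &Path, bytes: &[u8]) -> io::Result<()> {
    let id = NEXT_TMP_ID.fetch_add(1, Ordering::Relaxed);
    let mut tmp_name = dest
        .file_name()
        .ok_or_else(|| io::Error::new(io::ErrorKind::InvalidInput, "dest has no file name"))?
        .to_os_string();
    tmp_name.push(format!(".tmp.{}.{}", process::id(), id));
    let tmp = dest.with_file_name(tmp_name);

    let mut file = fs::File::create(&tmp)?;
    file.write_all(bytes)?;
    file.sync_all()?;
    // Same directory as `dest`, so the rename stays on one filesystem.
    fs::rename(&tmp, dest)
}
```

With this shape, two threads producing the same output duplicate work but cannot corrupt each other's file; the last rename wins.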

(By the way, if you know of a simple way to express a dependency graph of tasks and be idempotent / detect work that's already been done, there may be a much better way to express everything in importer.)
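
For illustration, the simplest version of that idea is a depth-first walk over named tasks that skips anything already done. This is a rough sketch, not a proposal for how importer should be structured: the `Task` type, the `plan` function, and any dependency edges passed to it are hypothetical, and "done" would presumably mean "the output file already exists" in the real tool.

```rust
use std::collections::{HashMap, HashSet};

/// Hypothetical task description: a name plus the names of its prerequisites.
pub struct Task {
    pub name: &'static str,
    pub deps: Vec<&'static str>,
}

/// Returns the tasks needed to reach `goal`, prerequisites first, leaving out
/// anything listed in `done`. A finished task is skipped along with its
/// prerequisites, which is what makes re-running the plan idempotent.
pub fn plan(tasks: &[Task], done: &HashSet<&str>, goal: &str) -> Vec<String> {
    let by_name: HashMap<&str, &Task> = tasks.iter().map(|t| (t.name, t)).collect();
    let mut seen = HashSet::new();
    let mut order = Vec::new();
    visit(goal, &by_name, done, &mut seen, &mut order);
    order
}

fn visit(
    name: &str,
    by_name: &HashMap<&str, &Task>,
    done: &HashSet<&str>,
    seen: &mut HashSet<String>,
    order: &mut Vec<String>,
) {
    // Skip finished work and tasks already visited. The `seen` check also
    // keeps a cycle from recursing forever, although it does not report it.
    if done.contains(name) || !seen.insert(name.to_string()) {
        return;
    }
    let task = by_name
        .get(name)
        .unwrap_or_else(|| panic!("Unknown task {}", name));
    for dep in &task.deps {
        visit(dep, by_name, done, seen, order);
    }
    order.push(name.to_string());
}
```

Tasks in the returned order that do not depend on each other are the candidates for running in parallel.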

RestitutorOrbis commented on May 3, 2024

Okay, so from looking at it, it seems like the only jobs that can be run in parallel within for name in names are job.raw_to_map, job.scenario, and job.scenario_everyone. In line with that, I should only spawn threads for those jobs and leave job.osm_to_raw to run sequentially, given the concerns you've expressed about the input() function. With that in mind, the solution I am considering is the following:

    let mut was_ensure_popdat_exists_called = false;
    for name in names {
        // Leave job.osm_to_raw alone
        if job.osm_to_raw {
            match job.city.as_ref() {
                "austin" => austin::osm_to_raw(&name),
                "los_angeles" => los_angeles::osm_to_raw(&name),
                "seattle" => seattle::osm_to_raw(&name),
                x => panic!("Unknown city {}", x),
            }
        }

        // Spawn a thread to manage this
        if job.raw_to_map {
            utils::raw_to_map(&name, job.use_fixes);
        }

        if job.scenario {
            assert_eq!(job.city, "seattle");
            // Make sure ensure_popdat_exists is only called once
            if !was_ensure_popdat_exists_called {
                seattle::ensure_popdat_exists(job.use_fixes);
                was_ensure_popdat_exists_called = true;
            }

            // Spawn a thread to manage this
            let mut timer = abstutil::Timer::new(format!("Scenario for {}", name));
            let map = map_model::Map::new(abstutil::path_map(&name), job.use_fixes, &mut timer);
            soundcast::make_weekday_scenario(&map, &mut timer).save();
        }

        if job.scenario_everyone {
            assert_eq!(job.city, "seattle");
            // Same guard as above, so the flag must be set here too
            if !was_ensure_popdat_exists_called {
                seattle::ensure_popdat_exists(job.use_fixes);
                was_ensure_popdat_exists_called = true;
            }

            // Spawn a thread to manage this
            let mut timer = abstutil::Timer::new(format!("Scenario for {}", name));
            let map = map_model::Map::new(abstutil::path_map(&name), job.use_fixes, &mut timer);
            soundcast::make_weekday_scenario_with_everyone(&map, &mut timer).save();
        }
    }
    // Wait for all threads to complete
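
The "spawn a thread, then wait for all threads" part of this plan could look roughly like the sketch below, using only the standard library. The names `run_per_name` and `fake_raw_to_map` are made up for illustration; the real per-map work would be the raw_to_map and scenario steps above.

```rust
use std::thread;

/// Hypothetical stand-in for the per-map work (raw_to_map, scenario, ...).
pub fn fake_raw_to_map(name: &str) -> String {
    format!("built {}", name)
}

/// Runs `work` for every name on its own thread, then waits for all of them.
/// Results come back in the same order as `names`.
pub fn run_per_name<T: Send + 'static>(names: Vec<String>, work: fn(&str) -> T) -> Vec<T> {
    // Spawn one thread per name. Each thread takes ownership of its name.
    let handles: Vec<thread::JoinHandle<T>> = names
        .into_iter()
        .map(|name| thread::spawn(move || work(&name)))
        .collect();

    // Wait for all threads to complete, propagating any panic.
    handles
        .into_iter()
        .map(|handle| handle.join().expect("worker thread panicked"))
        .collect()
}
```

One thread per map is the simplest option; with many maps, a bounded pool would probably be kinder to memory, since each thread may hold a full map.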

dabreegster commented on May 3, 2024

This should almost work. The only problem is that ensure_popdat_exists may call raw_to_map, so there may be a case when two threads are both doing raw_to_map(huge_seattle) at the same time. That's technically correct, provided writing the output file is atomic.
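
If the duplicated work matters, one option within a single process is `std::sync::Once`, which runs a closure exactly once and makes other callers block until that run has finished. The sketch below assumes that approach: `build_popdat` and `ensure_popdat_exists_once` are hypothetical stand-ins, and the counter exists only to make the behavior observable.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::Once;
use std::thread;

static POPDAT_ONCE: Once = Once::new();
static BUILD_COUNT: AtomicUsize = AtomicUsize::new(0);

/// Hypothetical stand-in for the expensive build (huge_seattle map + popdat).
fn build_popdat() {
    BUILD_COUNT.fetch_add(1, Ordering::SeqCst);
}

/// Safe to call from any number of threads: the build runs once, and every
/// other caller waits until that one run has completed.
pub fn ensure_popdat_exists_once() {
    POPDAT_ONCE.call_once(build_popdat);
}

/// Calls the guard from `num_threads` threads and reports how many builds ran.
pub fn demo_build_count(num_threads: usize) -> usize {
    let handles: Vec<thread::JoinHandle<()>> = (0..num_threads)
        .map(|_| thread::spawn(ensure_popdat_exists_once))
        .collect();
    for handle in handles {
        handle.join().expect("worker thread panicked");
    }
    BUILD_COUNT.load(Ordering::SeqCst)
}
```

This only guards callers inside one process; two separate importer runs would still need the atomic file write.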

I haven't done much parallelism in Rust before, but it could be nice to use the type system to statically enforce correctness. What if we had some marker type that does not implement Sync to prevent accidentally calling osm_to_raw concurrently?
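
A rough sketch of that marker-type idea follows, with hypothetical names throughout (`SequentialOnly`, `osm_to_raw_guarded`). A `PhantomData<*const ()>` field makes the token neither `Send` nor `Sync`, so it cannot be moved into or shared with a spawned thread, and taking it by `&mut` means only one call can borrow it at a time.

```rust
use std::marker::PhantomData;

/// Hypothetical token required by steps that must not run concurrently.
/// Raw pointers are neither Send nor Sync, so this type is neither as well.
pub struct SequentialOnly {
    _not_send_or_sync: PhantomData<*const ()>,
}

impl SequentialOnly {
    /// Intended to be created once, at the start of main. Nothing stops a
    /// thread from creating its own token, so this is a convention that the
    /// compiler helps enforce rather than a hard guarantee.
    pub fn new() -> SequentialOnly {
        SequentialOnly {
            _not_send_or_sync: PhantomData,
        }
    }
}

/// Stand-in for a per-city osm_to_raw. The exclusive borrow of the token
/// means two calls cannot overlap through the same token.
pub fn osm_to_raw_guarded(_token: &mut SequentialOnly, name: &str) -> String {
    format!("imported {}", name)
}

// This would be rejected at compile time, because the token is not Send:
//
//     let mut token = SequentialOnly::new();
//     std::thread::spawn(move || osm_to_raw_guarded(&mut token, "map_a"));
//
// error[E0277]: `*const ()` cannot be sent between threads safely
```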

Also, the generic dependency graph executor I was thinking of is https://github.com/salsa-rs/salsa. Probably overkill for this tool, though.

RestitutorOrbis commented on May 3, 2024

Agreed that the dependency graph executor is probably a bad move. Plus, it seems unstable and I imagine adding unstable dependencies wouldn't be healthy for this project. I'll take a stab at a marker type but other than that I'll look to execute the plan from my earlier comment.

RestitutorOrbis commented on May 3, 2024

Maybe I'm missing something, but I keep getting the following compile error on the latest revision of the repo, @dabreegster:

error[E0599]: no associated item named `MAX` found for type `f64` in the current scope
  --> geom/src/bounds.rs:16:25
   |
16 |             min_x: f64::MAX,
   |                         ^^^ associated item not found in `f64`
   |
   = help: items from traits can only be used if the trait is in scope
   = note: the following trait is implemented but not in scope; perhaps add a `use` for it:
           `use rand::distributions::weighted::alias_method::Weight;`
help: you are looking for the module in `std`, not the primitive type
   |
16 |             min_x: std::f64::MAX,
   |                    ^^^^^^^^^^^^^

error[E0599]: no associated item named `MAX` found for type `f64` in the current scope
  --> geom/src/bounds.rs:17:25
   |
17 |             min_y: f64::MAX,
   |                         ^^^ associated item not found in `f64`
   |
   = help: items from traits can only be used if the trait is in scope
   = note: the following trait is implemented but not in scope; perhaps add a `use` for it:
           `use rand::distributions::weighted::alias_method::Weight;`
help: you are looking for the module in `std`, not the primitive type
   |
17 |             min_y: std::f64::MAX,
   |                    ^^^^^^^^^^^^^

error[E0599]: no associated item named `MIN` found for type `f64` in the current scope
  --> geom/src/bounds.rs:18:25
   |
18 |             max_x: f64::MIN,
   |                         ^^^ associated item not found in `f64`
   |
help: you are looking for the module in `std`, not the primitive type
   |
18 |             max_x: std::f64::MIN,
   |                    ^^^^^^^^^^^^^

error[E0599]: no associated item named `MIN` found for type `f64` in the current scope
  --> geom/src/bounds.rs:19:25
   |
19 |             max_y: f64::MIN,
   |                         ^^^ associated item not found in `f64`
   |
help: you are looking for the module in `std`, not the primitive type
   |
19 |             max_y: std::f64::MIN,
   |                    ^^^^^^^^^^^^^

error[E0599]: no associated item named `MAX` found for type `f64` in the current scope
  --> geom/src/bounds.rs:96:27
   |
96 |             min_lon: f64::MAX,
   |                           ^^^ associated item not found in `f64`
   |
   = help: items from traits can only be used if the trait is in scope
   = note: the following trait is implemented but not in scope; perhaps add a `use` for it:
           `use rand::distributions::weighted::alias_method::Weight;`
help: you are looking for the module in `std`, not the primitive type
   |
96 |             min_lon: std::f64::MAX,
   |                      ^^^^^^^^^^^^^

error[E0599]: no associated item named `MAX` found for type `f64` in the current scope
  --> geom/src/bounds.rs:97:27
   |
97 |             min_lat: f64::MAX,
   |                           ^^^ associated item not found in `f64`
   |
   = help: items from traits can only be used if the trait is in scope
   = note: the following trait is implemented but not in scope; perhaps add a `use` for it:
           `use rand::distributions::weighted::alias_method::Weight;`
help: you are looking for the module in `std`, not the primitive type
   |
97 |             min_lat: std::f64::MAX,
   |                      ^^^^^^^^^^^^^

error[E0599]: no associated item named `MIN` found for type `f64` in the current scope
  --> geom/src/bounds.rs:98:27
   |
98 |             max_lon: f64::MIN,
   |                           ^^^ associated item not found in `f64`
   |
help: you are looking for the module in `std`, not the primitive type
   |
98 |             max_lon: std::f64::MIN,
   |                      ^^^^^^^^^^^^^

error[E0599]: no associated item named `MIN` found for type `f64` in the current scope
  --> geom/src/bounds.rs:99:27
   |
99 |             max_lat: f64::MIN,
   |                           ^^^ associated item not found in `f64`
   |
help: you are looking for the module in `std`, not the primitive type
   |
99 |             max_lat: std::f64::MIN,
   |                      ^^^^^^^^^^^^^

error: aborting due to 8 previous errors

For more information about this error, try `rustc --explain E0599`.
error: could not compile `geom`.

dabreegster commented on May 3, 2024

I upgraded to Rust 1.43 on April 23, which made those imports unnecessary. Try rustup update stable.
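
For context, Rust 1.43 added `MAX` and `MIN` as associated constants on the float and integer types, so `f64::MAX` resolves without going through the `std::f64` module; older toolchains need the `std::f64::MAX` path that the compiler suggests. Below is a simplified sketch of the pattern in the error output. The field names come from the error messages, while the `update` method is a hypothetical addition to show why the fields start at the extremes.

```rust
/// Simplified stand-in for a bounding box like the one in geom/src/bounds.rs.
pub struct Bounds {
    pub min_x: f64,
    pub min_y: f64,
    pub max_x: f64,
    pub max_y: f64,
}

impl Bounds {
    /// Starts "inside out", so the first point updates every field.
    /// Requires Rust 1.43 or newer for the associated constants.
    pub fn new() -> Bounds {
        Bounds {
            min_x: f64::MAX,
            min_y: f64::MAX,
            max_x: f64::MIN,
            max_y: f64::MIN,
        }
    }

    /// Hypothetical helper: grow the box to include the point (x, y).
    pub fn update(&mut self, x: f64, y: f64) {
        self.min_x = self.min_x.min(x);
        self.min_y = self.min_y.min(y);
        self.max_x = self.max_x.max(x);
        self.max_y = self.max_y.max(y);
    }
}
```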
