
Comments (11)

ricschuster commented on May 21, 2024

Hi Jeff,

This is an important point and outputting multiple solutions is certainly desirable. I personally prefer 1, even though it's computationally intensive, because so far we have generated multiple solutions by varying targets or constraints to create a pool of options and solutions, before looking at the overlap between them.

The rationale behind this is that we assume the ILP solver finds the optimal solution to a problem, so there is little reason to create a solution pool from the same problem, as the solutions in the pool will most likely be very similar. This might be a misunderstanding on my part, but that's my assumption.

The way I have approached 1 in the past, at least via Shiny, was to allow users to either enter parameters on targets and/or constraints manually, or upload a csv file where the scenarios are specified. Example here: http://forbasin.forestry.ubc.ca/CDFCP.v0.22.3/
(I think currently the upload doesn't work, so you will have to create examples manually)

This is also the way I was planning to implement this with the prshiny function. I think for practitioners this approach would make the most sense, but I haven't explored 2 or 3, so I'm not sure whether those would be feasible as well.

jeffreyhanson commented on May 21, 2024

Hmm, you raise some really good points. I think the best strategy probably depends on why the user wants multiple solutions.

If the user wants the top n solutions nearest to optimality and views optimality as being really important, then option 1 is most useful for them. However, for medium-sized problems the top 100 solutions will probably be very similar, with only a few planning units differing between them, so I don't know how useful this would be in practice?

My impression is that many users would like a portfolio of solutions that are very different in terms of the planning units selected, but still within some specified level of optimality. It seems to me that a lot of conservation planning isn't actually done with "real" cost data (though I don't have any experience working in real-world conservation planning), so optimality isn't really considered too important so long as the solutions aren't really poor.

What do you think? I could try out 3 or 3a with some simulated data and see how it goes.

ricschuster commented on May 21, 2024

This is a tricky one. Maybe we should ask the Marxan mailing list for input on this?

To me, the main difference between 1 and 2/3 is that in the case of 1 the scenario generation is up to the user, whereas 2/3 are more similar to Marxan in the sense that the user specifies one problem that the solver then tries to solve in a number of different ways.

The main issue I have with 2/3 is that we would have a way to get to the optimal solution with ILP, but we would deliberately forgo reaching an optimal solution for the sake of generating a portfolio of solutions. To me that seems a bit strange, but I'm not sure how others think about this. I also wonder how different the solutions would be if we, say, chose a pool of solutions that have a 5% gap. That gap might be acceptable for most (we could also let the user specify the gap), but we would need to test this to see how much variation we actually get in the pu selections.
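For reference, that kind of variation check is straightforward once a pool exists. A minimal sketch in base R, where `sols` is a stand-in for a real solution pool:

    # `sols` stands in for a pool of solutions: one row per solution,
    # one 0/1 column per planning unit
    sols <- matrix(sample(0:1, 5 * 100, replace = TRUE), nrow = 5)

    # Jaccard similarity between the pu selections of two solutions
    jaccard <- function(a, b) sum(a & b) / sum(a | b)

    # compare every pair of solutions in the pool
    pairs <- combn(nrow(sols), 2)
    sim <- apply(pairs, 2, function(j) jaccard(sols[j[1], ], sols[j[2], ]))
    summary(sim)  # values near 1 mean little variation at this gap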

It also sounds like for both 2 and 3 we would have issues using ROI, which is a shame, because it sounds like ROI would be good to implement moving forward.

Maybe try 3 or 3a, see how it goes, and also ask the mailing list?
Option 1 is computationally intensive, but would require little effort in terms of code.

jeffreyhanson commented on May 21, 2024

Yeah, asking the Marxan mailing list sounds like a great idea.

I think 1 & 3 are equally computationally intensive. They both involve resolving the entire problem n times. Both 1 and 3 are compatible with ROI. It's just 3a that could potentially offer computational shortcuts and break compatibility with ROI.

jeffreyhanson commented on May 21, 2024

I've implemented option 3 in the multiple_solutions_3 branch. I've been exploring its functionality, and have put together a minimal rmarkdown doc (ms.zip). I'd be really keen to hear what you think.

I also realised that option 3 could potentially be much faster than option 1 because each solution can be solved in parallel. This is useful because, for simple problems (e.g. classic Marxan problems), Gurobi spends most of its time in the presolve stage, which does not use multiple threads (even if it is told to use multiple threads).
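For instance, a portfolio could be generated in parallel along these lines (a minimal sketch; shuffle_and_solve() is a hypothetical helper, not an existing prioritizr function, and mclapply() forks so this only works on unix-alikes):

    # minimal sketch of solving a portfolio in parallel; `p` is a
    # ConservationProblem and shuffle_and_solve() is hypothetical
    library(parallel)
    shuffle_and_solve <- function(p) {
      # placeholder: permute the order of the decision variables, solve the
      # shuffled problem, and return the solution mapped back to pu order
      solve(p)
    }
    sols <- mclapply(seq_len(100), function(i) shuffle_and_solve(p),
                     mc.cores = 4)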

If you think this is a viable option for generating multiple solutions, I would be interested to hear your thoughts on implementing it. For example, should users generate solutions by running solve multiple times? Or should there be a number_solutions parameter in add_gurobi_solver? How would you see this interfacing with #22 and #25?

ricschuster commented on May 21, 2024

This is awesome Jeff! I've played around with the markdown doc you put together a bit and I like the way you set this up. I think it will be very easy for prioritizr users to understand and use. I have added a few things to the markdown doc myself (MS_3.zip).

The first thing I did was to quickly put together a figure showing the pu selection frequency of the 1000 runs you had in the example. There are certainly some options in the solution space, which I think is a good starting point. I also went ahead and reduced the gap to 5%, just to see how that would play out. Pretty decent I think.
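For reference, the selection frequency figure boils down to something like this sketch, with `sols` standing in for the real 1000-run solution matrix:

    # `sols` is a stand-in binary matrix: one row per run, one column per pu
    sols <- matrix(sample(0:1, 1000 * 90, replace = TRUE), nrow = 1000)
    freq <- colMeans(sols)  # proportion of runs in which each pu is selected
    barplot(freq, xlab = "planning unit", ylab = "selection frequency")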

One thing to note is that there are a lot of repeats in the solution space at the moment (at least at the 5% gap). The question is whether those are true repeats or the selection pattern just happens to be the same. Do you think there is a difference between those two, or does it really mean that each time the exact same pu's are selected the solution is the same? I'm just wondering whether one solution could be reached via different routes, and whether that would still mean the solutions are the same or just happen to select the same pu's. If they really are the same, do you think there is a way to reduce computational load by removing duplicate setups before they are actually solved?

The speedup by solving in parallel sounds very promising to me.

I haven't tried this with one of my bigger datasets yet, but if you think the parallel processing could be implemented quickly, it would be great to give that a try with one of them.

I think a number_solutions parameter in add_solver would make the most sense. This way the user can use one function to specify the gap and the number of solutions they want to get out.

#22: I wonder if we should recreate the output structure from Marxan, where we export each solution and the selection frequency? This might get a bit crazy for bigger problems though. An alternative would be to have the selection frequency as a spatial output if the inputs are spatial, and the solution matrix as a csv. To link to the raster we could also create an index raster that holds id values, which are used as the first column in the solution matrix csv file. In some ways I think the Marxan output is a bit outdated, but if we mirror Marxan's output structure here, users would have an easier time switching?
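A rough sketch of that index-raster idea, assuming the raster package, a pu raster `r`, and a solution matrix `sols` (the file names are hypothetical):

    # `r` is the pu raster; `sols` is the solution matrix
    # (rows = solutions, columns = planning units)
    library(raster)
    pu_cells <- which(!is.na(values(r)))
    idx <- r
    values(idx)[pu_cells] <- seq_len(ncol(sols))   # index raster with pu ids
    writeRaster(idx, "pu_index.tif", overwrite = TRUE)
    # csv with pu ids in the first column and one column per solution
    write.csv(data.frame(id = seq_len(ncol(sols)), t(sols)),
              "solutions.csv", row.names = FALSE)
    freq <- r
    values(freq)[pu_cells] <- colSums(sols)        # selection frequency
    writeRaster(freq, "selection_frequency.tif", overwrite = TRUE)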

#25: A list of attributes for the solution matrix? I guess if we already have a solution matrix as output (I don't think a solution raster stack makes the most sense here), we could also create an output class with slots for 1) the spatial representation, 2) the solution matrix, and 3...n) the attributes from #25. What do you think?

I do like this solution space generation a lot, we just have to make sure it is sensible and makes computational sense. It's also great in the sense that one can get the optimal solution fairly quickly and can then relax the gap. This is quite different from Marxan and a huge improvement I think.

jeffreyhanson commented on May 21, 2024

Yeah, the random shuffle method (3) does return a lot of duplicate solutions. I guess there's no real way to know how well (3) is sampling the range of solutions within the gap without actually implementing Benders cuts (1) to determine the number of solutions within the gap. In other words, 300 unique solutions might sound good, but it's hard to know if this is 40% or 100% of the solutions within the 5% gap.

Yeah, it's entirely possible that the duplicates could be returned via different algorithms/methods. But I don't think this matters too much. If Gurobi returns two identical solutions, they're still identical regardless of whether it used algorithm A or algorithm B, so I would still remove all duplicates when calculating things like selection frequency. We might be able to re-use variable states to reduce the incidence of duplicates, but that would make each Gurobi run dependent on previous runs, which would make things tricky when running multiple Gurobi runs in parallel to generate the portfolio.
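For what it's worth, dropping duplicates is cheap once the portfolio exists, e.g. with `sols` as the binary solution matrix (one row per solution):

    # count and drop duplicate solutions in a portfolio
    unique_sols <- unique(sols)        # unique() drops duplicate matrix rows
    nrow(sols) - nrow(unique_sols)     # number of duplicates removed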

Yeah, I think adding it to the add_*_solver functions makes the most sense. If that's the case, maybe we could add options (1) and (3) and use an additional argument to specify the method for generating multiple solutions? So the function definition might look something like this?

add_gurobi_solver <- function(x, gap = 0.1, time_limit = .Machine$integer.max,
                              presolve = 2, threads = 1, first_feasible = 0,
                              number_solutions = 1L,
                              multiple_solution_method = c("shuffle", "cuts"),
                              remove_duplicate_solutions = TRUE) {...}
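Hypothetically, a user wanting a 100-solution portfolio within a 5% gap would then call something like:

    # hypothetical usage of the signature sketched above
    p <- p %>%
      add_gurobi_solver(gap = 0.05, number_solutions = 100L,
                        multiple_solution_method = "shuffle")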

Re #22: Yeah, I think the Marxan output is probably getting a bit dated, but I think it's really important to make this identical to the Marxan output. I think we risk annoying a lot of users if the outputs were different in any way.

Re #25: Yeah, we could do that. Although I wonder if we could store additional solutions in extra columns of the data.frame or Spatial*DataFrame, or as extra layers in a RasterStack. This would make it much easier for users to access multiple solutions than if we kept the extra solutions in an attribute. If we did that, then I think we would only need to store the properties of each solution (as separate vectors?) in the attributes: objective values, solver return codes, run time, solution method?
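For raster inputs, that could look something like this sketch (the attribute names and values are hypothetical; `s1` and `s2` stand in for two solution rasters):

    # solutions as layers of a RasterStack, solution properties as attributes
    library(raster)
    out <- stack(s1, s2)                       # one layer per solution
    attr(out, "objective") <- c(1934, 1955)    # illustrative objective values
    attr(out, "status") <- c("OPTIMAL", "OPTIMAL")
    attr(out, "runtime") <- c(0.8, 1.1)        # illustrative run times (s)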

jeffreyhanson commented on May 21, 2024

I just had another thought. Instead of adding arguments to the solver functions, what if we added functions like add_cuts_portfolio(...) or add_shuffle_portfolio(...) that users could add to ConservationProblem class objects to specify how multiple solutions should be generated?

This way users could generate multiple solutions using something like this:

  p <- problem(sim_pu_raster, sim_features) %>%
    add_min_set_objective() %>%
    add_absolute_targets(targ) %>%
    add_binary_decisions() %>%
    add_gurobi_solver(gap = 0.2) %>%
    add_cuts_portfolio(number_solutions = 10)
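and then retrieve the whole portfolio with a single call, presumably something like:

  s <- solve(p)  # hypothetically, one solution per layer/column of s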

jeffreyhanson commented on May 21, 2024

I think this has several benefits: it separates the portfolio code from the solver-specific code (which will be easier to maintain and will make porting to ROI easier), it removes redundant combinations of arguments (e.g. remove_duplicate_solutions and Benders cuts), and it also means that it will be easy to add new strategies for generating multiple solutions if we can think of any more. What do you think?

ricschuster commented on May 21, 2024

re #25: I agree with your suggestion, and might have been unclear before. A raster stack comprised of 100+ rasters might get users into trouble for very large problems (>1M pu's and >1000 features), but I would assume that users with problems of that size would know how to deal with issues like that.

I like the add_cuts_portfolio(...) suggestion as it does help to keep things cleaner and make adding new strategies easier.

jeffreyhanson commented on May 21, 2024

Yeah, I agree.

Ok awesome. I've started implementing this along with #25, so hopefully this functionality will be in prioritizr soon.
