Giter Site home page Giter Site logo

Comments (13)

pearsonca avatar pearsonca commented on June 24, 2024 1

Re "antagonistic" - getting runs to produce the precise same results, alongside a fully interpretable audit trail, is going to cost (money, time, sanity, etc) that could be better spent on "useful" features (the phrase I would use instead of "correct", though also a bit vague) e.g. visualizations of results or open-source contributions to improve package diagnostics/messages/etc (hint-hint).

Making results precisely match is useful for algorithmic diagnosis - that can be good for figuring out what's wrong with your machinery, but also...that's not the point of the machine, right?

Aside: if results are sensitive to random number seed (or more broadly, generation), then they aren't results. Which is to say: the specific quantities whatever pipeline you build spits out aren't the result - "do X over Y" is the result.

from epinowcast.

pearsonca avatar pearsonca commented on June 24, 2024 1

@zsusswein calling your attention to this discussion as well: https://community.epinowcast.org/t/epinowcast-command-line-interface/207/3, as that seems pertinent to what you're doing. If you could provide example use cases / user stories / etc, the potential outcome of that work would likely be more useful to you et al.

(not that it solves the how-do-i-record-environment-state-so-i-can-later-diagnosis-problems, but it might place it in a different, more manageable box).

from epinowcast.

zsusswein avatar zsusswein commented on June 24, 2024 1

do you have a suggested implementation that would meet your use case

I think the simplest workable solution from my perspective is to expose the versions used in CI for {epinowcast}? I don't mind updating frequently and I would think the type of person who cares about this would feel similarly.

I'm not sure what the right approach is from a package/CRAN perspective. It feels idiomatic to set an option or environment flag, but I'm not sure that's necessarily best practice?

do you think your use case has a wider scope than you/your team (and can you give evidence to this effect).

I like to think it would, but I don't know and have no evidence.

from epinowcast.

adrian-lison avatar adrian-lison commented on June 24, 2024 1

Just as a side note related to this, I think we currently need stan 2.33 (the most recent version) as a minimum to be able to compile the models. If people have an older version of cmdstan installed, they will get an error complaining about the model syntax etc. This is likely confusing, so adding functionality to inform the user about the minimum version required would be valuable.

from epinowcast.

adrian-lison avatar adrian-lison commented on June 24, 2024 1

To always be aware of our minimum version, we could also create a tweak of #386 to always run with a specified minimum version (e.g. starting with 2.33 for now). When this one fails but the #386 github action with the most recent stan version does not, we know it's because our code includes some new features and we need to update the minimum version.

from epinowcast.

zsusswein avatar zsusswein commented on June 24, 2024 1

Sure! Happy to give it a go. A few questions:

  1. Do you want to capture the cmdstan version present on the system at package build, the version in DESCRIPTION.SystemRequirements, or vend it yourself (in R/data.R or R/sysdata.rda perhaps)?
    • Present at build time is simplest but means that it won't work if a user installs from source. Probably a dealbreaker.
    • Relying on SystemRequirements feels most idiomatic but it's unclear to me as a non-dev on this project whether that's a reliable source of information
    • Vending yourself is easy and reliable (track #386 maybe?), but perhaps confusing if it conflicts with SystemRequirements
  2. How do you want to provide the information? It might be nice to echo {cmdstanr}'s syntax, where they have an internal function latest_released_version() that tracks the cmdstan version the user could update to and then suggest an update on load. I'm not sure a suggested update is needed here, but perhaps a similar latest_supported_cmdstan_version() function? You could group it with Utilities in the roxygen2 function reference?
  3. What do you want the latest supported version to actually be? My (source-free) understanding of "supported" is something along the lines of "we will fix bugs coming from this version but if you're on an earlier version please upgrade first to see if the bug goes away." If that very rough definition works here, then I'd suggest you track #386 and stick to the version you test under (which is latest?).

from epinowcast.

seabbs avatar seabbs commented on June 24, 2024 1

It does seem like SystemRequirements would make sense though I think we want to make this a soft requirement (i.e know this works but other versions may also work) so based on that vending it ourselves based on the version used with a specific release makes sense.

How do you want to provide the information? It might be nice to echo {cmdstanr}'s syntax, where they have an internal function

I think this might be a bit heavy in the first instance. We could give an .onload message telling users a version or write it as a header in the stan files?

What do you want the latest supported version to actually be?

For now I am thinking of it less as supported and more built with. In general we will always be "supporting" the latest cmdstan release for any bug fixes etc.

and stick to the version you test under (which is latest?).

yes it is and agree.

from epinowcast.

seabbs avatar seabbs commented on June 24, 2024

This is a good idea and would also help us in development I think. The only thing we want to be careful of is to not make this overly restrictive as stan is pretty stable so its likely our model won't actually depend on a specific version and we don't want people to install new versions for no reason.

I'm not sure we want to supply a function to do this. We could encode the stan dependency into the DESCRIPTION but that might be overly proscriptive. We could also use the new #386 to add the version of cmdstan to main we are use the package work with and then use that in a function to serve the latest recent version?

That would be good as it would be automated but it would be bad as it would always be the least version of cmdstan and we wouldn't know how far back our model would support.

from epinowcast.

pearsonca avatar pearsonca commented on June 24, 2024

Is your feature request related to a problem? Please describe.

I use {epinowcast} in a containerized pipeline where we try to maximize reproducibility. When building new versions of the image, we build {epinowcast} from source from this repository pinned to a specific commit. But it can get a bit messy with layer caching and moving versions to get the latest features.

Prying a bit: how are you defining "reproducibility"? In the scientific sense (i.e. prioritizing ability to specify these precise conditions => get these precise results) or in the engineering sense (i.e. on a budget, reliably churning out parts within quality tolerance standards + minimal disruptions to production line with maintenance/upgrades)?

In my opinion, these are antagonistic, so the answer can't be "both". So: what we might actually do to solve the problem depends on which perspective you're arguing to prioritize.

from epinowcast.

zsusswein avatar zsusswein commented on June 24, 2024

Prying a bit: how are you defining "reproducibility"?

This is a good question and not something I've thought much about before. Perhaps $p < 0.05$?1

In my opinion, these are antagonistic, so the answer can't be "both".

Thinking out loud: seeing the engineering and scientific perspectives as orthogonal makes sense to me, but I'd be interested to hear what you see as actively antagonistic. I'm not sure I'm totally following?

In practice, I see my goals here as closer to your engineering perspective. The priorities are to (1) deliver "correct"2 results (2) on time each week (3) within a reasonable budget of time and computational resources.

Aspirationally, the results are both deterministic for a particular model x dataset x random seed combination and also in some sense "true" about the world. I think containerization helps with the former, but not at all with the latter.

Does that help at all?

Footnotes

  1. Kidding! We're all Bayesians here.

  2. "Correct" is a total weasel word here and you'd be right to call me on it.

from epinowcast.

seabbs avatar seabbs commented on June 24, 2024

This is a nice conversation but perhaps best suited for the community forum or a epinowcast discussion item. Getting back to the question at hand. @zsusswein do you have a suggested implementation that would meet your use case and do you think your use case has a wider scope than you/your team (and can you give evidence to this effect).

from epinowcast.

seabbs avatar seabbs commented on June 24, 2024

@adrian-lison do you think we should aim to support a range of versions or should we just try and say which version for certain works?

from epinowcast.

seabbs avatar seabbs commented on June 24, 2024

@zsusswein unfortunatly on one has had capacity to get to this. If this is still useful for your team would you have time to take a crack at it with some support.

from epinowcast.

Related Issues (20)

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.