Comments (7)
The importance of versioning notebooks and their outputs is not limited to the "working" outputs; it may be even more pressing for debugging. What's the best way to share errors in notebooks that need to be debugged? Recently, we committed to master a copy-and-paste of the errors from the Simulation notebook as a text file: BayAreaMetro@4f462d2
This is clearly not a good way to proceed. So what else might we do? Committing those changes on a temporary branch would probably be wiser, but is that the best way to work?
One related issue is that before the Simulation notebook is run, the Estimation notebook might (should?) be run, and this produces outputs that change YAML files in "configs." This is especially confusing for a new user (especially one that's working with git). It seems that all of the changed YAML files in /configs/ should also be checked in so that Simulation can be debugged against them. However, the user might not know whether or not those configs were relevant to the Simulation bug.
from bayarea_urbansim.
Generally speaking, I think notebooks are terrible for version control for all the obvious reasons. I've begun to use them only for development; polished outputs come from a straight Python script, e.g.:
https://github.com/synthicity/bayarea_urbansim/blob/master/Simulation.py
As for estimation, I do not think we should be running estimation before simulation every time. Estimation rarely needs to change once we get coefficients we believe in and we just modify simulation inputs and rerun. It does make sense to test estimation to make sure it works on a regular basis, but I would then discard the results. That said, I have often wanted the feature that would check for 1 or 2 decimal place closeness of coefficients and not update the YAML files if it's the same at that degree of precision.
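The coefficient-precision check described above could be sketched roughly like this. This is a hypothetical helper, not an existing urbansim API: the function names, the dict-of-coefficients shape, and the `write_fn` callback are all assumptions for illustration.

```python
# Hypothetical sketch: skip rewriting a model's YAML config when newly
# estimated coefficients match the saved ones at 2-decimal precision.

def coefficients_match(old, new, places=2):
    """Return True if every coefficient agrees when rounded to `places`."""
    if set(old) != set(new):
        return False
    return all(round(old[k], places) == round(new[k], places) for k in old)

def maybe_update_yaml(path, old, new, write_fn, places=2):
    """Only rewrite the config when the estimates actually moved."""
    if coefficients_match(old, new, places):
        return False  # nothing changed at this precision; leave the YAML alone
    write_fn(path, new)  # write_fn is a stand-in for whatever serializes the config
    return True
```

With this, a routine re-run of estimation that only jiggles coefficients in the third decimal place would leave the checked-in configs untouched, keeping the git history quiet.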
Thanks @fscottfoti. So it sounds like the best thing would be for us to share Python scripts when it comes to the input. What do you make of sharing the outputs? Should we just write the standard error and standard output as Simulation.stderr and Simulation.stdout and commit those? Also, should we commit these on a new branch each time? It seems that we don't need to ever merge back in outputs to master, but we would like to keep track of them and share them.
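Capturing the two streams to committable files could look something like this. A minimal sketch, assuming the polished run lives in a `Simulation.py` script as above; the `run_and_capture` helper and the `Simulation.stdout`/`Simulation.stderr` naming are illustrative, not part of the repo.

```python
# Hypothetical sketch: run the simulation script and save its streams as
# <prefix>.stdout / <prefix>.stderr so they can be committed or shared.
import subprocess
import sys

def run_and_capture(cmd, prefix):
    """Run `cmd`, writing its stdout and stderr to files; return the exit code."""
    with open(prefix + ".stdout", "w") as out, open(prefix + ".stderr", "w") as err:
        result = subprocess.run(cmd, stdout=out, stderr=err)
    return result.returncode

# e.g. run_and_capture([sys.executable, "Simulation.py"], "Simulation")
```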
Makes sense. Honestly, just making a gist seems like a good idea for some of these things. Saving and sharing the stdout of the runs makes a lot of sense. I don't think these are really version-controllable, though: there's random noise every time you run, so you can't really compare them. You just run them, tag them with a date and some git hashes, and save them. I wonder if you could just make the output directory sync with Box and do it that way?
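The "tag them with a date and some git hashes" idea could be sketched as below. The naming scheme and both helpers are assumptions for illustration, not an existing bayarea_urbansim convention.

```python
# Hypothetical sketch: build a run identifier from a timestamp and the
# current git hash, so saved outputs can be traced back to the code that
# produced them even though the outputs themselves aren't diffable.
import datetime
import subprocess

def current_git_hash(default="nogit"):
    """Short hash of HEAD, or a fallback when git/repo is unavailable."""
    try:
        out = subprocess.run(["git", "rev-parse", "--short", "HEAD"],
                             capture_output=True, text=True, check=True)
        return out.stdout.strip()
    except (OSError, subprocess.CalledProcessError):
        return default

def run_tag(git_hash, now=None):
    """Build a tag like 'run_20150413-1502_ab12cd3'."""
    now = now or datetime.datetime.now()
    return "run_{}_{}".format(now.strftime("%Y%m%d-%H%M"), git_hash)
```

An output directory named `run_tag(current_git_hash())` (synced to Box or not) would answer "which code produced this?" without pretending the outputs themselves are version-controlled.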
I can't speak to the random noise. @mkreilly, any thoughts on that?
That said, I will remove MetropolitanTransportationCommission@4f462d2 and put it on a branch with the configs.
This is how grumpy cat feels about random noise: [grumpy cat image]
Related Issues (20)
- json serialization TypeError: 1442 is not JSON serializable
- requirements file
- version control for dependencies
- Estimation.ipynb generates NA's in config parameters
- How are configs written out by Estimation.ipynb?
- misaligned lookup table values?
- documentation should specify that the requirements of data regeneration are different from those of simulation
- Confusing as a Service: debugging data regeneration processing tasks with different dependencies
- data regeneration->estimation.py->nrh_estimate: no object named costar in the file
- data regeneration: use better filters/spatial queries to identify identical parcel geometries
- stable identifier for parcels "across runs"
- Price model: spikes in future years for many
- Slow HLCM estimation
- Estimating a model with random subset of data?
- getting changes from UAL
- Incompatibility with UDST/urbansim PR #172
- accessory_units model step fails for simulation years not in 2010, 2015, 2020, etc.
- summaries.py fails with KeyError on jobs table
- issues related to the update to the new microdata