bayareametro / bayarea_urbansim
This project forked from udst/bayarea_urbansim
Bay Area Version of the UrbanSim Model
Home Page: http://bayareametro.github.io/bayarea_urbansim
It is possible that the run_setup.yaml file contains both the run_telecommute_strategy and sqft_per_job_adjusters flags set to True. However, the code is currently set up so that the base sqft adjusters (sqft_per_job_adjusters) are only applied if the run_telecommute_strategy flag in run_setup.yaml is set to False:
bayarea_urbansim/baus/variables.py
Line 160 in 098d42f
Otherwise the following elif statement is not evaluated. Ideally these could be multiplicative, with a base adjustment followed by the strategy adjustment, though that depends on whether the telecommute adjustments are all-inclusive, i.e. already bundle in the base adjustments on the data side.
bayarea_urbansim/baus/variables.py
Line 163 in 098d42f
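The multiplicative option could look roughly like the sketch below. The function and argument names are hypothetical and not taken from variables.py, and the sketch assumes the telecommute adjusters do not already bundle in the base adjustment.

```python
def combined_sqft_adjuster(base_adjuster, telecommute_adjuster, run_setup):
    """Return the sqft-per-job multiplier implied by the run_setup flags.

    Hypothetical helper: both adjusters are plain multipliers (1.0 = no change),
    and each one is applied only when its flag is set.
    """
    factor = 1.0
    if run_setup.get("sqft_per_job_adjusters", False):
        factor *= base_adjuster
    if run_setup.get("run_telecommute_strategy", False):
        factor *= telecommute_adjuster
    return factor


# Stand-in for the parsed run_setup.yaml with both flags enabled.
run_setup = {"sqft_per_job_adjusters": True, "run_telecommute_strategy": True}
both = combined_sqft_adjuster(1.1, 0.9, run_setup)
```

With independent if checks rather than if/elif, setting both flags applies both adjustments instead of silently dropping the base one.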
baus.py currently checks booleans for a number of steps. These checks will fail if the key is not defined in the run_setup.yaml file.
It may be cleaner to keep the run_setup.yaml file minimal, with only relevant keys defined, and conversely do a safe dict.get() check in baus.py, with a False default.
This essentially means more default assumptions are moved to baus.py, which may (or may not) be desirable.
Flagging this because errors are raised. The fix is either behavioral (enforce that any key mentioned in baus.py is defined in run_setup.yaml) or technical (use safe key getting in baus.py with appropriate defaults).
Line 374 in d36fedc
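A minimal sketch of the "technical" option, assuming the parsed run_setup.yaml is an ordinary dict. The helper name flag and the step labels are hypothetical.

```python
def flag(run_setup, key, default=False):
    """Read a boolean step flag, falling back to `default` if the key is absent."""
    return bool(run_setup.get(key, default))


# Stand-in for the dict parsed from run_setup.yaml; only relevant keys are set.
run_setup = {"run_telecommute_strategy": True}

enabled_steps = []
if flag(run_setup, "run_telecommute_strategy"):
    enabled_steps.append("telecommute")
if flag(run_setup, "sqft_per_job_adjusters"):  # missing key -> False, no KeyError
    enabled_steps.append("sqft_adjusters")
```

Routing every lookup through one helper keeps the defaults in a single place, which makes the trade-off noted above (defaults living in baus.py) at least explicit.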
This creates an Issue for @smmaurer's findings in this PR: #142, copied below:
Due to changes in the base data, I had to loosen a requirement in the allocate_jobs() helper function that's used in the preproc_jobs step.
The baseyear_taz_controls.csv file lists job counts for each TAZ, which are allocated to specific buildings using this code. If there are jobs but no buildings for a particular TAZ, allocate_jobs() raises an error and crashes.
This is now happening for a couple of TAZs. I modified the code to log the problem and move on, which lets the remainder of the preprocessing complete. The mismatches will need to be resolved on the data side (see separate discussion in Slack).
The temporary fix prevents preprocessing from crashing, but we'll need to understand these cases where jobs exist but there are no buildings for them, so that the jobs can be allocated if necessary.
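The log-and-continue behaviour can be illustrated with the sketch below. It is not the actual allocate_jobs() code: the function name, the dict-based inputs and the round-robin assignment are all assumptions made for the example.

```python
import logging

logger = logging.getLogger("preproc_jobs")


def allocate_jobs_by_taz(taz_job_counts, buildings_by_taz):
    """Assign each TAZ's jobs round-robin to that TAZ's buildings.

    TAZs with jobs but no buildings are logged and returned separately
    instead of raising, so the rest of preprocessing can finish.
    """
    allocation = {}
    unallocated = {}
    for taz, n_jobs in taz_job_counts.items():
        buildings = buildings_by_taz.get(taz, [])
        if n_jobs > 0 and not buildings:
            logger.warning("TAZ %s has %d jobs but no buildings", taz, n_jobs)
            unallocated[taz] = n_jobs
            continue
        for i in range(n_jobs):
            building = buildings[i % len(buildings)]
            allocation[building] = allocation.get(building, 0) + 1
    return allocation, unallocated


# TAZ 2 has jobs in the controls but no buildings in the base data.
allocation, unallocated = allocate_jobs_by_taz({1: 3, 2: 5}, {1: ["b1", "b2"]})
```

Returning the unallocated counts, rather than only logging them, gives the data-side follow-up a concrete list of TAZs to resolve.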
At some point after the 2018_10_17_parcel_to_taz1454sub.csv input file is read in, and before it reaches the calculate_vmt_fees model step, some entries in the PARCEL_ID index get converted from standard Parcel IDs to IDs in the format: XXXXX.99999.
In the aforementioned step, I inserted a fix here in order to use the table, but we should find out where the issue arises and check what else it may be corrupting.
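One way to narrow down where the IDs change is to run a small check on the index after each intermediate step. The helper below is a hypothetical diagnostic, not part of BAUS; it assumes a standard Parcel ID prints as a plain run of digits.

```python
def nonstandard_parcel_ids(ids):
    """Return the IDs whose string form is not a plain run of digits.

    Note that a float such as 123.0 is also reported, since an index that
    has been cast to float is itself a sign that something changed.
    """
    return [i for i in ids if not str(i).isdigit()]


# Illustrative values only: two well-formed IDs and one in the XXXXX.99999 form.
index_values = [1253636, "189", 1096745.99999]
bad = nonstandard_parcel_ids(index_values)
```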
Currently, in the county_marginals table, the county variable is the string representation of the county name and is stored on the index.
The population synthesizer expects the integer county designation, named COUNTY (1 through 9, starting with San Francisco and moving around the bay counter-clockwise); mapping is here. The old format had a county_name variable that held the string representation.
The numerical value is already present on taz_geography, so it should be trivial to add.
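As a rough illustration, the integer designation could be attached with a lookup like the one below. The code values are written out from the description above (San Francisco first, then counter-clockwise) and should be checked against the linked mapping or taz_geography before use. The helper name and the dict-of-rows layout are assumptions.

```python
# Assumed from the description: 1 = San Francisco, then counter-clockwise.
COUNTY_CODES = {
    "San Francisco": 1,
    "San Mateo": 2,
    "Santa Clara": 3,
    "Alameda": 4,
    "Contra Costa": 5,
    "Solano": 6,
    "Napa": 7,
    "Sonoma": 8,
    "Marin": 9,
}


def with_county_code(county_marginals):
    """Return a copy of the table with an integer COUNTY field on every row.

    `county_marginals` is a dict keyed by county name, mirroring a table
    that carries the county string on its index.
    """
    return {
        name: dict(row, COUNTY=COUNTY_CODES[name])
        for name, row in county_marginals.items()
    }


# Made-up marginal values, for illustration only.
marginals = {"Alameda": {"households": 100}, "Marin": {"households": 10}}
coded = with_county_code(marginals)
```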
A temporary fix was put in place for an error that needs to be resolved. The problem is discussed in detail by @smmaurer in the pull request Temporary fix for travel_model_output discrepancies
This issue contains running documentation of work to clean up Bay Area UrbanSim and merge outstanding pull requests. cc @mkreilly
General goals:
The first thing I've done is check that the current codebase runs.
Codebase: BayAreaMetro/bayarea_urbansim master, at commit 423bb5b
Data files: "Current Large General Input Data", last updated 2019-07-11
Operating system: MacOS 10.15 Catalina
python baus.py -c -y 2015 -s 4 --random-seed >> runs/log10.txt 2>&1
This environment is working:
That Pandana version is key. Bay Area UrbanSim does NOT run for me with the latest version of Pandana (0.4.4). Here's the error. Looks like probably a difference in how missing values are handled.
And I get the same error with the PR #117 branch running in Python 3, so it seems like not something that's been fixed elsewhere yet.
When adding the "jobs housing fee" policy to BAUS, an error occurred that led to the following caution on using the "subsidized_residential_feasibility" model. It seems like the following items could lead to missing values, and would be worth looking into:
If two policies are activated and using subsidized_residential_feasibility to create subsidized units, summary.parcel_output will join parcels_geography to the feasibility table twice. For duplicate columns, the newer column will be called '_y'.
In the case of a parcels_geography attribute like 'tra_id', because summary.parcel_output is a dynamic table that grows with each iteration, the newer column would be the correct one, though it is not the one that is maintained.
We avoid this problem because some attributes like 'tra_id' (with all of its values) are added to parcels at the onset, and remain the dominant column through this process. But this is something to note for any columns we hope to get from the parcels_geography join.
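The duplicate-column behaviour can be reproduced with a few lines of pandas. The tables here are tiny stand-ins for the feasibility and parcels_geography tables, and using merge on the index is an assumption about how the join is done.

```python
import pandas as pd

parcel_index = pd.Index([1, 2], name="parcel_id")
# Stand-ins for the real tables, with made-up values.
parcels_geography = pd.DataFrame({"tra_id": ["tra1", "tra2"]}, index=parcel_index)
feasibility = pd.DataFrame({"max_profit": [10.0, 20.0]}, index=parcel_index)

# First policy: the join adds tra_id cleanly.
first = feasibility.merge(parcels_geography, left_index=True, right_index=True)

# Second policy: tra_id already exists, so pandas disambiguates the two
# copies with its default suffixes, "_x" (existing) and "_y" (newly joined).
second = first.merge(parcels_geography, left_index=True, right_index=True)
columns_after_second_join = list(second.columns)
```

After the second join there is no plain tra_id column at all, so any later code that selects tra_id by name depends on which copy, if any, survives downstream.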
Bug in pd.DataFrame instantiation with curly braces. Could be a dict of series, or just a list.
To have this written down:
Python itself doesn't change much in new 3.x releases (and we're not using any cutting edge features) -- the main issue is that compiled numerical libraries need to provide new binaries for each Python version. Big ones like NumPy and Pandas are updated quickly, but it can take longer for niche libraries like Pandana or PyTables (which UrbanSim uses for HDF i/o) to provide support.
One cross-platform issue we run into is that Pandas sometimes chooses different precision levels for data (like int32 vs int64) depending on the environment it's running in. This can trip up Orca when data tables are being updated -- a good fix is to add the cast=True argument to the update_col_from_series() call. (If the mismatch is more than just precision level, like an int vs. float issue, you should dig deeper to make sure the code is doing the right thing.)
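To show what the cast option guards against, here is a pandas-only sketch of the same idea. It is an illustration of the behaviour described above, not Orca's own code, and update_col is a hypothetical stand-in for the Orca method.

```python
import numpy as np
import pandas as pd


def update_col(df, column_name, series, cast=False):
    """Update df[column_name] at series.index.

    A dtype mismatch raises unless cast=True, in which case the incoming
    values are converted to the existing column's dtype first.
    """
    col_dtype = df[column_name].dtype
    if series.dtype != col_dtype:
        if not cast:
            raise ValueError(
                "dtype mismatch: update is {}, column is {}".format(series.dtype, col_dtype)
            )
        series = series.astype(col_dtype)
    df.loc[series.index, column_name] = series
    return df


buildings = pd.DataFrame({"job_spaces": np.array([10, 20, 30], dtype="int64")})
# The same integers, but at a different precision level.
new_vals = pd.Series(np.array([11, 31], dtype="int32"), index=[0, 2])
update_col(buildings, "job_spaces", new_vals, cast=True)
```

Casting is only safe here because both sides are integers; an int vs. float mismatch would be silently truncated by the same call, which is why that case deserves a closer look.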
A couple of bugs related to changes in NumPy/Pandas default behaviors did come up last year, see PR #99 and issue #96.
Statsmodels is installed as a sub-dependency of UrbanSim, for OLS estimation and simulation. On my machine, pip install statsmodels in Python 2.7 tries to install v0.11, which causes errors, so you need to install v0.10 manually. I'll try to get this fixed in an update to UrbanSim.
This is installed as a sub-dependency of Pandana.
More Pandana notes TK.
Should be fillna('OF')
which is defined in
Swap out hard coded value with the one set in
bayarea_urbansim/baus/models.py
Line 804 in d36fedc
and here:
bayarea_urbansim/baus/models.py
Line 856 in 49b7da4
with whatever is in the settings file:
I get the following error when I run the standard simulation scenario on a Mac, with Python 2.7 and NumPy 1.16. @theocharides reports that errors like this are why NumPy is pinned at 1.10 in requirements.txt, and the simulation runs fine when I downgrade NumPy.
To do:
udst/cloud-wrap
branch help resolve this: UDST@bfd9769
Error:
Running step 'travel_model_output'
2508 MISSING GEOMIDS!
Describe of development projects
parcel_id residential_units ... deed_restricted_units residential_price
count 19.000 19.000 ... 19.000 19.000
mean 1253636.263 71.000 ... 0.000 0.000
std 546450.197 121.428 ... 0.000 0.000
min 189.000 0.000 ... 0.000 0.000
25% 1096745.000 0.000 ... 0.000 0.000
50% 1326012.000 4.000 ... 0.000 0.000
75% 1684626.500 96.000 ... 0.000 0.000
max 1812103.000 418.000 ... 0.000 0.000
[8 rows x 12 columns]
Traceback (most recent call last):
File "baus.py", line 381, in <module>
run_models(MODE, SCENARIO)
File "baus.py", line 303, in run_models
], iter_vars=[IN_YEAR])
File "/Users/maurer/Dropbox/Git-mbp-new/bayareametro/bayarea_urbansim/src/orca/orca/orca.py", line 1945, in run
step()
File "/Users/maurer/Dropbox/Git-mbp-new/bayareametro/bayarea_urbansim/src/orca/orca/orca.py", line 791, in __call__
return self._func(**kwargs)
File "/Users/maurer/Dropbox/Git-mbp-new/bayareametro/bayarea_urbansim/baus/summaries.py", line 830, in travel_model_output
taz_df = add_age_categories(taz_df, year, rc)
File "/Users/maurer/Dropbox/Git-mbp-new/bayareametro/bayarea_urbansim/baus/summaries.py", line 1324, in add_age_categories
mat = simple_ipf(seed_matrix, col_marginals, row_marginals)
File "/Users/maurer/Dropbox/Git-mbp-new/bayareametro/bayarea_urbansim/baus/utils.py", line 212, in simple_ipf
seed_matrix *= ratios
File "/anaconda3/envs/mtc-test/lib/python2.7/site-packages/pandas/core/ops.py", line 1585, in wrapper
index=left.index, name=res_name, dtype=None)
File "/anaconda3/envs/mtc-test/lib/python2.7/site-packages/pandas/core/ops.py", line 1474, in _construct_result
out = left._constructor(result, index=index, dtype=dtype)
File "/anaconda3/envs/mtc-test/lib/python2.7/site-packages/pandas/core/series.py", line 249, in __init__
.format(val=len(data), ind=len(index)))
ValueError: Length of passed values is 1454, index implies 5
None
Traceback (most recent call last):
File "baus.py", line 391, in <module>
raise e
ValueError: Length of passed values is 1454, index implies 5
Closing remaining open files:./data/2015_08_03_tmnet.h5...done./data/2015_09_01_bayarea_v3.h5...done./data/2015_06_01_osm_bayarea4326.h5...done
This issue is just to move FMS datasets into the BASIS output folder for BAUS, now that we have a v0 version of them. I think that requires:
Thank you!!
def zone_forecast_inputs() seems to be duplicated, both in datasources.py and summaries.py. Do we need both?
@conorhenley recently implemented a new policy geography system (for trich/conn/hra geographies), which later generated a runtime error.
Running step 'geographic_summary'
Traceback (most recent call last):
File "baus.py", line 390, in <module>
run_models(MODE, SCENARIO)
File "baus.py", line 319, in run_models
orca.run(models, iter_vars=years_to_run)
File "c:\users\etheocharides\documents\scenarios\bayarea_urbansim\src\orca\orca\orca.py", line 1945, in run
step()
File "c:\users\etheocharides\documents\scenarios\bayarea_urbansim\src\orca\orca\orca.py", line 791, in __call__
return self._func(**kwargs)
File "C:\Users\etheocharides\Documents\scenarios\bayarea_urbansim\baus\summaries.py", line 670, in geographic_summary
parcel_output.groupby(geography).\
File "C:\anaconda2\envs\numpytest\lib\site-packages\pandas\core\generic.py", line 6663, in groupby
observed=observed, **kwargs)
File "C:\anaconda2\envs\numpytest\lib\site-packages\pandas\core\groupby\groupby.py", line 2152, in groupby
return klass(obj, by, **kwds)
File "C:\anaconda2\envs\numpytest\lib\site-packages\pandas\core\groupby\groupby.py", line 599, in __init__
mutated=self.mutated)
File "C:\anaconda2\envs\numpytest\lib\site-packages\pandas\core\groupby\groupby.py", line 3291, in _get_grouper
raise KeyError(gpr)
KeyError: 'juris_trich'
None
Traceback (most recent call last):
File "baus.py", line 400, in <module>
raise e
KeyError: 'juris_trich'
Conor fixed the error in 2162f34. He mentioned that the error had to do with the new "juris_trich" tag not being defined on summary.parcel_output. He also mentioned that the summary.parcel_output is defined in the scheduled development events model. This left us wondering why the error wasn't triggered previously during SDEM.
In a separate line of work, it was realized that most of the development projects (used in the scheduled developments model) were not being read in, because their geom_ids were not valid. It was previously thought that x,y columns were being used in place of geom_ids. The fix for this was made here 1c44d4a. Although most of the projects were not being activated, a handful should have been, and those should have triggered the error.
Recently, it has also appeared that the columns used to specify particular development projects for particular scenarios do not have an effect on the projects that are built. In addition, development projects prior to 2015 were not being modeled, because scheduled development events is not included in the base year model sequence.
So, this is a two-in-one issue, acknowledging some inconsistencies in the development projects process, but that also may be related to the initial error on summary.parcel_output. We'll need to sort out whether we ever did make the switch to using x,y for the development projects, and if the transition process could be related to the error that showed up, or if it could be related to something else in the development projects.
Moraga, Morgan Hill, Milpitas, Benicia, and Newark had the wrong FIPS codes. This has been fixed in the GIS file but still needs to be fixed in both the base map and the zoningmod files (unless we drop FIPS to use four-letter codes). Note that these errors didn't affect rezoning or anything else in PBA40. They would matter only if we ended up rezoning based on TPA x jurisdiction, etc., i.e. where the jurisdiction or set of jurisdictions is part of the rezoning system.
I noticed that baus.py has recently been set to always use a random seed: baus.py#L25
But there's also code later on that accepts a --random-seed command line flag (baus.py#L71-L72). If the seed is on by default, the flag doesn't have any effect, which could be confusing.
I'm thinking we should do one of the following:
Add another command line option (like --no-random-seed) so that modelers can easily override the default in either direction
Get rid of the command line flag
Set the default back to no random seed, and turn it on with the flag
My vote is for the first option, since it seems most flexible.
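The first option could be sketched with argparse as follows. DEFAULT_RANDOM_SEED and parse_seed_flag are hypothetical names, and the default value is an assumption meant to mirror the current always-on behaviour.

```python
import argparse

# Assumed to mirror the current default in baus.py (seed always on).
DEFAULT_RANDOM_SEED = True


def parse_seed_flag(argv):
    """Return True if a fixed random seed should be used for this run."""
    parser = argparse.ArgumentParser()
    # The two flags write to the same destination and cannot be combined.
    group = parser.add_mutually_exclusive_group()
    group.add_argument("--random-seed", dest="random_seed", action="store_true")
    group.add_argument("--no-random-seed", dest="random_seed", action="store_false")
    parser.set_defaults(random_seed=DEFAULT_RANDOM_SEED)
    return parser.parse_args(argv).random_seed


default_choice = parse_seed_flag([])
overridden = parse_seed_flag(["--no-random-seed"])
```

Because both flags share one destination, changing the default later only requires editing DEFAULT_RANDOM_SEED, and either flag still overrides it.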