bayareametro / bayarea_urbansim

This project is a fork of udst/bayarea_urbansim


Bay Area Version of the UrbanSim Model

Home Page: http://bayareametro.github.io/bayarea_urbansim

Languages: Python 21.91%, Jupyter Notebook 78.09%
Topics: land-use-model


bayarea_urbansim's Issues

Allow separate square feet adjusters to be applied independently

It is possible for the run_setup.yaml file to have both the `run_telecommute_strategy` and `sqft_per_job_adjusters` flags set to True. However, the code is currently set up such that the base sqft adjusters (`sqft_per_job_adjusters`) are applied only when `run_telecommute_strategy` in run_setup.yaml is set to False:

if run_setup["run_telecommute_strategy"] and year != base_year:

If the telecommute branch runs, the following `elif` statement is never evaluated. Ideally the two could be multiplicative: a base adjustment applied first, then the strategy adjustment on top. Whether that is correct depends on whether the telecommute adjusters are all-inclusive, i.e. whether they already bundle the base adjustments on the data side:

elif run_setup["sqft_per_job_adjusters"]:
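A minimal sketch of the multiplicative option; the helper and the adjuster dicts here are hypothetical illustrations, not the actual baus.py structures:

```python
def combined_sqft_adjusters(run_setup, base_adjusters, strategy_adjusters,
                            year, base_year):
    """Combine base and telecommute sqft-per-job adjusters multiplicatively.

    base_adjusters / strategy_adjusters are hypothetical dicts mapping
    zone -> multiplier; run_setup mirrors the run_setup.yaml flags.
    """
    combined = {}
    for zone in set(base_adjusters) | set(strategy_adjusters):
        factor = 1.0
        # The base adjustment applies whenever its flag is on ...
        if run_setup.get("sqft_per_job_adjusters", False):
            factor *= base_adjusters.get(zone, 1.0)
        # ... and the strategy adjustment stacks on top instead of replacing it.
        if run_setup.get("run_telecommute_strategy", False) and year != base_year:
            factor *= strategy_adjusters.get(zone, 1.0)
        combined[zone] = factor
    return combined
```

This stacking is only correct if the telecommute adjusters are not all-inclusive; if they already bundle the base adjustments on the data side, the current either/or logic is right.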

Keys checked for in `baus.py` need to be in `run_setup.yaml`

baus.py currently checks boolean flags for a number of steps. These checks fail with a KeyError if the key is not defined in the run_setup.yaml file.

It may be cleaner to keep the run_setup.yaml file minimal, with only relevant keys defined, and conversely have baus.py do a safe dict.get() lookup with a False default.

This essentially means more default assumptions are moved to baus.py which may (or may not) be desirable.

Flagging this because errors are raised. The fix is either behavioral (enforce that any key referenced in baus.py is defined in run_setup.yaml) or technical (use safe key lookups in baus.py with appropriate defaults).

if run_setup["run_visualizer"]:
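A sketch of the technical fix, using the run_visualizer check above as the example; `run_setup` here is simply the parsed YAML dict:

```python
# With direct indexing, a missing key raises KeyError:
#     if run_setup["run_visualizer"]: ...
# A safe lookup treats missing keys as False:

run_setup = {"run_telecommute_strategy": True}  # run_visualizer omitted

if run_setup.get("run_visualizer", False):
    print("running visualizer")
else:
    print("skipping visualizer")  # missing key defaults to False
```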

Preprocessing issue when jobs exist but no buildings

This creates an Issue for @smmaurer's findings in this PR: #142, copied below:
Due to changes in the base data, I had to loosen a requirement in the allocate_jobs() helper function that's used in the preproc_jobs step.
The baseyear_taz_controls.csv file lists job counts for each TAZ, which are allocated to specific buildings using this code. If there are jobs but no buildings for a particular TAZ, allocate_jobs() raises an error and crashes.
This is now happening for a couple of TAZs. I modified the code to log the problem and move on, which lets the remainder of the preprocessing complete. The mismatches will need to be resolved on the data side (see the separate discussion in Slack).

The temporary fix prevents preprocessing from crashing, but we still need to understand these cases where jobs exist without buildings, so that the jobs can be allocated if necessary.
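The log-and-continue behavior described above could look roughly like this (a sketch, not the actual allocate_jobs() code; the table shapes and the even-split allocation are assumptions for illustration):

```python
import logging

import pandas as pd

logger = logging.getLogger(__name__)


def allocate_jobs_by_taz(job_counts, buildings):
    """Allocate TAZ-level job counts to buildings, skipping empty TAZs.

    job_counts: Series of job totals indexed by TAZ.
    buildings: DataFrame with a 'taz' column, indexed by building ID.
    Returns a dict of TAZ -> Index of building IDs (one entry per job).
    """
    allocations = {}
    for taz, n_jobs in job_counts.items():
        candidates = buildings.index[buildings["taz"] == taz]
        if len(candidates) == 0:
            # Previously this case raised and crashed preprocessing;
            # now we log the mismatch and move on.
            logger.warning("TAZ %s has %d jobs but no buildings; skipping",
                           taz, n_jobs)
            continue
        # Naive round-robin split, for illustration only.
        repeats = -(-n_jobs // len(candidates))  # ceiling division
        allocations[taz] = candidates.repeat(repeats)[:n_jobs]
    return allocations
```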

Parcel IDs from parcels_subzone table incorrect

At some point after the 2018_10_17_parcel_to_taz1454sub.csv input file is read in, and before it reaches the calculate_vmt_fees model step, some entries in the PARCEL_ID index get converted from standard Parcel IDs to IDs in the format: XXXXX.99999.

In the aforementioned step, I inserted a fix here in order to use the table, but we should find out where the issue arises and check what else it may be corrupting.
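The XXXXX.99999 pattern is what integer IDs look like after a lossy float conversion (e.g. a merge that introduces NaNs upcasts an int64 column to float64). A hedged sketch for detecting and repairing such an index, assuming the true IDs are integers:

```python
import numpy as np
import pandas as pd


def repair_float_ids(index, tol=1e-3):
    """Round a float-contaminated ID index back to integers.

    A value like 12345.99999 is assumed to really be 12346; anything
    farther than `tol` from an integer indicates a deeper problem and
    is reported instead of silently rounded.
    """
    ids = np.asarray(index, dtype="float64")
    rounded = np.round(ids)
    suspicious = np.abs(ids - rounded) > tol
    if suspicious.any():
        raise ValueError("non-integer IDs found: %s" % list(ids[suspicious]))
    return pd.Index(rounded.astype("int64"))
```

This only masks the symptom, though; the real fix is to find where the index picks up the float dtype in the first place.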

TM1 outputs should have numeric county identifier

Currently, in the county_marginals table, the county variable is the string name of the county and is stored on the index.

The population synthesizer expects the integer county designation, named COUNTY (1 through 9, starting with San Francisco and moving around the bay counter-clockwise); the mapping is here. The old format had a county_name variable holding the string representation.

The numeric value is already present on taz_geography, so it should be trivial to add.
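Since taz_geography already carries the numeric code, the fix could be a simple index mapping; a sketch with assumed column names ('county_name' and 'county'), which should be checked against the actual tables:

```python
import pandas as pd


def add_numeric_county(county_marginals, taz_geography):
    """Attach the integer COUNTY code expected by the population synthesizer.

    county_marginals is indexed by county name (string); taz_geography is
    assumed to have 'county_name' (string) and 'county' (int 1-9) columns.
    """
    name_to_code = (taz_geography.drop_duplicates("county_name")
                                 .set_index("county_name")["county"])
    out = county_marginals.copy()
    out["COUNTY"] = out.index.map(name_to_code).astype("int64")
    return out
```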

simulation_validation error

UrbanSim run on the 'sam-testing' branch, Python 3.6.
Aborted in year 2030 at `assert -1 not in households.building_id.value_counts()`

(screenshot attached: Annotation 2020-04-09 103343)

Cleaning up BAUS - Feb 2020

This issue contains running documentation of work to clean up Bay Area UrbanSim and merge outstanding pull requests. cc @mkreilly

General goals:

  • confirm that current codebase runs
  • evaluate and resolve barriers to merging PR #103
  • evaluate and resolve barriers to merging PR #117
  • make sure everything is clean and tidy in Python 3

Confirm that current codebase runs

The first thing I've done is check that the current codebase runs.

Codebase: BayAreaMetro/bayarea_urbansim master, at commit 423bb5b
Data files: "Current Large General Input Data", last updated 2019-07-11
Operating system: MacOS 10.15 Catalina

python baus.py -c -y 2015 -s 4 --random-seed >> runs/log10.txt 2>&1

This environment is working:

  • python 2.7
  • numpy 1.16.x (latest that supports py27)
  • pandas 0.24.x (latest that supports py27)
  • statsmodels 0.10.x (had to downgrade manually; seems like a py27 issue)
  • orca latest (1.5.x)
  • urbansim latest (3.1.x)
  • urbansim_defaults latest (0.2.x)
  • pandana 0.3.x

That Pandana version is key: Bay Area UrbanSim does NOT run for me with the latest version of Pandana (0.4.4). Here's the error. It looks like a difference in how missing values are handled.

And I get the same error with the PR #117 branch running in Python 3, so it doesn't seem to be something that's been fixed elsewhere yet.

  • diagnose and resolve the missing values error with Pandana v0.4.4
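For the record, the working environment above could be pinned roughly like this in a requirements file (version ranges inferred from the list; adjust as needed):

```
numpy>=1.16,<1.17
pandas>=0.24,<0.25
statsmodels>=0.10,<0.11
orca>=1.5,<1.6
urbansim>=3.1,<3.2
urbansim_defaults>=0.2,<0.3
pandana>=0.3,<0.4
```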

Possible issue with missing data when using subsidized_residential_feasibility

When adding the "jobs housing fee" policy to BAUS, an error occurred that led to the following caution about using the subsidized_residential_feasibility model. The items below could lead to missing values and are worth looking into:

  • If two policies are activated and both use subsidized_residential_feasibility to create subsidized units, summary.parcel_output will join parcels_geography to the feasibility table twice. Duplicate columns get a '_y' suffix on the newer copy.

  • In the case of a parcels_geography attribute like 'tra_id', because summary.parcel_output is a dynamic table that grows with each iteration, the newer ('_y') column would be the correct one, though it is not the one that is kept.

  • We avoid this problem because some attributes like 'tra_id' (with all of its values) are added to parcels at the onset, and remain the dominant column through this process. But this is something to note for any columns we hope to get from the parcels_geography join.
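The duplicate-column behavior above is standard pandas merge suffixing, which a minimal example shows:

```python
import pandas as pd

parcels_geography = pd.DataFrame({"parcel_id": [1, 2], "tra_id": ["A", "B"]})
feasibility = pd.DataFrame({"parcel_id": [1, 2], "profit": [10, 20]})

# First policy's join brings in tra_id once:
out = feasibility.merge(parcels_geography, on="parcel_id")

# A second policy joins parcels_geography again; the overlapping column
# comes back with pandas' default '_x' (older) / '_y' (newer) suffixes:
out = out.merge(parcels_geography, on="parcel_id")
print(sorted(out.columns))  # includes 'tra_id_x' and 'tra_id_y'
```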

Notes about dependency versions and compatibility

To have this written down:

Python

  • v2.7 was used in the last cycle
  • latest is v3.8, and v3.6+ is considered "current"
  • BAUS works in 3.6 (after a lot of updating to support Python 3); still need to test newer versions

Python itself doesn't change much in new 3.x releases (and we're not using any cutting edge features) -- the main issue is that compiled numerical libraries need to provide new binaries for each Python version. Big ones like NumPy and Pandas are updated quickly, but it can take longer for niche libraries like Pandana or PyTables (which UrbanSim uses for HDF i/o) to provide support.

Operating systems

  • BAUS (and Python in general) should run equivalently on Mac, Linux, and Windows
  • Mac and Linux are almost identical from a Python perspective; the only difference is in getting appropriate binary versions of numerical libraries
  • Windows has additional variations in file i/o, but BAUS is written to support it

One cross-platform issue we run into is that Pandas sometimes chooses different precision levels for data (like int32 vs int64) depending on the environment it's running in. This can trip up Orca when data tables are being updated -- a good fix is to add the cast=True argument to the update_col_from_series() call. (If the mismatch is more than just precision level, like an int vs. float issue, you should dig deeper to make sure the code is doing the right thing.)
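A small illustration of the precision mismatch. The orca-side fix is the cast=True argument to update_col_from_series(); as I understand it, that casts the incoming series to the existing column's dtype, which is equivalent to the explicit astype below:

```python
import pandas as pd

# Pandas can pick int32 on one platform and int64 on another for the
# "same" data (e.g. Windows vs Mac integer defaults).
existing_col = pd.Series([1, 2, 3], dtype="int32")
update = pd.Series([4, 5, 6], dtype="int64")

# A dtype mismatch like this is what trips up Orca when updating a
# table column; casting the update to the column's dtype resolves it:
safe_update = update.astype(existing_col.dtype)
assert safe_update.dtype == existing_col.dtype
```

If the mismatch is int vs. float rather than just precision, the caveat above applies: dig deeper before casting.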

NumPy

  • v1.10 was used in the prior cycle
  • latest is v1.17, last version with support for Python 2.7 is v1.16
  • compatibility issues are not common

A couple of bugs related to changes in NumPy/Pandas default behaviors did come up last year, see PR #99 and issue #96.

Pandas

  • something around v0.20 was used in the prior cycle
  • latest is v1.0, last version with support for Python 2.7 is v0.24
  • BAUS/UrbanSim uses some syntax that was removed in Pandas v1.0 -- the latest version that works right now is v0.25

Statsmodels

  • latest is v0.11, last version with support for Python 2.7 is v0.10

Statsmodels is installed as a sub-dependency of UrbanSim, for OLS estimation and simulation. On my machine, `pip install statsmodels` in Python 2.7 tries to install v0.11, which causes errors, so you need to install v0.10 manually. I'll try to get this fixed in an update to UrbanSim.

Scikit-learn

  • latest is v0.22, last version with support for Python 2.7 is v0.20

This is installed as a sub-dependency of Pandana.

Orca

  • v1.5 was used in the last cycle [confirm]
  • latest is v1.5

Orca_test

  • not used in the last cycle
  • latest is v0.1

Pandana

  • v0.2 was used in the last cycle
  • latest is v0.4

More Pandana notes TK.

UrbanSim

  • v3.0 was used in the last cycle
  • latest is v3.1, which is not yet compatible with Pandas 1.0
  • can only be installed (a) from pip or (b) from the udst conda channel -- not conda-forge
  • installs statsmodels, matplotlib, and pytables as dependencies (which can sometimes cause problems themselves)

UrbanSim_Defaults

  • v0.1 was used in the last cycle
  • latest is v0.2
  • only installs from pip, not conda

Errors related to data types

I get the following error when I run the standard simulation scenario on a Mac, with Python 2.7 and NumPy 1.16. @theocharides reports that errors like this are why NumPy is pinned at 1.10 in requirements.txt, and the simulation runs fine when I downgrade NumPy.

To do:

  • check whether changes in the udst/cloud-wrap branch help resolve this: UDST@bfd9769

Error:

Running step 'travel_model_output'
2508 MISSING GEOMIDS!
Describe of development projects
        parcel_id  residential_units  ...  deed_restricted_units  residential_price
count      19.000             19.000  ...                 19.000             19.000
mean  1253636.263             71.000  ...                  0.000              0.000
std    546450.197            121.428  ...                  0.000              0.000
min       189.000              0.000  ...                  0.000              0.000
25%   1096745.000              0.000  ...                  0.000              0.000
50%   1326012.000              4.000  ...                  0.000              0.000
75%   1684626.500             96.000  ...                  0.000              0.000
max   1812103.000            418.000  ...                  0.000              0.000

[8 rows x 12 columns]
Traceback (most recent call last):
  File "baus.py", line 381, in <module>
    run_models(MODE, SCENARIO)
  File "baus.py", line 303, in run_models
    ], iter_vars=[IN_YEAR])
  File "/Users/maurer/Dropbox/Git-mbp-new/bayareametro/bayarea_urbansim/src/orca/orca/orca.py", line 1945, in run
    step()
  File "/Users/maurer/Dropbox/Git-mbp-new/bayareametro/bayarea_urbansim/src/orca/orca/orca.py", line 791, in __call__
    return self._func(**kwargs)
  File "/Users/maurer/Dropbox/Git-mbp-new/bayareametro/bayarea_urbansim/baus/summaries.py", line 830, in travel_model_output
    taz_df = add_age_categories(taz_df, year, rc)
  File "/Users/maurer/Dropbox/Git-mbp-new/bayareametro/bayarea_urbansim/baus/summaries.py", line 1324, in add_age_categories
    mat = simple_ipf(seed_matrix, col_marginals, row_marginals)
  File "/Users/maurer/Dropbox/Git-mbp-new/bayareametro/bayarea_urbansim/baus/utils.py", line 212, in simple_ipf
    seed_matrix *= ratios
  File "/anaconda3/envs/mtc-test/lib/python2.7/site-packages/pandas/core/ops.py", line 1585, in wrapper
    index=left.index, name=res_name, dtype=None)
  File "/anaconda3/envs/mtc-test/lib/python2.7/site-packages/pandas/core/ops.py", line 1474, in _construct_result
    out = left._constructor(result, index=index, dtype=dtype)
  File "/anaconda3/envs/mtc-test/lib/python2.7/site-packages/pandas/core/series.py", line 249, in __init__
    .format(val=len(data), ind=len(index)))
ValueError: Length of passed values is 1454, index implies 5
None
Traceback (most recent call last):
  File "baus.py", line 391, in <module>
    raise e
ValueError: Length of passed values is 1454, index implies 5
Closing remaining open files:./data/2015_08_03_tmnet.h5...done./data/2015_09_01_bayarea_v3.h5...done./data/2015_06_01_osm_bayarea4326.h5...done

Development projects & geography summary error

@conorhenley recently implemented a new policy geography system (for trich/conn/hra geographies), which later generated a runtime error.

Running step 'geographic_summary'
Traceback (most recent call last):
  File "baus.py", line 390, in <module>
    run_models(MODE, SCENARIO)
  File "baus.py", line 319, in run_models
    orca.run(models, iter_vars=years_to_run)
  File "c:\users\etheocharides\documents\scenarios\bayarea_urbansim\src\orca\orca\orca.py", line 1945, in run
    step()
  File "c:\users\etheocharides\documents\scenarios\bayarea_urbansim\src\orca\orca\orca.py", line 791, in __call__
    return self._func(**kwargs)
  File "C:\Users\etheocharides\Documents\scenarios\bayarea_urbansim\baus\summaries.py", line 670, in geographic_summary
    parcel_output.groupby(geography).\
  File "C:\anaconda2\envs\numpytest\lib\site-packages\pandas\core\generic.py", line 6663, in groupby
    observed=observed, **kwargs)
  File "C:\anaconda2\envs\numpytest\lib\site-packages\pandas\core\groupby\groupby.py", line 2152, in groupby
    return klass(obj, by, **kwds)
  File "C:\anaconda2\envs\numpytest\lib\site-packages\pandas\core\groupby\groupby.py", line 599, in __init__
    mutated=self.mutated)
  File "C:\anaconda2\envs\numpytest\lib\site-packages\pandas\core\groupby\groupby.py", line 3291, in _get_grouper
    raise KeyError(gpr)
KeyError: 'juris_trich'
None
Traceback (most recent call last):
  File "baus.py", line 400, in <module>
    raise e
KeyError: 'juris_trich'

Conor fixed the error in 2162f34. He mentioned that the error had to do with the new "juris_trich" tag not being defined on summary.parcel_output, and that summary.parcel_output is defined in the scheduled development events model (SDEM). This left us wondering why the error wasn't triggered previously during SDEM.

In a separate line of work, it was realized that most of the development projects (used in the scheduled development events model) were not being read in because their geom_ids were not valid; it was previously thought that x,y columns were being used in place of geom_ids. The fix for this was made here: 1c44d4a. Although most of the projects were not being activated, a handful should have been, which should have triggered the error.

Recently, it has also appeared that the columns used to specify particular development projects for particular scenarios have no effect on the projects that are built. In addition, development projects prior to 2015 were not being modeled, because scheduled development events is not included in the base-year model sequence.

So, this is a two-in-one issue: it acknowledges some inconsistencies in the development projects process, which may also be related to the initial error on summary.parcel_output. We'll need to sort out whether we ever made the switch to using x,y for the development projects, whether that transition could be related to the error that showed up, or whether the cause lies elsewhere in the development projects.

Zoning bugs

Moraga, Morgan Hill, Milpitas, Benicia, and Newark had the wrong FIPS codes. This has been fixed in the GIS file but still needs to be fixed in both the base map and the zoningmod files (unless we drop FIPS in favor of the four-letter codes). Note that these errors didn't affect rezoning or anything else in PBA40; they would only matter if we ended up rezoning based on TPA x jurisdiction, etc., i.e. wherever a jurisdiction or set of jurisdictions is part of the rezoning system.

Random seed - hardcoded vs. command line flag

I noticed that baus.py was recently changed to always use a random seed: baus.py#L25

But there's also code later on that accepts a --random-seed command line flag (baus.py#L71-L72). If the seed is on by default, the flag has no effect, which could be confusing.

I'm thinking we should do one of the following:

  • Add another command line option (like --no-random-seed) so that modelers can easily override the default in either direction

  • Get rid of the command line flag

  • Set the default back to no random seed, and turn it on with the flag

My vote is for the first option, since it seems most flexible.
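The first option could be sketched with argparse as two flags sharing one destination (flag names from the issue; the seeded-by-default behavior matches the current baus.py):

```python
import argparse


def build_parser():
    """Command-line seed control: seeded by default, with an explicit off switch."""
    parser = argparse.ArgumentParser()
    parser.add_argument("--random-seed", dest="random_seed",
                        action="store_true", default=True,
                        help="fix the random seed (default)")
    parser.add_argument("--no-random-seed", dest="random_seed",
                        action="store_false",
                        help="disable the fixed random seed")
    return parser
```

With both flags writing to the same dest, args.random_seed is True by default and False only when --no-random-seed is passed, so existing scripts that pass --random-seed keep working.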

Data type error

I ran Python 2.7 using the new parcels_geography data and got a data type error at the 'proportional jobs model for gov/edu' step. The fields in the new parcels_geography table have the same data types as the previous table.

(screenshot attached: Capture)
