Comments (30)
Some questions that might help:
- Which version of UrbanSim are you using?
- Can you be more specific than "few months"?
- Is the memory usage during estimation unusually high, or does it seem normal?
I honestly have no good ideas here because in theory there have been no changes to the way LCM estimation works. I really need to know which step(s) of the process are taking all the time, preferably with a profile dump.
from bayarea_urbansim.
I'd also want to know if it's actually sampling alternatives correctly, and the memory use could be a clue to that. It might be worth a test with a really simple model, with only a few alternatives, with only a few "choosers."
I wonder if the problem is the estimation sample size of 15000. The estimation_sample_size parameter controls how many choosers are used during estimation, so estimation is being attempted with 15000 choosers x 100 alts per chooser. That might slow things down. But I think that's how it has always been done?
Yeah I noticed that and tried it with more like 1500 and it was still slow.
Fletcher, did you notice this also? Do you or Sam have any idea what change might have triggered it or how long ago this started?
Paul
On Fri, Aug 7, 2015 at 12:08 PM, Fletcher Foti [email protected] wrote:
Usually something more like 100 would be used. Results don't change very much with larger sample sizes.
On Fri, Aug 7, 2015 at 12:48 PM, Fletcher Foti [email protected] wrote:
I haven't estimated the HLCM in a while, but I tried it out and it never finished estimating, so there seems to be a problem. Matt changed a few things in the HLCM a few months ago, which is the only thing I can think of.
This 15,000 or 1,500 is the sample size of choosers, not the sample size of alternatives. Right now the choosers are the actual households in the synthesized population, so they need to be sampled.
Thanks for looking into this. The codebase is the current synthicity bayarea_urbansim master. The timeframe would be since the last time the "hlcm_estimate" model was run in the copy of Estimation.ipynb that's in the repo. That actually looks like October 2014 -- the notebook file was changed in March 2015, but that cell wasn't updated.
I ran a test in sanfran_urbansim so I could profile the hlcm estimation. It has slowed down too, though it completes in 4-5 minutes with 3000 choosers x 4 segments. There the slowness is in the sampling of alternatives in mnl_interaction_dataset (see this changeset: UDST/urbansim@d17a5ce). np.random.choice is slow, and calling it many times adds up. I don't know if that's the exact same thing causing bayarea_urbansim to be so slow, but it's definitely contributing. (It'll be called (estimation_sample_size * number of segments) times, at 0.02218 seconds per call on my laptop.)
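For intuition about where that per-call cost comes from, here's a minimal timing sketch (the million-alternative population and 100-draw sample are illustrative assumptions, not the actual bayarea_urbansim numbers). NumPy's without-replacement draw permutes the whole population internally, so each call costs on the order of the population size, while a with-replacement draw only costs on the order of the sample size:

```python
import timeit
import numpy as np

n_alts = 1_000_000   # hypothetical universal choice set
k = 100              # alternatives sampled per chooser

# Sampling WITHOUT replacement shuffles all n_alts values internally,
# so each call is O(n_alts) regardless of how small k is.
t_without = timeit.timeit(
    lambda: np.random.choice(n_alts, size=k, replace=False), number=10) / 10

# Sampling WITH replacement just draws k random integers, so it is O(k).
t_with = timeit.timeit(
    lambda: np.random.choice(n_alts, size=k, replace=True), number=10) / 10

print(f"without replacement: {t_without:.5f} s/call")
print(f"with replacement:    {t_with:.6f} s/call")
```

Multiplied over thousands of calls (one per chooser), that per-call gap is consistent with the kind of slowdown described above.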
But if that's the only problem, I'd expect it to have finished in a few minutes when @fscottfoti tried setting estimation_sample_size to 1500, and in around 30 minutes with 15000.
It's possible I didn't wait long enough - I probably stopped at about 3 minutes. We should probably provide feedback to the user, like printing the log likelihood at each iteration or something. And if the sampling takes a long time, maybe some feedback during that as well.
But this same estimation used to take far less time to converge, right? If it is now taking 3 hours to estimate a reasonably sized model, no one will use it to experiment, and that leads to poorly specified models. Hopefully we can track down the origin of the slowdown and get back to fast performance for this.
On Fri, Aug 7, 2015 at 5:31 PM, Fletcher Foti [email protected] wrote:
Well I think LCMs were wrong before UDST/urbansim@d17a5ce, but maybe it was acceptably wrong? I will brainstorm ways to make that faster but nothing comes immediately to mind.
Interesting. Yeah, printing some feedback would be a helpful stop-gap.
Last night I left the estimation running -- as it's currently specified in bayarea_urbansim master, with 15,000 choosers and 100 alternatives -- and it finished after about 5 hours. Screenshot attached.
Do we know if this change was what caused estimation times to change a lot? I agree sampling without replacement is the preferred way to do this, though in practical terms it is not clear that it would change parameter estimates, given the extremely low probability that an alternative out of a huge universal choice set (say a million alternatives) would be sampled more than once in an estimation set consisting of, say, 100 alternatives. And if it did, on an extremely infrequent basis, would it affect the estimated coefficients? I doubt it. Did we ever test whether this was the case?
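That intuition is easy to check with a birthday-problem approximation (the one-million and 100 figures are Paul's hypotheticals, not the actual model's):

```python
import math

n_alts = 1_000_000  # hypothetical universal choice set
k = 100             # alternatives drawn per observation

# P(at least one duplicate among k draws with replacement from n
# equally likely alternatives) ~= 1 - exp(-k(k-1) / (2n))
p_dup = 1 - math.exp(-k * (k - 1) / (2 * n_alts))
print(f"P(any duplicate) = {p_dup:.4%}")
```

That works out to roughly 0.5%, i.e. about one estimation row in two hundred would contain a repeated alternative, and even then only one of its 100 alternatives is affected.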
On Fri, Aug 7, 2015 at 5:50 PM, Matt Davis [email protected] wrote:
You can get quite detailed logging output from UrbanSim. Put this before you run hlcm_estimate:
import logging
from urbansim.utils.logutil import log_to_stream, set_log_level
set_log_level(logging.DEBUG)
log_to_stream()
What would be really helpful for me is if you generated a profile I can look at for one of your 5 hour runs. You can do that in the Notebook by running hlcm_estimate with the prun magic:
%%prun -q -D hlcm_estimate.prof
orca.run(["hlcm_estimate"])
Then email me that hlcm_estimate.prof file.
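For anyone else who wants to inspect a .prof file like this themselves, the stdlib pstats module can summarize it. A self-contained sketch (the profiled function here is just a stand-in for the model step):

```python
import cProfile
import pstats

def work():
    # Stand-in workload; in practice you'd profile the model run itself.
    return sum(i * i for i in range(100_000))

pr = cProfile.Profile()
pr.runcall(work)            # profile a single call
pr.dump_stats("demo.prof")  # same file format that %%prun -D writes

stats = pstats.Stats("demo.prof")
stats.sort_stats("cumulative").print_stats(5)  # top 5 entries by cumulative time
```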
Hi @jiffyclub, this looks really helpful, thanks. When I run the latter block, it fails with:
KeyError: 'no step named hlcm_estimate'
But if I replace orca.run with sim.run it seems to go fine. Would that do the trick as well, or do I need to get it working through orca? I haven't used that before but just installed it using the conda command from here.
@smmaurer sim.run is fine, just means y'all haven't converted to orca yet.
@jiffyclub Great, here's the profile that was generated: https://github.com/ual/bayarea_urbansim/raw/estimation-test/estimation-test/hlcm_estimate.prof
Thanks, I can confirm that all the time is going to drawing alternative samples at https://github.com/UDST/urbansim/blob/acafcd9ce9f67a1d7924d512a32a83facf4904ea/urbansim/urbanchoice/interaction.py#L58.
Yup, changing that line to sample WITH replacement fixes it. A 30-minute model now estimates in 8 seconds. As Paul said, the replacement shouldn't make much difference in practical terms, so I'll submit this as a pull request. Thanks so much for figuring it out!
Sam, would you run the same check with and without replacement and compare the estimated parameters?
Paul
On Aug 10, 2015, at 7:24 PM, Sam Maurer [email protected] wrote:
If we want to go back to allowing duplicates I would revert UDST/urbansim@d17a5ce so there is only one call to np.random.choice. And either way, should we add a check to make sure the number of alternatives is much larger than the sample size? That seems like the only situation in which this kind of draw would be wise.
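Such a check could be as simple as the sketch below (the function name, message, and the 10x threshold are all made up here for illustration):

```python
def check_sample_size(n_alts, sample_size, factor=10):
    """Fail fast when with-replacement sampling would be a dubious choice,
    i.e. when the pool of alternatives isn't much larger than the sample."""
    if n_alts < factor * sample_size:
        raise ValueError(
            "number of alternatives ({}) should be at least {}x the "
            "sample size ({})".format(n_alts, factor, sample_size))

check_sample_size(1_000_000, 100)  # passes silently
```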
Another option I discovered is using the Python stdlib's random.sample, which is some 1000x faster than np.random.choice. The only problem with that is that it introduces a second source of randomness, so we/folks would have to seed two random number generators to get repeatable runs (assuming the NumPy and Python random number generators are separate).
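The two generators are indeed separate, so repeatable runs would need both seeded. A sketch of what that would look like:

```python
import random
import numpy as np

SEED = 42
random.seed(SEED)      # seeds the stdlib generator used by random.sample
np.random.seed(SEED)   # seeds NumPy's global generator used by np.random.choice

a = random.sample(range(1_000_000), 100)                 # stdlib draw
b = np.random.choice(1_000_000, size=100, replace=True)  # NumPy draw

# Re-seeding both generators reproduces both draws exactly.
random.seed(SEED)
np.random.seed(SEED)
assert a == random.sample(range(1_000_000), 100)
assert (b == np.random.choice(1_000_000, size=100, replace=True)).all()
```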
Seems like it would be worth it if that works.
I agree. If we have a fast way to do sampling without replacement, I would prefer that. But sampling with replacement looks like the lesser of two evils if it takes so long using np.random. Let's try random.sample and see how it works.
On Tue, Aug 11, 2015 at 9:34 AM, Fletcher Foti [email protected] wrote:
Also it seems like this is something the other UrbanSim would have encountered - if we need to we can dig into how they got random numbers.
Here's a comparison of estimation results with and without replacement. (Easiest way to compare is by opening the link in two different windows.)
https://github.com/ual/bayarea_urbansim/blob/estimation-test/estimation-test/Estimation-test.ipynb
The first run is WITHOUT replacement and the next two runs are both WITH replacement, to illustrate typical variation between runs of an identical model.
The main difference between WITH and WITHOUT is in the autoPeakTotal term -- not sure why. The other discrepancies are in line with the typical variation between identical runs.
From @waddell in slack: "Interesting. It does seem that the differences are generally comparable to the variation between runs. Not entirely sure about the reason autoPeakTotal might be more sensitive to this - but perhaps it has something to do with having less variability to begin with. Let’s go with this for now, and later try Matt’s suggestion of using stdlib's random.sample."
I've gone ahead and merged UDST/urbansim#148 to restore speedy behavior at the cost of allowing repeated sampling of alternatives. Note that I've also recently removed the simulation framework from UrbanSim (use Orca instead) so if you pull from the UrbanSim master you'll get those changes as well.
I think it is ok to leave the np.random.choice in ActivitySim.