
Comments (17)

bletham commented on May 3, 2024

For starters, I think we can make a function that takes a model, creates a new model with the same input args / seasonalities / regressors, and then fits that model with the initial conditions taken as the parameters of the input model (rather than the current defaults in https://github.com/facebook/prophet/blob/master/python/fbprophet/forecaster.py#L1018). This would be pretty straightforward, and should be much faster than fitting totally from scratch if the data have not changed much.

The only potential challenge is that if we have added a lot more data, the trend changepoints will be in totally different places in the new time series and the old trend changepoint values would be pretty useless. So we'd want to do something where we initialize from old trend changepoint values if the changepoints are close in time, but if they're really far we just fall back to the current default initialization of 0.
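
A minimal sketch of what such a helper could look like (warm_start_fit is a hypothetical name, assuming MAP fitting and that the new model keeps the same seasonalities/regressors; the changepoint fallback is the one described above):

import numpy as np
from fbprophet import Prophet

def warm_start_fit(old_model, df, **kwargs):
    # Build a new model with the same configuration, then initialize the
    # Stan optimization from the old model's fitted (MAP) parameters.
    new_model = Prophet(**kwargs)

    def init():
        res = {}
        for pname in ['k', 'm', 'sigma_obs']:
            res[pname] = old_model.params[pname][0][0]
        for pname in ['delta', 'beta']:
            res[pname] = old_model.params[pname][0]
        # If the changepoint count changed, the old trend changepoint values
        # are useless; fall back to the default initialization of 0.
        if len(res['delta']) != new_model.n_changepoints:
            res['delta'] = np.zeros(new_model.n_changepoints)
        return res

    return new_model.fit(df, init=init)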


bletham commented on May 3, 2024

That's most of what's involved, but there is one additional detail. If you compare p.params to the default initialization here:

def stan_init():

you'll see that in p.params there are nested arrays. This is so that downstream we have consistent shapes for params whether we did MAP fitting or MCMC. Basically you just need to extract things from the arrays for it to work. This works:

from fbprophet import Prophet

m = Prophet()
m.fit(df)  # first fit, from scratch

m2 = Prophet()

def stan_init2():
    # Initialize the second fit from the first model's MAP parameters,
    # unwrapping the nested arrays that m.params stores them in.
    res = {}
    for pname in ['k', 'm', 'sigma_obs']:
        res[pname] = m.params[pname][0][0]  # scalars: strip both array levels
    for pname in ['delta', 'beta']:
        res[pname] = m.params[pname][0]     # vectors: strip the outer array
    return res

m2.fit(df, init=stan_init2)

(Note that the documentation for pystan.StanModel.optimizing says init should be a callable that returns a dictionary, as done here, but directly supplying the dictionary seems to work too.)
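
For example, with stan_init2 defined as above (m3 here is just a fresh model to illustrate the dict form):

m3 = Prophet()
m3.fit(df, init=stan_init2())  # pass the dict itself rather than the callable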

I will repeat my caveats from above:

  • If the number of changepoints changes from one model to the next, this will error because delta will be the wrong size.
  • If the locations of the changepoints in time have changed greatly, this may do worse than the default initialization because the initial trend may be very bad.


sammo commented on May 3, 2024

This is great information! Looking at the way Stan's optimizing function is called, and according to @bletham's Nov 2nd, 2018 comment above, it looks like we can pass a previously trained model's params by doing this:

p = Prophet(**kwargs)
p.fit(df)
p2 = Prophet(**kwargs)
p2.fit(pd.concat([df, additional_week_of_daily_data_df]), init=p.params)

and init in p2.fit should then be passed through to Stan's optimizing call. Does that make sense @bletham, or am I missing something? I'm assuming this should work even if we have extra regressors and custom seasonalities?

Edit:
When I try the above, I get the error below... There must be something I'm missing.

mismatch in number dimensions declared and found in context; processing stage=initialization; variable name=k; dims declared=(); dims found=(1,1)
WARNING:fbprophet:Optimization terminated abnormally. Falling back to Newton.
mismatch in number dimensions declared and found in context; processing stage=initialization; variable name=k; dims declared=(); dims found=(1,1)
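
The mismatch comes from the nested-array storage @bletham describes above: each parameter in p.params carries extra array dimensions so MAP and MCMC fits have consistent shapes. An illustration (the value is made up):

p.params['k']   # array([[0.35]]) -> dims (1, 1)
# but the Stan model declares k as a scalar -> dims ()
# hence "dims declared=(); dims found=(1,1)"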


bletham commented on May 3, 2024

Fascinating, that's an impressive hack. We should add an interface to Stan's variational inference engine anyway; I haven't yet tried it out in this setting to see whether it gives something reasonable.


bletham commented on May 3, 2024

@bigredbug47 the code seems fine to me. If the dataframes are sufficiently different, there may not be much benefit to warm-starting; maybe try using the same fit dataframe for both and see if the second fit is then shorter?


seanjtaylor commented on May 3, 2024

@miguelangelnieto I'm going to file this as an enhancement and add it to the wishlist milestone. Do you have a motivating use case? I'm not sure our fitting procedure is actually capable of incremental updates, so what we'd actually end up doing is re-fitting the whole model each time (maybe with the previous parameter values as a starting point).


miguelangelnieto commented on May 3, 2024

Let's say that you fit a year of data to the model and you have a model ready to predict.

The idea is to have the possibility of keeping the model updated by adding new data in small batches. For example, read metrics from Prometheus and add the data from the last hour. Extracting a lot of metric points from services like Prometheus is pretty slow and time-consuming, so after the first large fit it would be nice to keep fitting new data in small batches.

Some sklearn models have a partial_fit method, for example Gaussian Naive Bayes.
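
For reference, this is what that sklearn-style incremental API looks like (illustrative random data, not Prophet):

import numpy as np
from sklearn.naive_bayes import GaussianNB

X1, y1 = np.random.randn(100, 3), np.random.randint(0, 2, 100)
X2, y2 = np.random.randn(50, 3), np.random.randint(0, 2, 50)

clf = GaussianNB()
clf.partial_fit(X1, y1, classes=[0, 1])  # first call must declare all classes
clf.partial_fit(X2, y2)                  # later batches update the model incrementally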


bletham commented on May 3, 2024

The online learning used by some sklearn models is pretty fundamentally different from how Stan models are fit; I don't think we are going to have a partial_fit like that in the future. Like @seanjtaylor said, we could warm-start the fit, which should make things very fast if only a few data points are added. @seanjtaylor if that sounds good to you we can put it on the v0.2 list.


jengelman commented on May 3, 2024

@bletham Dustin Tran figured out a hack to do minibatch ADVI in Stan, which could work. A few people are working on adding streaming variational Bayes to Stan 3, along the lines of PyMC3's inference engine.


jengelman commented on May 3, 2024

@bletham You also might be able to pass in the distributions of the previous fit as priors. You could then fit on either a small window of data, or just the new datapoints, since the priors are now informative. A few scikit-learn estimators implement this as the "warm_start" parameter. Not sure how fast this would be though, since pystan (as far as I know) has to recompile the model.

Edit: Following Dustin's comment on this thread, you'd also need to figure out how to scale the likelihood if you used the minibatch interface.


eromoe commented on May 3, 2024

For starters, I think we can make a function that takes a model, creates a new model with the same input args / seasonalities / regressors, and then fits that model with the initial conditions taken as the parameters of the input model (rather than the current defaults in https://github.com/facebook/prophet/blob/master/python/fbprophet/forecaster.py#L1018). This would be pretty straightforward, and should be much faster than fitting totally from scratch if the data have not changed much.

@bletham I don't use custom changepoints; I only have time series with some exogenous regressors. Does that mean I can use the method you mentioned?

The use case is #882: I want to do rolling train/predict, to check how well Prophet performs on my dataset.


bletham commented on May 3, 2024

@eromoe I'll give some thoughts on that specific use case at #882, but for the more general question: when custom changepoints are not specified, they are just placed uniformly through the history. The challenge this creates here is that adding more data means we should have more and/or differently placed changepoints.
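
For reference, a sketch of that uniform placement, mirroring the logic of set_changepoints in forecaster.py under the default settings (df here is the fit history, sorted by ds; treat this as an approximation):

import numpy as np

changepoint_range = 0.8  # default: changepoints within the first 80% of history
n_changepoints = 25      # default

hist_size = int(np.floor(len(df) * changepoint_range))
cp_indexes = np.linspace(0, hist_size - 1, n_changepoints + 1).round().astype(int)
changepoints = df.iloc[cp_indexes]['ds'].tail(-1)  # drop the point at t=0

# Appending new rows to df shifts every one of these locations, which is why
# the old delta values can stop lining up with the new changepoints.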


sammo commented on May 3, 2024

Perfect, the code works like a charm. Thanks for the details and for re-listing the caveats @bletham.
Using the timeit magic, here is a comparison of the time needed to fit Prophet on two datasets that differ by 7 datapoints (daily data, so m2 has 1 additional week of data):

Model 1 fit:
98.4 ms ± 7.36 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
Model 2 fit (using stan_init2 from Ben's comment above):
55.9 ms ± 546 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)

That's over 40% faster!

On a separate note, predict takes significantly longer than fit. I tried lowering uncertainty_samples down to 20 and even 0, but the time to predict didn't budge. I can take this discussion to a different thread if needed.

Model 1 predict:
1.33 s ± 9.77 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)


eromoe commented on May 3, 2024

@sammourad you need to completely remove the uncertainty-sample calculation code; there is a PR about it.


sammo commented on May 3, 2024

@sammourad you need to completely remove the uncertainty-sample calculation code; there is a PR about it.

I see the PR @eromoe ... Thank you for the help!


bigredbug47 commented on May 3, 2024

(quoting @bletham's comment above, with the stan_init2 example and the two caveats)

Hi @bletham
I implemented this based on your example, but the processing time of the two models is about the same; it doesn't improve as much as in @sammo's case. Here is my script:

### Period-training
import time
import pandas as pd
from fbprophet import Prophet

start_time = time.time()

df = pd.read_csv('revenue_old.csv')
m = Prophet(interval_width=0.95, changepoint_range=0.9, changepoint_prior_scale=200,
            daily_seasonality=True, yearly_seasonality=True, weekly_seasonality=True,
            seasonality_mode='multiplicative', n_changepoints=200)
m.fit(df)

end_time = time.time()
spent_time = end_time - start_time
print("Time spent model 1: ", spent_time)

future = m.make_future_dataframe(periods=60)
# future['bitcoin_price'] = df['bitcoin_price']
# future['litecoin_price'] = df['litecoin_price']
future = future.fillna(0)
forecast = m.predict(future)
fig1 = m.plot(forecast)

start_time = time.time()

df = pd.read_csv('revenue_new.csv')
m2 = Prophet(interval_width=0.95, changepoint_range=0.9, changepoint_prior_scale=200,
             daily_seasonality=True, yearly_seasonality=True, weekly_seasonality=True,
             seasonality_mode='multiplicative', n_changepoints=200)

def stan_init2():
    # Initialize the second fit from the first model's fitted parameters.
    res = {}
    for pname in ['k', 'm', 'sigma_obs']:
        res[pname] = m.params[pname][0][0]
    for pname in ['delta', 'beta']:
        res[pname] = m.params[pname][0]
    return res

m2.fit(df, init=stan_init2)

end_time = time.time()
spent_time = end_time - start_time
print("Time spent model 2: ", spent_time)

future = m2.make_future_dataframe(periods=60)
future = future.fillna(0)
forecast = m2.predict(future)
fig2 = m2.plot(forecast)

Here are the processing times:

Time spent model 1: 34.34286665916443
Time spent model 2: 33.433412313461304

Please let me know where I went wrong.
Thanks a lot.


sammo commented on May 3, 2024

@bigredbug47 I only had one additional week of daily data when 'partial training', and the additional values were fairly close to the previous history, as @bletham mentioned.

