microsoft / finnts

Microsoft Finance Time Series Forecasting Framework (FinnTS) is a forecasting package that utilizes cutting-edge time series forecasting and parallelization on the cloud to produce accurate forecasts for financial data.

Home Page: https://microsoft.github.io/finnts

License: Other

R 100.00%
time-series forecasting machine-learning data-science finnts finance business microsoft r r-package

finnts's Introduction

Microsoft Finance Time Series Forecasting Framework

The Microsoft Finance Time Series Forecasting Framework, aka finnts or Finn, is an automated forecasting framework for producing financial forecasts. While it was built for corporate finance activities, it can easily expand to any time series forecasting problem!

  • Automated feature engineering, feature selection, back testing, and model selection.
  • Access to 25+ models, both univariate and multivariate.
  • Azure integration to run thousands of time series in parallel within the cloud.
  • Supports daily, weekly, monthly, quarterly, and yearly forecasts.
  • Handles external regressors, either purely historical or historical+future values.

Installation

CRAN version

install.packages("finnts")

Development version

To get a bug fix or to use a feature from the development version, you can install the development version of finnts from GitHub.

# install.packages("devtools")
devtools::install_github("microsoft/finnts")

Usage

library(finnts)

# prepare historical data
hist_data <- timetk::m4_monthly %>%
  dplyr::rename(Date = date) %>%
  dplyr::mutate(id = as.character(id))

# call main finnts modeling function
finn_output <- forecast_time_series(
  input_data = hist_data,
  combo_variables = c("id"),
  target_variable = "value",
  date_type = "month",
  forecast_horizon = 3,
  back_test_scenarios = 6, 
  models_to_run = c("arima", "ets"), 
  run_global_models = FALSE, 
  run_model_parallel = FALSE
)

Contributing

This project welcomes contributions and suggestions. Most contributions require you to agree to a Contributor License Agreement (CLA) declaring that you have the right to, and actually do, grant us the rights to use your contribution. For details, visit https://cla.opensource.microsoft.com.

When you submit a pull request, a CLA bot will automatically determine whether you need to provide a CLA and decorate the PR appropriately (e.g., status check, comment). Simply follow the instructions provided by the bot. You will only need to do this once across all repos using our CLA.

This project has adopted the Microsoft Open Source Code of Conduct. For more information see the Code of Conduct FAQ or contact [email protected] with any additional questions or comments.

Trademarks

This project may contain trademarks or logos for projects, products, or services. Authorized use of Microsoft trademarks or logos is subject to and must follow Microsoft's Trademark & Brand Guidelines. Use of Microsoft trademarks or logos in modified versions of this project must not cause confusion or imply Microsoft sponsorship. Any use of third-party trademarks or logos is subject to those third parties' policies.

finnts's People

Contributors

akannanmsft, dslmsft, leon-dan, lionel-, microsoft-github-operations[bot], microsoftopensource, mitokic

finnts's Issues

bug: ensemble training data creation for yearly forecasts

The back testing process needs adjusting for yearly forecasts. Currently, one year is the minimum amount of training data used when refitting models, which works for daily through quarterly forecasts. For yearly forecasts it should be something like 3-5 years. Alternatively, an automated check could turn ensembles off when the user has fewer than "x" years of historical data.

more control over num_cores ran in parallel

By default, use only 2/3 of the available CPU cores during any parallel processing. This should help limit the out-of-memory errors that might arise. Also let users set their own number of cores through a "num_cores" parameter in "forecast_time_series".
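
A minimal sketch of how such a default could be resolved. The helper name resolve_num_cores and its arguments are hypothetical and not part of finnts; only parallel::detectCores() is an existing function.

```r
# Hypothetical helper (not finnts code): decide how many cores to use.
resolve_num_cores <- function(num_cores = NULL,
                              available = parallel::detectCores()) {
  # detectCores() can return NA on some platforms; fall back to one core.
  if (is.na(available) || available < 1) available <- 1
  if (is.null(num_cores)) {
    # Default: two thirds of the available cores, but never fewer than one.
    return(max(1L, as.integer(floor(available * 2 / 3))))
  }
  # User-supplied value: clamp it to the range [1, available].
  as.integer(min(max(1, num_cores), available))
}

default_cores <- resolve_num_cores(available = 12)
user_cores <- resolve_num_cores(num_cores = 4, available = 12)
```

Clamping the user value keeps a request for more cores than the machine has from oversubscribing it; whether finnts should clamp or raise an error is a design choice left open here.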

categorical variable handling

See what alternatives to dummy columns can be used to reduce data complexity. Should categories just be mapped to certain levels for tree-based models? How would that affect simpler linear regression models? What about embeddings?

construct_forecast_models bug

The code below results in an error.

library(finnts)

hist_data <- timetk::m4_monthly %>%
  dplyr::filter(date >= "2010-01-01") %>%
  dplyr::rename(Date = date) %>%
  dplyr::mutate(id = as.character(id))

finn_output <- forecast_time_series(
  input_data = hist_data,
  combo_variables = c("id"),
  target_variable = "value",
  date_type = "month",
  forecast_horizon = 12,
  models_to_run = c("arima", "ets"))

Error in construct_forecast_models(full_data_tbl, external_regressors, :
object 'init_azure_batch_parallel_within' not found

I don't think the run_model_parallel feature is working properly. Currently we only pass the "init_azure_batch_parallel_within" and "exit_azure_batch_parallel_within" functions to construct_forecast_models, as the inputs for "parallel_init_function" and "parallel_exit_func". I don't see a function that simply creates a parallel environment so that run_model_parallel works when parallel_processing is set to "none".

I think a quick fix would be to extend the init_azure_parallel_within and exit_azure_parallel_within functions in the azure_batch_parallel.R file so that run_model_parallel works both in Azure and on a local machine when parallel_processing is set to "none". The new functions could live in the general_parallel.R file, be named something like init_parallel_within and exit_parallel_within, and take only an argument for the parallel_processing choice given by the user.

What do you think @AKannanMSFT?
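
A rough sketch of what the proposed pair of functions might look like, using only the base parallel package. Everything here is an assumption for illustration: the function bodies, the extra num_cores argument, and the choice to return NULL for non-local backends are not taken from the finnts source.

```r
# Hypothetical sketch: names and arguments are assumptions, not finnts code.
init_parallel_within <- function(parallel_processing = "none", num_cores = 2) {
  if (parallel_processing != "none") {
    # Assumption: a cloud backend such as Azure Batch manages its own workers.
    return(NULL)
  }
  # Local run: start a socket cluster on this machine.
  parallel::makeCluster(num_cores)
}

exit_parallel_within <- function(cl) {
  # Nothing to clean up when no local cluster was created.
  if (!is.null(cl)) parallel::stopCluster(cl)
  invisible(NULL)
}

# Usage: run a toy task on the local cluster, then shut it down.
cl <- init_parallel_within("none", num_cores = 2)
squares <- parallel::parLapply(cl, 1:4, function(x) x^2)
exit_parallel_within(cl)
```

Returning the cluster object from the init function and passing it to the exit function keeps the pair free of global state, which makes it easier to call from inside another parallel worker.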

historical data issue on future forecast output

The future forecast output can contain both historical values and future forecast values.

If the input data contains dates after the "hist_end_date" input, remove them when producing the future forecast output. That way the user only sees historical values up to hist_end_date, followed by forecasts for the periods after it. Right now we are seeing both forecast and historical values.
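
The filtering step could be as simple as the base R sketch below. The toy data frame and variable names are made up for illustration; only the hist_end_date idea comes from the issue.

```r
# Toy input: monthly history that runs past the chosen hist_end_date.
input_data <- data.frame(
  Date = seq(as.Date("2021-01-01"), by = "month", length.out = 6),
  value = c(10, 12, 11, 13, 14, 15)
)
hist_end_date <- as.Date("2021-04-01")

# Keep only rows on or before hist_end_date for the historical part
# of the output; later periods would be covered by the forecast alone.
hist_output <- input_data[input_data$Date <= hist_end_date, ]
```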

azure machine learning service integration

allow for logging within Azure ML Service.

Log run results

  • Finn arguments
  • accuracy by combo
  • best model chosen by combo

Should allow for better MLOps processes and tracking.

improve hyperparameter process

Ensure that the optimal hyperparameters are chosen in a way that ensures the most efficient run time and accuracy.

Incorporate new feedback to ensure no data leakage when selecting hyperparameters. Also remove duplication when selecting hyperparameters and refitting models.

Allow user to define how many iterations of hyperparameters to try in tune_grid

Cross-Validation and Back Testing

Rethink how hyperparameters and back test folds are run. Should we go back to the Finn 1.0 approach? Or only have a single CV/back test process?

initial package startup messages

  • Where dev package lives.
  • How to install necessary python resources
  • potential conflicts and if package dependencies are out of date

add embeddings for all data models

capture relationships between categorical data like time series ID and other groupings. Helpful in deep learning models, not sure if helpful in standard multivariate ML models.

Create embeddings from deep learning models. Then use those values for categorical variables instead of using one-hot encoding/dummy variables for categorical data. Woohoo!

Could also create a separate recipe to do this or do some initial testing on the dataset and see if we should switch over to it as default. Maybe make a global option to either use dummy variables or embeddings for categorical data.

[Image: excerpt from the fast.ai deep learning for coders book.]

Already looks like an easy integration into a recipe. https://embed.tidymodels.org/reference/step_embed.html

Lastly, we need to determine the size of our embedding. There is no hard-and-fast rule for this, but a good heuristic given by Jeremy Howard of fast.ai is to take half the number of unique values and then add one.
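
As a quick sketch, the heuristic can be written as a one-line helper. The function name embedding_size is hypothetical, and rounding down for odd counts is an assumption, since the heuristic as stated does not say how to handle them.

```r
# Hypothetical helper: half the number of unique levels, plus one.
# Assumption: round down when the number of levels is odd.
embedding_size <- function(n_unique) {
  stopifnot(is.numeric(n_unique), n_unique >= 1)
  as.integer(floor(n_unique / 2) + 1)
}

# Example: four distinct time series IDs.
ids <- c("M1", "M2", "M750", "M1000")
size_for_ids <- embedding_size(length(unique(ids)))
```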

Documentation Vignettes

  • Models used in Finn
  • Feature Engineering
  • Back Testing and Hyperparameter Tuning
  • External Regressors
  • Best Model Selection
  • Hierarchical Forecasting
  • Parallel Processing and Azure Batch
  • Production with Azure Machine Learning Pipelines
  • Quick Start Guide

break up monolith

Break out into as many smaller functions as possible. That way finnts becomes more granular and less monolithic. Also see how we can make each sub function exported so it can be used by users outside of "forecast_time_series" function.

more control over running global and local models

Enable a user to turn off all global models that run over the entire dataset (if input_data has more than 1 time series), and also to turn off individual models that run over each individual time series (if input_data has more than 1 time series).

update description file

  • add all contributors (chief economist team and FD&E intern) to authors section
  • update emails
  • add package description
  • correct version number
  • change any version issues with dependencies

Python Support

Should we have separate R and Python components in the repo if we end up developing a Python version of Finn?

@AKannanMSFT something to think about...

dealing with missing data

When filling in missing data (with an imputed value or with zero), add another binary column that flags the specific periods where data was missing (for the target variable and xregs).
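
A small base R sketch of the idea, using a made-up data frame. The column name value_missing is hypothetical, and zero fill is shown only because it is the simpler of the two options mentioned.

```r
# Toy history with two missing periods in the target variable.
hist_data <- data.frame(
  Date = seq(as.Date("2021-01-01"), by = "month", length.out = 5),
  value = c(10, NA, 12, NA, 15)
)

# Record which periods were missing before any values are filled in.
hist_data$value_missing <- as.integer(is.na(hist_data$value))

# Fill the gaps with zero (an imputed value could be used instead).
hist_data$value[is.na(hist_data$value)] <- 0
```

The flag has to be created before the fill, because afterwards the filled periods can no longer be told apart from true zeros.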

log transform feature

Transform back to the original scale before selecting the best model. Also consider whether that should happen even before model averaging is done.
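
To illustrate the ordering, the sketch below back-transforms forecasts before scoring them. The model names, the numbers, the use of log1p()/expm1() as the transform pair, and MAPE as the accuracy metric are all assumptions made for the example.

```r
# Actual values on the original scale.
actual <- c(100, 120, 90)

# Forecasts from two hypothetical models, both on the log1p scale.
log_forecasts <- list(
  model_a = log1p(c(110, 115, 95)),
  model_b = log1p(c(100, 140, 70))
)

# Back-transform with expm1() first, then compute MAPE on the original scale.
mape <- sapply(log_forecasts, function(f) {
  mean(abs(expm1(f) - actual) / actual)
})

# Pick the model with the lowest error on the original scale.
best_model <- names(which.min(mape))
```

Scoring on the original scale matters because errors measured on the log scale weight small and large values differently, so the ranking of models can change.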
