
apingali / nixtla


This project forked from nixtla/nixtla


Automated time series processing and forecasting.

License: MIT License

Shell 0.35% Python 62.59% R 0.20% Makefile 3.68% HTML 10.10% HCL 8.49% Jupyter Notebook 13.66% Dockerfile 0.93%

nixtla's Introduction

Open source time series forecasting suite


Open-source time-series pipeline capable of achieving top 1% performance in the M5 competition.

Our open-source solution is 25% more accurate than Amazon Forecast and 20% more accurate than fbprophet. It also runs 4x faster than Amazon Forecast and is less expensive.

Read this Medium post for a step-by-step guide.

🧰 Features

tspreprocess to preprocess time-series data, such as imputing missing values.

tsfeatures to generate features to include in the models.

tsforecast to forecast at scale.

tsbenchmarks to easily calculate accuracy baselines.

NeuralForecast for state-of-the-art deep learning models.

✨ Purpose?

Help data scientists and developers to have access to open source state-of-the-art forecasting pipelines.

How?

We built a complete pipeline that can be deployed in the cloud ☁️ using AWS and consumed via APIs or as a service.

Build your own Infra

If you want to set up your own infrastructure, follow the instructions in the repository (Azure coming soon). With our Infrastructure as Code written in Terraform, you can deploy our solution in minutes without much effort.

Use our APIs

You can use our fully hosted version as a service through our Python SDK (autotimeseries). To consume the APIs on our own infrastructure, just request tokens by sending an email to [email protected] or opening a GitHub issue. We currently have free resources available for anyone interested.

Getting Started (SDK)


Check the following example for a full pipeline:

Install with pip install autotimeseries

Import libraries and config AWS
import os

from autotimeseries.core import AutoTS

autotimeseries = AutoTS(bucket_name=os.environ['BUCKET_NAME'],
                        api_id=os.environ['API_ID'],
                        api_key=os.environ['API_KEY'],
                        aws_access_key_id=os.environ['AWS_ACCESS_KEY_ID'],
                        aws_secret_access_key=os.environ['AWS_SECRET_ACCESS_KEY'])
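
The constructor call above indexes os.environ directly, so an unset variable surfaces as a KeyError that names only the first missing key. A minimal sketch of a pre-flight check that reports every missing variable at once; the helper name missing_settings is hypothetical and not part of the autotimeseries SDK:

```python
import os

# Variable names copied from the AutoTS constructor call above.
REQUIRED_VARS = (
    "BUCKET_NAME",
    "API_ID",
    "API_KEY",
    "AWS_ACCESS_KEY_ID",
    "AWS_SECRET_ACCESS_KEY",
)


def missing_settings(environ=os.environ):
    """Return the required variable names that are unset or empty.

    Hypothetical helper for illustration; it only inspects the mapping
    it is given and never contacts AWS.
    """
    return [name for name in REQUIRED_VARS if not environ.get(name)]
```

Calling missing_settings() before building AutoTS and stopping when the returned list is non-empty gives a clearer error than a bare KeyError.
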
Upload dataset to S3
train_dir = '../data/m5/parquet/train'
# File with target variables
filename_target = autotimeseries.upload_to_s3(f'{train_dir}/target.parquet')
# File with static variables
filename_static = autotimeseries.upload_to_s3(f'{train_dir}/static.parquet')
# File with temporal variables
filename_temporal = autotimeseries.upload_to_s3(f'{train_dir}/temporal.parquet')

Each time series in the uploaded datasets is identified by the column item_id. The time column is timestamp and the target column is demand. We need to pass these arguments to each call.

columns = dict(unique_id_column='item_id',
               ds_column='timestamp',
               y_column='demand')
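
Since the same mapping is passed to every call, it can help to confirm locally that a file's header actually contains those columns before uploading. A small sketch under that assumption; check_columns is a hypothetical helper, not an SDK function, and the header list would come from whatever tool you use to read the parquet file:

```python
# Same mapping as in the example above.
columns = dict(unique_id_column="item_id",
               ds_column="timestamp",
               y_column="demand")


def check_columns(header, columns):
    """Return the mapped column names that are absent from `header`.

    `header` is the list of column names found in the dataset.
    Hypothetical helper for illustration only.
    """
    return [name for name in columns.values() if name not in header]
```

An empty result means the id, time, and target columns are all present.
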
Send the job to make forecasts
response_forecast = autotimeseries.tsforecast(filename_target=filename_target,
                                              freq='D',
                                              horizon=28,
                                              filename_static=filename_static,
                                              filename_temporal=filename_temporal,
                                              objective='tweedie',
                                              metric='rmse',
                                              n_estimators=170,
                                              **columns)

Download forecasts

autotimeseries.download_from_s3(filename='forecasts_2021-10-12_19-04-32.csv', filename_output='../data/forecasts.csv')

Forecasting Pipeline as a Service

Our forecasting pipeline is modular and built upon simple APIs:

tspreprocess


Time series usually contain missing values. This is the case for sales data where only the events that happened are recorded. In these cases it is convenient to balance the panel, i.e., to include the missing values to correctly determine the value of future sales.
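
To make "balancing the panel" concrete, here is a minimal standard-library sketch. It assumes daily data, a shared date range across all series, and a fill value of zero; none of these choices is confirmed as the behavior of the tspreprocess API, and balance_panel is a hypothetical name:

```python
from datetime import date, timedelta


def balance_panel(rows, fill_value=0.0):
    """Give every item a row for every day in the observed date range.

    `rows` is an iterable of (item_id, day, demand) tuples containing only
    the recorded events. Missing (item, day) pairs get `fill_value`.
    Illustrative sketch; assumes daily frequency.
    """
    observed = {(item, day): value for item, day, value in rows}
    if not observed:
        return []
    items = sorted({item for item, _ in observed})
    days = [day for _, day in observed]
    start, end = min(days), max(days)
    calendar = [start + timedelta(days=i) for i in range((end - start).days + 1)]
    return [
        (item, day, observed.get((item, day), fill_value))
        for item in items
        for day in calendar
    ]
```

With sales recorded for item A on days 1 and 3 and for item B on day 2, the result has three rows per item, with zeros on the days where nothing was sold.
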

The tspreprocess API allows you to do this quickly and easily. In addition, it allows one-hot encoding of static variables (specific to each time series, such as the product family in case of sales) automatically.
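
One-hot encoding of a static variable can be sketched as follows. The function name, the family column, and the family_<value> naming scheme are assumptions made for illustration, not the API's documented output format:

```python
def one_hot(static_rows, column):
    """Replace `column` with one 0/1 indicator column per category.

    `static_rows` is a list of dicts, one per time series.
    Illustrative sketch; output column names are an assumption.
    """
    categories = sorted({row[column] for row in static_rows})
    encoded = []
    for row in static_rows:
        new_row = {key: value for key, value in row.items() if key != column}
        for category in categories:
            new_row[f"{column}_{category}"] = int(row[column] == category)
        encoded.append(new_row)
    return encoded
```

Each series keeps its identifier and gains one indicator column per product family.
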

tsfeatures


It is usually good practice to create features of the target variable so that they can be consumed by machine learning models. This API allows users to create features at the time series level (or static features) and also at the temporal level.
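
The distinction between the two feature levels can be illustrated with a short sketch: a static feature yields one value per series, while a temporal feature yields one value per observation. The specific features chosen here (mean, max, share of zeros, and a lag) are examples picked for illustration and are not a list of what the tsfeatures API returns:

```python
from statistics import mean


def static_features(series):
    """One value per series: summary statistics of the target."""
    return {
        "mean": mean(series),
        "max": max(series),
        "zero_share": sum(value == 0 for value in series) / len(series),
    }


def lag_feature(series, lag):
    """One value per observation: the target `lag` steps earlier.

    Positions without enough history are filled with None.
    """
    return [series[i - lag] if i >= lag else None for i in range(len(series))]
```
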

The tsfeatures API is based on the tsfeatures library also developed by the Nixtla team (inspired by the R package tsfeatures) and the tsfresh library.

With this API the user can also generate holiday variables. Just enter the country of the special dates or a file with the specific dates and the API will return dummy variables of those dates for each observation in the dataset.
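
The idea of holiday dummy variables can be sketched in a few lines. This version takes the "file with the specific dates" route, with the dates supplied as a dictionary; the function name and the output layout are assumptions, not the API's actual response format:

```python
from datetime import date


def holiday_dummies(days, holidays):
    """Return one row per day with a 0/1 column for each holiday.

    `holidays` maps a holiday name to the set of dates on which it falls.
    Illustrative sketch only.
    """
    return [
        {"timestamp": day,
         **{name: int(day in dates) for name, dates in holidays.items()}}
        for day in days
    ]
```
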

tsforecast


The tsforecast API is responsible for generating the time series forecasts. It receives as input the target data and can also receive static variables and time variables. At the moment, the API uses the mlforecast library developed by the Nixtla team using LightGBM as a model.

In future iterations, the user will be able to choose different Deep Learning models based on the nixtlats library developed by the Nixtla team.

tsbenchmarks


The tsbenchmarks API is designed to easily compare the performance of models based on time series competition datasets. In particular, the API offers the possibility to evaluate forecasts of any frequency of the M4 competition and also of the M5 competition.
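
This README does not say which metrics the API computes. As an illustration, the M4 competition scored forecasts with sMAPE and MASE, which can be sketched as below. Treating a term with a zero denominator as zero error is a convention chosen here, not something taken from the source:

```python
def smape(actual, forecast):
    """Symmetric mean absolute percentage error, in percent."""
    terms = []
    for y, f in zip(actual, forecast):
        denominator = abs(y) + abs(f)
        # Assumption: a 0/0 term counts as zero error.
        terms.append(0.0 if denominator == 0 else 2.0 * abs(y - f) / denominator)
    return 100.0 * sum(terms) / len(terms)


def mase(actual, forecast, insample, season=1):
    """Mean absolute scaled error.

    The forecast MAE is divided by the in-sample MAE of a seasonal naive
    forecast with period `season`.
    """
    scale_terms = [
        abs(insample[t] - insample[t - season])
        for t in range(season, len(insample))
    ]
    scale = sum(scale_terms) / len(scale_terms)
    mae = sum(abs(y - f) for y, f in zip(actual, forecast)) / len(actual)
    return mae / scale
```

A MASE below 1 means the forecast beats the in-sample seasonal naive baseline on average.
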

These APIs are written in Python and can be consumed through an SDK also written in Python. The following diagram summarizes the structure of our pipeline:

Build your own Nixtla in AWS

Why?

We want to contribute to open source and help data scientists and developers to achieve great forecasting results without the need to implement complex pipelines.

How?

If you want to use our hosted version, send us an email or open a GitHub issue and ask for API keys.

If you want to deploy Nixtla on your own AWS Cloud you will need:

  • API Gateway (to handle API calls).
  • Lambda (or some computational unit).
  • SageMaker (or some bigger computational unit).
  • ECR (to store Docker images).
  • S3 (for inputs and outputs).

You will end up with an architecture that looks like the following diagram:

Each call to the API executes a particular Lambda function depending on the endpoint. That Lambda function starts a SageMaker job using a predefined instance type. Finally, SageMaker reads the input data from S3 and writes the processed data to S3, using a predefined Docker image stored in ECR.
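
The request flow described above can be sketched as a Lambda handler for API Gateway's proxy integration, where the request arrives as an event whose "body" is a JSON string and the response is a dictionary with "statusCode" and a string "body". This is not the repository's main.handler, which is not shown here. The start_job callable is a hypothetical stand-in for the code that launches the SageMaker job, and the job_id field is likewise an assumption:

```python
import json


def make_handler(start_job):
    """Build a Lambda handler that forwards the request body to `start_job`.

    `start_job` is a hypothetical callable taking the decoded payload and
    returning a job identifier; in a real deployment it would start the
    SageMaker job.
    """
    def handler(event, context):
        try:
            payload = json.loads(event.get("body") or "{}")
        except json.JSONDecodeError:
            return {
                "statusCode": 400,
                "body": json.dumps({"error": "body must be valid JSON"}),
            }
        job_id = start_job(payload)
        return {"statusCode": 200, "body": json.dumps({"job_id": job_id})}

    return handler
```

Injecting start_job keeps the handler testable locally without AWS credentials.
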

To create that infrastructure you can use our Terraform code (infrastructure as code) or you can create the services from the console.

1. Terraform (infrastructure as Code)

Terraform is an open-source Infrastructure as Code tool that allows you to condense all the manual setup into an automated script. We have written all the needed steps to facilitate the deployment of Nixtla in your infrastructure. The Terraform code to create your infrastructure can be found at this link. Just follow these steps:

  1. Define your AWS credentials. You can define them using:
export AWS_ACCESS_KEY_ID="anaccesskey"
export AWS_SECRET_ACCESS_KEY="asecretkey"

These credentials require permissions to use the S3, ECR, Lambda and API Gateway services; in addition, you must be able to create IAM users.

  2. To use Terraform, you must install it. Here is an excellent guide to do so.

  3. Position yourself in the iac/terraform/aws folder.

  4. Run the command terraform init. This command will initialize the working directory with the necessary configuration.

  5. Finally, you just need to use terraform apply. First, the list of services to be built will be displayed. You will have to accept to start the build. Once finished, you will get the API key needed to run the process, as well as the addresses of each of the APIs.

2. Create AWS resources using the console

Create S3 buckets

For each service:

  1. Create an S3 bucket. The code of each lambda function will be uploaded here.

Create ECR repositories

For each service:

  1. Create a private repository.

Lambda Function

For each service:

  1. Create a Lambda function with the Python 3.7 runtime.
  2. Edit the runtime settings and enter main.handler as the handler.
  3. Go to the configuration:
    • Edit the general configuration and set the timeout to 9 min 59 s.
    • Add an existing role capable of reading/writing from/to S3 and running SageMaker services.
  4. Add the following environment variables:
    • PROCESSING_REPOSITORY_URI: ECR URI of the Docker image corresponding to the service.
    • ROLE: A role capable of reading/writing from/to S3 and also running SageMaker services.
    • INSTANCE_COUNT
    • INSTANCE_TYPE
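
To show how these four variables might be consumed, the sketch below builds the required fields of a SageMaker CreateProcessingJob request from them. It assumes the service launches a Processing job (the variable name PROCESSING_REPOSITORY_URI suggests this, but the source does not confirm it) and that ROLE holds a role ARN. The function name and the 30 GB volume size are also assumptions:

```python
import os


def processing_job_request(job_name, environ=os.environ):
    """Build the required fields of a CreateProcessingJob request.

    Illustrative sketch: reads the four environment variables listed above
    and returns a dict; it does not call AWS.
    """
    return {
        "ProcessingJobName": job_name,
        "RoleArn": environ["ROLE"],  # assumed to be a role ARN
        "AppSpecification": {"ImageUri": environ["PROCESSING_REPOSITORY_URI"]},
        "ProcessingResources": {
            "ClusterConfig": {
                # Environment variables are strings, so convert the count.
                "InstanceCount": int(environ["INSTANCE_COUNT"]),
                "InstanceType": environ["INSTANCE_TYPE"],
                "VolumeSizeInGB": 30,  # assumed value, not from the source
            }
        },
    }
```

The resulting dictionary could be passed as keyword arguments to a boto3 SageMaker client's create_processing_job call, together with whatever inputs and outputs the job needs.
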

API Gateway

  1. Create a public REST API (Regional).
  2. For each endpoint in api/main.py… add a resource.
  3. For each created resource add an ANY method:
    • Select Lambda Function as the integration type.
    • Select Use Lambda Proxy Integration.
    • Enter the name of the Lambda function linked to that resource.
    • Once the method is created select Method Request and set API key required to true.
  4. Deploy the API.

Usage plan

  1. Create a usage plan based on your needs.
  2. Add your API stage.

API Keys

  1. Generate API keys as needed.
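
Once a method requires an API key, clients send the key in the x-api-key header. A minimal standard-library sketch of building such a request; the base URL, endpoint path, and payload fields used in the example are placeholders, and build_request is a hypothetical helper rather than part of the SDK:

```python
import json
import urllib.request


def build_request(base_url, endpoint, api_key, payload):
    """Build a POST request carrying the API key in the x-api-key header.

    Illustrative sketch; it only constructs the request and sends nothing.
    """
    url = f"{base_url.rstrip('/')}/{endpoint.lstrip('/')}"
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"x-api-key": api_key, "Content-Type": "application/json"},
        method="POST",
    )
```

Passing the returned object to urllib.request.urlopen would send it to the deployed stage.
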

Deployment

GitHub secrets

  1. Set the following secrets in your repo:
    • AWS_ACCESS_KEY_ID
    • AWS_SECRET_ACCESS_KEY
    • AWS_DEFAULT_REGION

Run the API locally

  1. Create the environment using make init.
  2. Launch the app using make app.

Contributors ✨

Thanks go to these wonderful people (emoji key):


mergenthaler

🤔 💻

Kiel Rodríguez

💻

Kin

💻

mergenthaler

🤔

fede

💻

This project follows the all-contributors specification. Contributions of any kind welcome!

nixtla's People

Contributors

allcontributors[bot], azulgarza, cchallu, kielrodriguez, mergenthaler
