
Asynchronous API using Dask

  • Do you use Dask to parallelize your computational tasks?
  • Do you wish to expose these tasks through a convenient asynchronous API?
  • Are you contemplating the deployment of this system in AWS, particularly an EC2 or Fargate cluster?

If you've answered "yes" to any of these questions, you've come to the right place! This repository demonstrates a simple use case, computing Fibonacci numbers, and provides infrastructure and deployment scripts for AWS. These resources not only facilitate the setup of a Dask cluster but also adapt the cluster seamlessly to varying workloads. Additionally, they make these services accessible through FastAPI.

It's important to note that this repository primarily functions as a template. While it strives to provide structure and streamline development, deploying it may pose challenges, particularly in terms of learning Terraform and delving into the intricacies of deployment scripts.

Deployment

This system relies on an AWS Fargate cluster, entirely orchestrated using Terraform, eliminating the need for manual configuration. It leverages AWS's Fargate offering and spot instances to enable on-demand scaling.

You'll find Terraform scripts and modules in the deployment directory.

deployment architecture

The system contains three main components:

API Layer

This component is quite straightforward, using FastAPI to handle incoming client requests. Requests are processed asynchronously, immediately providing clients with a task_id. These task_ids are deterministically generated from the request parameters, avoiding redundant processing of identical requests. The result for each request is stored in the Redis cache under its task_id as the key. The API layer follows a fire-and-forget mechanism: once a request is submitted to Dask, the API relinquishes control and is no longer responsible for its execution.
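As a minimal sketch (the repository's actual implementation may differ), a deterministic task_id can be derived by hashing a canonical serialization of the request parameters, so identical requests always map to the same id and cache key:

```python
import hashlib
import json

def make_task_id(params: dict) -> str:
    """Derive a stable id from the request parameters.

    Sorting the keys makes the serialization canonical, so two requests
    with the same parameters (in any order) produce the same task_id.
    """
    canonical = json.dumps(params, sort_keys=True)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()
```

Because the id is a pure function of the parameters, the API can check the cache for an existing result before submitting anything to Dask.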

Dask System

If you're new to Dask, I recommend starting with the official Dask documentation.

This is the trickiest part of the deployment process. Typically, autoscaling in cloud environments relies on metrics gathered through CloudWatch, triggering alarms to scale resources up or down.

However, Dask prefers to have complete control over autoscaling, and it does so remarkably well. Dask constantly collects internal metrics, such as request processing and pending tasks in the queue, worker availability, resource usage, and historical data on resource utilization for specific tasks. It uses this wealth of information to dynamically adjust the number of worker nodes required to process incoming requests efficiently.

To achieve this:

  • Autoscaling Node: As requests come in, this node scales the number of worker nodes based on demand. Check the Terraform code for the necessary permissions, roles, and policies.

  • Scheduler Node: In Dask terminology, the scheduler node is responsible for job distribution and communication with worker nodes. It also provides reliability in cases of job or worker node failures.

    By default, if we let Dask manage its cluster entirely, it will spin up both the scheduler and worker nodes. However, this approach lacks control over the scheduler, which needs to be registered with service discovery on AWS for accessibility within the EC2/Fargate cluster. To simplify this process, we explicitly create a scheduler node, register it with service discovery, and ensure it's accessible within the cluster.

    In the Terraform scripts, this process unfolds as follows: we first create a Scheduler Node, register it with service discovery, wait for the node to become fully operational, and capture its IP. We then pass the instance IP to the Autoscaling Node through an environment variable. You can inspect in the Terraform scripts how `scheduler_address` is passed to the autoscaling node.

  • Worker Node: The real computational work takes place in the worker nodes within the Dask system. This deployment guarantees the presence of at least one active worker node, with the ability to scale up to a maximum (currently set to 100) number of workers. All workers are automatically registered with the scheduler upon creation, and the creation of worker nodes is managed by the autoscaling node.

    While it's possible to achieve even greater speed and scalability with a more generous budget ($$$), the system currently limits scaling to 100 Dask workers.
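The scaling decision described above can be pictured with a toy sizing function. This is not Dask's actual adaptive algorithm (Dask weighs many more signals, such as task duration and memory pressure); it only illustrates the shape of the decision, clamping the worker count between the deployment's minimum of 1 and maximum of 100:

```python
def desired_workers(pending_tasks: int, tasks_per_worker: int = 4,
                    minimum: int = 1, maximum: int = 100) -> int:
    """Toy illustration of an adaptive scaler's sizing step.

    Size the pool to the queue (ceil division), then clamp to the
    configured bounds so at least one worker stays alive and the
    cluster never exceeds the budget cap.
    """
    needed = -(-pending_tasks // tasks_per_worker)  # ceil without math.ceil
    return max(minimum, min(maximum, needed))
```

In a real deployment this decision is delegated to Dask itself (e.g. via a cluster's adaptive mode), which is precisely why the autoscaling node exists as a separate component.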

Cache layer

This component plays a pivotal role in storing results for all processed requests. The cache system also ensures that results aren't recomputed for identical requests, tying back to the discussion of systematically generating task_ids based on query parameters.
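A hedged sketch of that lookup path (names are illustrative, not the repository's actual code): the handler checks the cache under the task_id first, and only submits to Dask on a miss. `cache` stands in for a Redis client, which exposes `get`/`set`:

```python
def get_or_submit(cache, task_id: str, submit_fn):
    """Return a cached result if present; otherwise fire-and-forget.

    cache     -- any object with Redis-like get(key) semantics
    submit_fn -- callable that submits the task to Dask by task_id
    """
    cached = cache.get(task_id)
    if cached is not None:
        # Identical request was already processed: no recomputation.
        return {"task_id": task_id, "status": "done", "result": cached}
    submit_fn(task_id)  # fire-and-forget submission to the Dask cluster
    return {"task_id": task_id, "status": "submitted"}
```

Since task_ids are generated deterministically from the query parameters, a repeat of an already-answered request never reaches the Dask cluster at all.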

Local development

For local development, follow these steps:

Build the Docker images locally

docker compose build

Launch the stack:

docker compose up
