
Azure OpenAI Reverse Proxy

A reverse proxy for distributing requests across OpenAI model deployments (e.g. GPT-4) hosted in Azure OpenAI Service (AOAI).

Architecture

Important

This is a highly experimental solution, and it's not an official Microsoft product.


Problem Statement

Throttling for an Azure OpenAI model deployment is designed around two configurable rate limits:

  • Tokens-per-minute (TPM): Estimated number of tokens that can be processed over a one-minute period
  • Requests-per-minute (RPM): Estimated number of requests that can be processed over a one-minute period

A model deployment is considered overloaded when at least one of these rate limits is reached. In that case, Azure OpenAI returns an HTTP 429 ("Too Many Requests") response to the client, with a Retry-After HTTP header indicating how many seconds the deployment will be unavailable before it starts accepting requests again.
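For illustration, a throttled response from a deployment might look like the following (the header value here is hypothetical):

HTTP/1.1 429 Too Many Requests
Retry-After: 7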

Challenges

What if there is growing demand for requests and/or tokens that can't be met within the model deployment's rate limits? Currently, the alternatives are:

  1. Increase the model deployment capacity by requesting Provisioned throughput units (PTU).

  2. Build a load balancing component that distributes requests across model deployments, hosted on one or multiple Azure OpenAI resources, optimizing resource utilization and maximizing throughput.

  3. Adopt a failover strategy that forwards requests from an overloaded model deployment to another one.

These approaches can be combined to achieve enhanced scalability, performance and availability.

Solution

This repository showcases a proof-of-concept solution for alternative #2: a reverse proxy built in ASP.NET Core with YARP.
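Wiring YARP into an ASP.NET Core app takes only a few lines. Below is a minimal sketch of such a host; it is not the repository's actual Program.cs, which additionally registers the custom health check policy and metrics publisher:

// Program.cs: minimal YARP reverse proxy host.
// Routes and clusters are loaded from the "ReverseProxy" config section.
var builder = WebApplication.CreateBuilder(args);

builder.Services
    .AddReverseProxy()
    .LoadFromConfig(builder.Configuration.GetSection("ReverseProxy"));

var app = builder.Build();

app.MapReverseProxy(); // forward matching requests to the configured destinations

app.Run();

The sequence diagram below shows how a request flows through the proxy's components.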

sequenceDiagram
    Client->>Load Balancer: Proxy HTTP request<br/> /<azure-openai-route><br/><azure-openai-credentials>

    box Gray Reverse Proxy
    participant Load Balancer
    participant HTTP Forwarder
    participant Passive Health Check
    participant Destination Health Updater
    participant Transformer
    participant Custom Metrics Publisher
    end

    Load Balancer->>HTTP Forwarder: Selected<br/> deployment destination

    par
      HTTP Forwarder->>Passive Health Check: HTTP response
      Note over Passive Health Check: Evaluate response and mark destination<br/> health state as healthy or unhealthy
      Passive Health Check ->> Destination Health Updater: Update destination state
    and
      HTTP Forwarder->>Transformer: HTTP response
      Note over Transformer: Append x-absolute-uri response header<br /> with the destination address
      Transformer->>Client: HTTP response
    and
      HTTP Forwarder->>Custom Metrics Publisher: HTTP response
      Note over Custom Metrics Publisher: Remaining requests + tokens
    end
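As one concrete example, the Transformer step that appends the x-absolute-uri response header can be expressed as a YARP response transform. A sketch under the assumption that transforms are registered via AddTransforms (the repository's actual transformer may be structured differently):

// Requires: using Yarp.ReverseProxy.Model; (for GetReverseProxyFeature)
builder.Services.AddReverseProxy()
    .LoadFromConfig(builder.Configuration.GetSection("ReverseProxy"))
    .AddTransforms(transforms =>
    {
        transforms.AddResponseTransform(ctx =>
        {
            // Look up which destination actually served this request and
            // surface its address to the client.
            var proxyFeature = ctx.HttpContext.GetReverseProxyFeature();
            var address = proxyFeature.ProxiedDestination?.Model.Config.Address;
            if (address is not null)
            {
                ctx.HttpContext.Response.Headers["x-absolute-uri"] =
                    address + ctx.HttpContext.Request.Path;
            }
            return ValueTask.CompletedTask;
        });
    });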

Core features

  • Custom Passive Health Check middleware that intercepts HTTP responses from the model deployments selected by the load balancer and assigns health states. For more info, see the Passive Health Check section.

  • Custom OpenTelemetry metrics with built-in support for Prometheus and Azure Monitor exporters, providing insight into how the proxy distributes requests. For more info, see the Metrics section.

Passive Health Check

The following diagram gives an overview of the state management logic implemented in the AzureOpenAIPassiveHealthCheckPolicy middleware.

stateDiagram-v2
    state if_state <<choice>>
    [*] --> AzureOpenAIPassiveHealthCheckPolicy
    AzureOpenAIPassiveHealthCheckPolicy --> if_state
    if_state --> Unhealthy: if HTTP status code<br/> >= 400 and <= 599
    if_state --> Unhealthy: if tokens or requests<br/> threshold is reached (optional)
    Unhealthy --> Unknown
    note right of Unknown
      On hold for X seconds<br/> from Retry-After header value
    end note
    Unknown --> Healthy
    if_state --> Healthy : else
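In YARP, a passive policy implements the IPassiveHealthCheckPolicy interface. The sketch below approximates the diagram's status-code branch; the repository's actual policy additionally evaluates the optional token/request thresholds from cluster metadata:

using Microsoft.AspNetCore.Http;
using Yarp.ReverseProxy.Health;
using Yarp.ReverseProxy.Model;

// Approximate sketch of the policy; details differ from the real implementation.
public class AzureOpenAIPassiveHealthCheckPolicy : IPassiveHealthCheckPolicy
{
    private static readonly TimeSpan DefaultReactivationPeriod = TimeSpan.FromSeconds(60);
    private readonly IDestinationHealthUpdater _healthUpdater;

    public AzureOpenAIPassiveHealthCheckPolicy(IDestinationHealthUpdater healthUpdater)
        => _healthUpdater = healthUpdater;

    public string Name => nameof(AzureOpenAIPassiveHealthCheckPolicy);

    public void RequestProxied(HttpContext context, ClusterState cluster, DestinationState destination)
    {
        if (context.Response.StatusCode is >= 400 and <= 599)
        {
            // Put the destination on hold for the Retry-After period, if present.
            var reactivation = DefaultReactivationPeriod;
            if (int.TryParse(context.Response.Headers["Retry-After"], out var seconds))
            {
                reactivation = TimeSpan.FromSeconds(seconds);
            }

            // YARP moves the destination back to Unknown, then Healthy, after this period.
            _healthUpdater.SetPassive(cluster, destination, DestinationHealth.Unhealthy, reactivation);
        }
        else
        {
            _healthUpdater.SetPassive(cluster, destination, DestinationHealth.Healthy, DefaultReactivationPeriod);
        }
    }
}

The policy is registered in DI as a singleton IPassiveHealthCheckPolicy and selected by name via the cluster's HealthCheck.Passive.Policy setting shown in the configuration samples below.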

Metrics

The proxy provides custom metrics compliant with OpenTelemetry, making it easy to integrate with many monitoring solutions (e.g. Azure Monitor, Prometheus).

These are the custom metrics the proxy emits:

| Metric name | Type | Description | Attributes (dimensions) |
| --- | --- | --- | --- |
| reverseproxy_azure_openai_remaining_requests | Gauge | Remaining HTTP requests. | account_name, deployment_name |
| reverseproxy_azure_openai_remaining_tokens | Gauge | Remaining Azure OpenAI tokens. | account_name, deployment_name |

These metrics can help you better understand how requests are being distributed among model deployments, and run experiments to find the configuration that best fits your needs (e.g. switching load balancing algorithms, adjusting thresholds, customizing health check policies).
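Conceptually, the publisher reads the remaining-capacity headers returned by Azure OpenAI (x-ratelimit-remaining-requests and x-ratelimit-remaining-tokens) and exposes the latest values as observable gauges. A simplified sketch using System.Diagnostics.Metrics; the class and meter names here are hypothetical, and only the requests gauge is shown (tokens work the same way):

using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Diagnostics.Metrics;
using System.Linq;
using System.Net.Http;

// Hypothetical publisher; the repository's actual class and meter names may differ.
public class AzureOpenAIMetricsPublisher
{
    private static readonly Meter Meter = new("ReverseProxy.AzureOpenAI");
    private readonly ConcurrentDictionary<(string Account, string Deployment), long> _remainingRequests = new();

    public AzureOpenAIMetricsPublisher()
    {
        // Observable gauge: sampled by the exporter (Prometheus/Azure Monitor) at collection time.
        Meter.CreateObservableGauge(
            "reverseproxy_azure_openai_remaining_requests",
            () => _remainingRequests.Select(entry => new Measurement<long>(
                entry.Value,
                new KeyValuePair<string, object?>("account_name", entry.Key.Account),
                new KeyValuePair<string, object?>("deployment_name", entry.Key.Deployment))));
    }

    // Called for each proxied response.
    public void Publish(string account, string deployment, HttpResponseMessage response)
    {
        if (response.Headers.TryGetValues("x-ratelimit-remaining-requests", out var values) &&
            long.TryParse(values.FirstOrDefault(), out var remaining))
        {
            _remainingRequests[(account, deployment)] = remaining;
        }
    }
}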

Use Cases

The reverse proxy can be used as:

  1. A gateway to serve as an entrypoint for one or more LLM apps;
  2. A sidecar app to run alongside an LLM app (e.g. in a Kubernetes environment such as Azure Kubernetes Service or Azure Container Apps).

Limitations

  • Resiliency: Currently, when a model deployment request fails (i.e. the HTTP response has an error status code), the proxy returns the failed response as-is to the client.
  • Deployments priority: Currently there's no concept of priority groups or weights for model deployments (e.g. prioritizing PTU-based deployments).

Trying it out

The repository provides the following containerized services out of the box to simplify local development:

Containerized environment

Prerequisites

Proxy configuration options

Create an appsettings.Local.json file in the src/proxy directory to start configuring the proxy for your local environment. There are two options for configuring the load balancer and passive health check:

  1. Using YARP's built-in ReverseProxy config section to manually set the route and cluster. Check out the YARP-based configuration section for a config sample.

  2. Using a ModelDeploymentsDiscovery config section to dynamically discover model deployments on the Azure OpenAI resource that match your filter pattern (e.g. discovering only GPT-3.5 deployments via a gpt-35* pattern) and create the route and cluster properties automatically. Check out the Model deployments discovery configuration section for a config sample.

YARP-based configuration

{
  "ReverseProxy": {
    "Routes": {
      "route1": {
        "ClusterId": "cluster1",
        "Match": {
          "Path": "{**catch-all}"
        }
      }
    },
    "Clusters": {
      "cluster1": {
        "LoadBalancingPolicy": "RoundRobin",
        "HealthCheck": {
          "Passive": {
            "Enabled": "true",
            "Policy": "AzureOpenAIPassiveHealthCheckPolicy"
          }
        },
        "Metadata": {
          "RemainingRequestsThreshold": "100",
          "RemainingTokensThreshold": "1000"
        },
        "Destinations": {
          "deployment1": {
            "Address": "https://my-account.openai.azure.com/openai/deployments/deployment-1"
          },
          "deployment2": {
            "Address": "https://my-account.openai.azure.com/openai/deployments/deployment-2"
          }
        }
      }
    }
  }
}

Model deployments discovery configuration

{
  "ModelDeploymentsDiscovery": {
    "SubscriptionId": "<subscription id>",
    "ResourceGroupName": "<resource group name",
    "AccountId": "<azure openai account name>",

    "FilterPattern": "gpt-35*",
    "FrequencySeconds": 5,

    "LoadBalancingPolicy": "RoundRobin",
    "PassiveHealthCheck": {
      "Policy": "AzureOpenAIPassiveHealthCheckPolicy",
      "Metadata": {
        "RemainingRequestsThreshold": "100",
        "RemainingTokensThreshold": "1000"
      }
    }
  }
}

OpenTelemetry exporters configuration

By default, the proxy is configured to export custom metrics to Prometheus via the /metrics HTTP route. If you want to export metrics to Azure Monitor as well, add the following ApplicationInsights section to the app settings:

{
  ...,
  "ApplicationInsights": {
    "ConnectionString": "<app-insights-connection-string"
  }
}
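Under the hood this maps to standard OpenTelemetry registration. A hedged sketch of how the exporters might be wired up, using the standard OpenTelemetry.Exporter.Prometheus.AspNetCore and Azure.Monitor.OpenTelemetry.AspNetCore packages (the repository's actual startup code may differ, and the meter name is hypothetical):

// Prometheus metrics, scraped from the /metrics endpoint.
builder.Services.AddOpenTelemetry()
    .WithMetrics(metrics => metrics
        .AddMeter("ReverseProxy.AzureOpenAI") // hypothetical meter name
        .AddPrometheusExporter());

// Azure Monitor, enabled only when a connection string is configured.
var connectionString = builder.Configuration["ApplicationInsights:ConnectionString"];
if (!string.IsNullOrEmpty(connectionString))
{
    builder.Services.AddOpenTelemetry()
        .UseAzureMonitor(options => options.ConnectionString = connectionString);
}

var app = builder.Build();
app.MapPrometheusScrapingEndpoint(); // serves metrics at /metrics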

App settings setup

Create a .env file in the root directory and add the Azure OpenAI API key:

AZURE_OPENAI_API_KEY=<api-key>

The PROXY_ENDPOINT environment variable is set by default in the compose.yml file.

Running the reverse proxy

Spin the services up with Docker Compose:

docker-compose up

Important

For any code changes, make sure you rebuild the image before running by passing the --build flag: docker-compose up --build

Testing the proxy

The repository provides the following ways of sending HTTP requests to Azure OpenAI Chat Completions API through the proxy:

  1. Sequential requests via a bash script, available in the scripts folder:

    ./scripts/client.sh
    

    or via PowerShell:

    .\scripts\client.ps1
    
  2. Concurrent requests via k6, a load testing tool:

    docker-compose run k6 run /scripts/client.js
    

Environment teardown

To stop and remove the containers, networks, volumes and images:

docker-compose down --volumes --rmi all
