
---
page_type: sample
languages:
- bicep
- xml
- yaml
products:
- azure-api-management
- azure-openai
---

🚀 Smart Load Balancing for OpenAI Endpoints and Azure API Management

(Figure: Smart APIM load balancing)

Many service providers, including OpenAI, set limits on API calls. Azure OpenAI, for instance, enforces limits on tokens per minute (TPM) and requests per minute (RPM). Exceeding these limits results in a 429 'TooManyRequests' HTTP status code and a 'Retry-After' header indicating how long to wait before sending the next request.
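
For illustration, here is a minimal client-side sketch of that contract, assuming the Python `requests` library; the endpoint, deployment, API version, and key below are placeholders, not values from this repository:

```python
import time
import requests  # assumed HTTP client for this sketch

# Placeholder endpoint, deployment, API version, and key -- substitute your own values.
URL = ("https://<your-resource>.openai.azure.com/openai/deployments/<deployment>"
       "/chat/completions?api-version=2024-02-01")
HEADERS = {"api-key": "<your-api-key>"}

def call_with_backoff(payload, max_attempts=3):
    """Call the endpoint and honour the Retry-After header on HTTP 429."""
    for _ in range(max_attempts):
        resp = requests.post(URL, headers=HEADERS, json=payload)
        if resp.status_code != 429:
            resp.raise_for_status()
            return resp.json()
        # Throttled: the service tells us how many seconds to wait before retrying.
        time.sleep(int(resp.headers.get("Retry-After", "1")))
    raise RuntimeError("Still throttled after retries")
```

The policy described below reacts to this same signal on the API Management side, spreading requests across multiple backends so that individual clients see far fewer 429 responses.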

This solution takes a comprehensive approach, considering UX/workflow design, application resiliency, fault-handling logic, appropriate model selection, API policy configuration, logging, and monitoring. It introduces an Azure API Management policy that exposes a single endpoint to your applications while efficiently spreading consumption across multiple OpenAI or other API backends based on their availability and priority.

✨ Smart vs. Round-Robin Load Balancers

Our solution stands out in its intelligent handling of OpenAI throttling. It is responsive to the HTTP status code 429 (Too Many Requests), a common occurrence due to rate limits in Azure OpenAI. Unlike traditional round-robin methods, our solution dynamically directs traffic to non-throttling OpenAI backends, based on a prioritized order. When a high-priority backend starts throttling, traffic is automatically rerouted to lower-priority backends until the former recovers.

(Figure: Active mode)

(Figure: Throttling)
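
As a rough illustration of that switching rule, here is a Python sketch. The real logic lives in the API Management policy expressions; the backend URLs and pool structure below are placeholders, not taken from this repository:

```python
import time
import requests  # assumed HTTP client for this sketch

# Hypothetical backend pool -- in the actual solution this state is kept inside the
# API Management policy; this only sketches the routing behaviour described above.
backends = [
    {"url": "https://backend-a.openai.azure.com", "priority": 1, "throttled_until": 0.0},
    {"url": "https://backend-b.openai.azure.com", "priority": 2, "throttled_until": 0.0},
]

def forward(path, payload, headers):
    """Try non-throttled backends in priority order, switching immediately on 429."""
    last_429 = None
    for backend in sorted(backends, key=lambda b: b["priority"]):
        if backend["throttled_until"] > time.time():
            continue  # still inside this backend's Retry-After window
        resp = requests.post(backend["url"] + path, json=payload, headers=headers)
        if resp.status_code != 429:
            return resp  # success (or a non-throttling error) is returned as-is
        # Take the backend out of rotation for the advertised window and move on to
        # the next backend right away -- no waiting between switches.
        backend["throttled_until"] = time.time() + int(resp.headers.get("Retry-After", "1"))
        last_429 = resp
    return last_429  # every backend is throttling (or none was eligible)
```

Once a higher-priority backend's Retry-After window expires, it becomes eligible again and traffic shifts back to it automatically.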

Key Features:

  • Prioritized Traffic Routing: Implementing 'priority groups' allows for strategic consumption of quotas, prioritizing specific instances over others.

  • No Delay in Backend Switching: Our policy ensures immediate switching to different endpoints without delay, contrary to many existing API Management sample policies that introduce waiting intervals.

Scenarios and Priority Groups:

  • Provisioned Throughput Deployment (PTU): Set as Priority 1 to utilize its capacity first, given its fixed pricing model.
  • Fallback S0 Tier Deployments: Spread across different regions, these are set as Priority 2 and beyond, used when PTU is at capacity.

In cases where multiple backends share the same priority and are all operational, our algorithm randomly selects among them.
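
A small self-contained sketch of that prioritization, with hypothetical deployment names (the PTU deployment in priority group 1, two regional S0 deployments in priority group 2):

```python
import random
import time

# Hypothetical pool for the scenario above; names are placeholders.
# Priority 1: the PTU deployment, consumed first because its pricing is fixed.
# Priority 2: pay-as-you-go S0 deployments in different regions, used as fallback.
pool = [
    {"name": "ptu-eastus",    "priority": 1, "throttled_until": 0.0},
    {"name": "s0-eastus",     "priority": 2, "throttled_until": 0.0},
    {"name": "s0-westeurope", "priority": 2, "throttled_until": 0.0},
]

def choose(pool):
    """Highest-priority non-throttled backend; random pick among equal priorities."""
    healthy = [b for b in pool if b["throttled_until"] <= time.time()]
    if not healthy:
        return None  # everything is throttling; the caller surfaces the 429
    best = min(b["priority"] for b in healthy)
    return random.choice([b for b in healthy if b["priority"] == best])

print(choose(pool)["name"])                    # "ptu-eastus" while the PTU has capacity
pool[0]["throttled_until"] = time.time() + 30  # simulate the PTU returning 429
print(choose(pool)["name"])                    # one of the S0 backends, chosen at random
```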

Content Structure

  • Manual Setup: Step-by-step instructions for Azure API Management instance setup and policy configuration. View Manual Setup
  • Azure Developer CLI (azd) Setup: Guide for using the Azure Developer CLI for simplified deployment. View AZD Setup
  • Understanding the Policy: Detailed explanation of the API Management policies and their customization. View How Policy Works
  • FAQ: Common questions and answers about the setup and usage of this solution. View FAQ

Conclusion

This smart load balancing solution effectively addresses the challenges posed by API limit constraints in Azure OpenAI. By implementing the strategies outlined in the provided documentation, you can ensure efficient and reliable application performance, leveraging the full potential of your OpenAI and Azure API Management resources.

Productionizing the Solution

Transitioning to production requires careful consideration of security, performance, and cost. For a detailed guide to productionizing this solution, including security enhancements, performance optimization, and continuous monitoring, refer to the accompanying documentation.

🔗 Related articles
