
powerproxy-aoai's Issues

adding token limits per client per model

Hello! I've been getting this up and running on a VM between my team's apps and our Azure OpenAI service, and so far it's working nicely. However, my resource groups and quotas give me wildly different token limits per model (5K/min on GPT-4 and 30K/min on GPT-3.5 and the embedding models), so I need clients to be configurable with limits per model. I made some adjustments to the config.local.yaml structure and to LimitUsage.py, and things appear to be working as desired, so I thought I'd share and request the feature be implemented upstream, so that the next time I git-pull your latest enhancements I won't need to re-edit the code. I don't yet know how to use GitHub's PR features, so I'm pasting the relevant bits here. There's certainly a more elegant way to do this, but there's also a lot of nesting and subclassing and I just wanted to get things moving, so this is how I did it:

In config.local.yaml, under each client, I added a models key, like so:

clients:
  - name: powerautomate
    description: for instances of http calls from PA flows
    key: derpyderpydoo
    max_tokens_per_minute_in_k: 1
    models:
    - name: gpt-4-32k
      max_tokens_per_minute_in_k: 1
    - name: gpt-35-turbo-16k
      max_tokens_per_minute_in_k: 6

Leaving the existing max_tokens_per_minute_in_k key untouched means the original structure is preserved, and these changes are backward-compatible with configs that don't have the models key.

In LimitUsage.py, inside on_client_identified(self, routing_slip), I added routing_slip to the call to the per-client token-limit function:

            self._set_cache_setting(
                f"LimitUsage-{client}-budget",
                self._get_max_tokens_per_minute_in_k_for_client(client, routing_slip),
            )

Then I redefined the function itself to take the new parameter, read the model being used from the request, and look it up against the client_settings, which the existing Configuration class seamlessly populates with the models list:

    def _get_max_tokens_per_minute_in_k_for_client(self, client, routing_slip):
        """Return the number of maximum tokens per minute in thousands for the given client."""
        client_settings = self.app_configuration.get_client_settings(client)
        if client not in self.configured_max_tpms:
            if "max_tokens_per_minute_in_k" not in client_settings:
                raise ImmediateResponseException(
                    Response(
                        content=(
                            f"Configuration for client '{client}' misses a "
                            "'max_tokens_per_minute_in_k' setting. This needs to be set when the "
                            "LimitUsage plugin is enabled."
                        ),
                        status_code=status.HTTP_500_INTERNAL_SERVER_ERROR,
                    )
                )
            self.configured_max_tpms[client] = int(
                float(client_settings["max_tokens_per_minute_in_k"]) * 1000
            )
        client_models = client_settings.get("models")
        request_body = routing_slip["incoming_request_body_dict"]
        if client_models is not None and request_body is not None:
            model = request_body.get("model") or request_body.get("model_name")
            if model is not None:
                limits = {m["name"]: m["max_tokens_per_minute_in_k"] for m in client_models}
                client_model_limit = limits.get(model)
                if client_model_limit is not None and client_model_limit > 0:
                    return int(float(client_model_limit) * 1000)
        return self.configured_max_tpms[client]

The changes are, of course, the new routing_slip parameter and the section beginning with client_models: if the request includes the model parameter (as it should), the client has a models key in its settings, and that model appears in the client's per-model limits, the function returns the model-specific limit; otherwise it returns the class's existing configured_max_tpms value for the client.
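A standalone sketch of that lookup-and-fallback logic, with a hypothetical helper name, mirroring the config shape shown above:

```python
def resolve_tpm_limit(client_settings, model, default_tpm):
    """Return the model-specific tokens-per-minute limit for a client,
    falling back to the client-wide default when no per-model entry matches."""
    models = client_settings.get("models") or []
    limits = {m["name"]: m["max_tokens_per_minute_in_k"] for m in models}
    limit_in_k = limits.get(model)
    if limit_in_k is not None and limit_in_k > 0:
        # configured values are in thousands of tokens per minute
        return int(float(limit_in_k) * 1000)
    return default_tpm

# settings mirroring the config.local.yaml example above
settings = {
    "max_tokens_per_minute_in_k": 1,
    "models": [
        {"name": "gpt-4-32k", "max_tokens_per_minute_in_k": 1},
        {"name": "gpt-35-turbo-16k", "max_tokens_per_minute_in_k": 6},
    ],
}
print(resolve_tpm_limit(settings, "gpt-35-turbo-16k", 1000))  # 6000
print(resolve_tpm_limit(settings, "unknown-model", 1000))     # 1000
```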
Edit: made some changes to where client_settings is collected and client_models is referenced.

I've also kept some notes on how I set this up with Docker (it was a challenge, as I'm relatively new to it) and would be happy to share them as a write-up. I'm also working on a LogUsageMessagesToJSON plugin, since I want our usage histories to be searchable for analysis and for building a knowledge graph; I'd be happy to share that plugin as well, if you're interested, once I turn all the bugs into features...

Error message when calling deployed powerproxy

I will preface this by saying it's likely an issue with our Azure settings and infrastructure, but I am having trouble locating it and knowing where to look for clues.

I pulled the latest, deployed using the PowerShell script, and made a few requests to the endpoint; each time I receive the following:

"message": "Could not find any endpoint or deployment with remaining capacity. Try again later."

This is somewhat misleading when I look at the code, as I don't think it's a genuine 429 (the OpenAI instance has plenty of capacity for the model and is not under heavy use).

powerproxy.py:

# raise 429 if we could not find any suitable endpoint
if aoai_response is None:
    raise ImmediateResponseException(
        Response(
            content=json.dumps(
                {
                    "message": (
                        "Could not find any endpoint or deployment with "
                        "remaining capacity. Try again later."
                    )
                }
            ),
            media_type="application/json",
            status_code=status.HTTP_429_TOO_MANY_REQUESTS,
        )
    )

So does this mean it will throw a 429 on a null or empty response?

However, the reason I think it's an issue with our subscription or resource group is that the exact same code and config YAML work locally without issue. It's only when deployed to Azure that we get the message. The endpoints and API keys are identical, and I am calling it from Postman the same way.

I tried looking in the container app logs (ContainerAppConsoleLogs_CL and ContainerAppSystemLogs_CL) and metrics, but I can't see any errors.

Any tips or ideas for troubleshooting this one? Could there be a private endpoint on our OpenAI instance which would prevent this, or some sort of IAM permission needed?

Feature: load-balance between instances

I've scanned the source code and didn't find it there, but maybe it already exists; if not, please consider this a feature request.

Long story short, there are differences between the model versions available in different Azure regions for Azure OpenAI. Therefore, there are situations with multiple Azure OpenAI instances where the deployments available on those endpoints differ slightly.

It would be nice if, whenever PowerProxy load-balances requests, it took into account the deployments available on each endpoint.

Available deployments could come from the config.
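A rough sketch of what deployment-aware target selection could look like, assuming each endpoint entry in the config carried a hypothetical deployments list:

```python
import itertools

def targets_for_deployment(endpoints, deployment):
    """Filter configured endpoints to those hosting the requested deployment.
    Each endpoint dict carries a hypothetical 'deployments' list from config."""
    return [e for e in endpoints if deployment in e.get("deployments", [])]

endpoints = [
    {"name": "eastus", "deployments": ["gpt-4", "gpt-35-turbo"]},
    {"name": "westeurope", "deployments": ["gpt-35-turbo"]},
]
candidates = targets_for_deployment(endpoints, "gpt-4")
# load-balance (here: round-robin) only among endpoints that host the deployment
rr = itertools.cycle(candidates)
print(next(rr)["name"])  # eastus
```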

Experiencing ResponseNotRead errors

Hey, we are currently experiencing the following error on v0.10.3:


2024-06-19T07:24:11.108Z | ERROR: Exception in ASGI application
Traceback (most recent call last):
  File "/usr/local/lib/python3.11/site-packages/uvicorn/protocols/http/httptools_impl.py", line 399, in run_asgi
    result = await app(  # type: ignore[func-returns-value]
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/fastapi/applications.py", line 1054, in __call__
    await super().__call__(scope, receive, send)
  File "/usr/local/lib/python3.11/site-packages/starlette/applications.py", line 123, in __call__
    await self.middleware_stack(scope, receive, send)
  File "/usr/local/lib/python3.11/site-packages/starlette/middleware/errors.py", line 186, in __call__
    raise exc
  File "/usr/local/lib/python3.11/site-packages/starlette/middleware/errors.py", line 164, in __call__
    await self.app(scope, receive, _send)
  File "/usr/local/lib/python3.11/site-packages/starlette/middleware/exceptions.py", line 65, in __call__
    await wrap_app_handling_exceptions(self.app, conn)(scope, receive, send)
  File "/usr/local/lib/python3.11/site-packages/starlette/_exception_handler.py", line 64, in wrapped_app
    raise exc
  File "/usr/local/lib/python3.11/site-packages/starlette/_exception_handler.py", line 53, in wrapped_app
    await app(scope, receive, sender)
  File "/usr/local/lib/python3.11/site-packages/starlette/routing.py", line 756, in __call__
    await self.middleware_stack(scope, receive, send)
  File "/usr/local/lib/python3.11/site-packages/starlette/routing.py", line 776, in app
    await route.handle(scope, receive, send)
  File "/usr/local/lib/python3.11/site-packages/starlette/routing.py", line 297, in handle
    await self.app(scope, receive, send)
  File "/usr/local/lib/python3.11/site-packages/starlette/routing.py", line 77, in app
    await wrap_app_handling_exceptions(app, request)(scope, receive, send)
  File "/usr/local/lib/python3.11/site-packages/starlette/_exception_handler.py", line 64, in wrapped_app
    raise exc
  File "/usr/local/lib/python3.11/site-packages/starlette/_exception_handler.py", line 53, in wrapped_app
    await app(scope, receive, sender)
  File "/usr/local/lib/python3.11/site-packages/starlette/routing.py", line 72, in app
    response = await func(request)
               ^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/fastapi/routing.py", line 278, in app
    raw_response = await run_endpoint_function(
                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/fastapi/routing.py", line 191, in run_endpoint_function
    return await dependant.call(**values)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/app/powerproxy.py", line 371, in handle_request
    f"Text: {aoai_response.text} "
             ^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/httpx/_models.py", line 576, in text
    content = self.content
              ^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/httpx/_models.py", line 570, in content
    raise ResponseNotRead()
httpx.ResponseNotRead: Attempted to access streaming response content, without having called `read()`.
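The trace shows powerproxy accessing aoai_response.text on a response opened in streaming mode, which httpx forbids until the body has been read. A hedged sketch of a guard (the helper name is hypothetical; a stub stands in for the unread httpx response):

```python
import asyncio

async def safe_text(response):
    """Read a (possibly streaming) httpx-style response body safely.
    Accessing .text on an unread streaming httpx response raises
    ResponseNotRead; reading the body first with aread() avoids that."""
    try:
        return response.text
    except Exception:  # httpx.ResponseNotRead in practice
        body = await response.aread()
        return body.decode("utf-8", errors="replace")

# stub standing in for an unread streaming httpx.Response
class StreamingStub:
    @property
    def text(self):
        raise RuntimeError("ResponseNotRead")
    async def aread(self):
        return b'{"error": "upstream failure"}'

print(asyncio.run(safe_text(StreamingStub())))  # {"error": "upstream failure"}
```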


[Bug] Streaming proxy response breaks clients by `Content-Length` and `Transfer-Encoding: chunked` headers together

The streaming proxy response carries both Content-Length and Transfer-Encoding: chunked headers, which breaks some clients and is not RFC-compliant.

Steps to reproduce

  1. Make streaming request to proxy with PTU endpoint

Expected behavior

  • Send response with Transfer-Encoding: chunked without Content-Length

Actual behavior

  • Response contains both Transfer-Encoding: chunked and Content-Length, causing an error in the API client:

Error invoking remote method 'send-http-request': Error: Parse Error: Content-Length can't be present with Transfer-Encoding

Justification

The standard requires us not to send Content-Length in the case of a non-identity Transfer-Encoding such as "chunked", "compress", "deflate", or "gzip":

"Messages MUST NOT include both a Content-Length header field and a non-identity transfer-coding." (RFC 2616, Section 4.4, point 3)

The successor spec (RFC 7230, Section 3.3.2) does not instruct recipients to ignore this violation, so it would be great to remove the Content-Length header; otherwise clients might break.
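A minimal sketch of stripping the offending header before forwarding, assuming lowercase header names as ASGI middleware typically sees them (the helper name is hypothetical):

```python
def sanitize_streaming_headers(headers):
    """Drop Content-Length when a non-identity Transfer-Encoding is present,
    per RFC 2616 section 4.4 / RFC 7230 section 3.3.2.
    Assumes header names are already lowercase (as in ASGI)."""
    te = headers.get("transfer-encoding", "identity").lower()
    if te != "identity":
        headers = {k: v for k, v in headers.items() if k != "content-length"}
    return headers

h = {
    "transfer-encoding": "chunked",
    "content-length": "123",
    "content-type": "text/event-stream",
}
print(sanitize_streaming_headers(h))
```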

Feature: Configure HTTPX timeouts in config & Error recovery

We've recently been receiving a high number of httpx.ConnectTimeout exceptions on our PAYG endpoints.
The ability to configure the timeouts in the config, as well as to recover from exceptions within the aoai_targets loop and try the next endpoint, would be great too.
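One possible shape for this, sketched as a plain merge of hypothetical timeouts keys from config.local.yaml over defaults; the resulting dict could then be passed to httpx.Timeout(**settings) when building the endpoint clients (key names and default values here are assumptions, not the project's actual config schema):

```python
# assumed defaults; httpx.Timeout accepts connect/read/write/pool keywords
DEFAULT_TIMEOUTS = {"connect": 5.0, "read": 120.0, "write": 10.0, "pool": 10.0}

def timeout_settings(config):
    """Merge a hypothetical 'timeouts' section from the config over defaults."""
    merged = dict(DEFAULT_TIMEOUTS)
    merged.update(config.get("timeouts", {}))
    return merged

print(timeout_settings({"timeouts": {"connect": 2.0}}))
```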

Version: v0.10.1

Experiencing PoolTimeouts

Currently we face httpx.PoolTimeout on version v0.10.3:

Traceback (most recent call last):
  File "/usr/local/lib/python3.11/site-packages/uvicorn/protocols/http/httptools_impl.py", line 399, in run_asgi
    result = await app(  # type: ignore[func-returns-value]
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/fastapi/applications.py", line 1054, in __call__
    await super().__call__(scope, receive, send)
  File "/usr/local/lib/python3.11/site-packages/starlette/applications.py", line 123, in __call__
    await self.middleware_stack(scope, receive, send)
  File "/usr/local/lib/python3.11/site-packages/starlette/middleware/errors.py", line 186, in __call__
    raise exc
  File "/usr/local/lib/python3.11/site-packages/starlette/middleware/errors.py", line 164, in __call__
    await self.app(scope, receive, _send)
  File "/usr/local/lib/python3.11/site-packages/starlette/middleware/exceptions.py", line 65, in __call__
    await wrap_app_handling_exceptions(self.app, conn)(scope, receive, send)
  File "/usr/local/lib/python3.11/site-packages/starlette/_exception_handler.py", line 64, in wrapped_app
    raise exc
  File "/usr/local/lib/python3.11/site-packages/starlette/_exception_handler.py", line 53, in wrapped_app
    await app(scope, receive, sender)
  File "/usr/local/lib/python3.11/site-packages/starlette/routing.py", line 756, in __call__
    await self.middleware_stack(scope, receive, send)
  File "/usr/local/lib/python3.11/site-packages/starlette/routing.py", line 776, in app
    await route.handle(scope, receive, send)
  File "/usr/local/lib/python3.11/site-packages/starlette/routing.py", line 297, in handle
    await self.app(scope, receive, send)
  File "/usr/local/lib/python3.11/site-packages/starlette/routing.py", line 77, in app
    await wrap_app_handling_exceptions(app, request)(scope, receive, send)
  File "/usr/local/lib/python3.11/site-packages/starlette/_exception_handler.py", line 64, in wrapped_app
    raise exc
  File "/usr/local/lib/python3.11/site-packages/starlette/_exception_handler.py", line 53, in wrapped_app
    await app(scope, receive, sender)
  File "/usr/local/lib/python3.11/site-packages/starlette/routing.py", line 72, in app
    response = await func(request)
               ^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/fastapi/routing.py", line 278, in app
    raw_response = await run_endpoint_function(
                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/fastapi/routing.py", line 191, in run_endpoint_function
    return await dependant.call(**values)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/app/powerproxy.py", line 361, in handle_request
    aoai_response = await aoai_target["endpoint_client"].send(
                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/httpx/_client.py", line 1661, in send
    response = await self._send_handling_auth(
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/httpx/_client.py", line 1689, in _send_handling_auth
    response = await self._send_handling_redirects(
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/httpx/_client.py", line 1726, in _send_handling_redirects
    response = await self._send_single_request(request)
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/httpx/_client.py", line 1763, in _send_single_request
    response = await transport.handle_async_request(request)
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/httpx/_transports/default.py", line 372, in handle_async_request
    with map_httpcore_exceptions():
  File "/usr/local/lib/python3.11/contextlib.py", line 158, in __exit__
    self.gen.throw(typ, value, traceback)
  File "/usr/local/lib/python3.11/site-packages/httpx/_transports/default.py", line 86, in map_httpcore_exceptions
    raise mapped_exc(message) from exc
httpx.PoolTimeout

2024-06-19T07:02:08.057Z | ERROR: Exception in ASGI application
Traceback (most recent call last):
  File "/usr/local/lib/python3.11/site-packages/httpx/_transports/default.py", line 69, in map_httpcore_exceptions
    yield
  File "/usr/local/lib/python3.11/site-packages/httpx/_transports/default.py", line 373, in handle_async_request
    resp = await self._pool.handle_async_request(req)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/httpcore/_async/connection_pool.py", line 216, in handle_async_request
    raise exc from None
  File "/usr/local/lib/python3.11/site-packages/httpcore/_async/connection_pool.py", line 192, in handle_async_request
    connection = await pool_request.wait_for_connection(timeout=timeout)
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/httpcore/_async/connection_pool.py", line 35, in wait_for_connection
    await self._connection_acquired.wait(timeout=timeout)
  File "/usr/local/lib/python3.11/site-packages/httpcore/_synchronization.py", line 148, in wait
    with map_exceptions(anyio_exc_map):
  File "/usr/local/lib/python3.11/contextlib.py", line 158, in __exit__
    self.gen.throw(typ, value, traceback)
  File "/usr/local/lib/python3.11/site-packages/httpcore/_exceptions.py", line 14, in map_exceptions
    raise to_exc(exc) from exc
httpcore.PoolTimeout

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/usr/local/lib/python3.11/site-packages/uvicorn/protocols/http/httptools_impl.py", line 399, in run_asgi
    result = await app(  # type: ignore[func-returns-value]
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/fastapi/applications.py", line 1054, in __call__
    await super().__call__(scope, receive, send)
  File "/usr/local/lib/python3.11/site-packages/starlette/applications.py", line 123, in __call__
    await self.middleware_stack(scope, receive, send)
  File "/usr/local/lib/python3.11/site-packages/starlette/middleware/errors.py", line 186, in __call__
    raise exc
  File "/usr/local/lib/python3.11/site-packages/starlette/middleware/errors.py", line 164, in __call__
    await self.app(scope, receive, _send)
  File "/usr/local/lib/python3.11/site-packages/starlette/middleware/exceptions.py", line 65, in __call__
    await wrap_app_handling_exceptions(self.app, conn)(scope, receive, send)
  File "/usr/local/lib/python3.11/site-packages/starlette/_exception_handler.py", line 64, in wrapped_app
    raise exc
  File "/usr/local/lib/python3.11/site-packages/starlette/_exception_handler.py", line 53, in wrapped_app
    await app(scope, receive, sender)
  File "/usr/local/lib/python3.11/site-packages/starlette/routing.py", line 756, in __call__
    await self.middleware_stack(scope, receive, send)
  File "/usr/local/lib/python3.11/site-packages/starlette/routing.py", line 776, in app
    await route.handle(scope, receive, send)
  File "/usr/local/lib/python3.11/site-packages/starlette/routing.py", line 297, in handle
    await self.app(scope, receive, send)
  File "/usr/local/lib/python3.11/site-packages/starlette/routing.py", line 77, in app
    await wrap_app_handling_exceptions(app, request)(scope, receive, send)
  File "/usr/local/lib/python3.11/site-packages/starlette/_exception_handler.py", line 64, in wrapped_app
    raise exc
  File "/usr/local/lib/python3.11/site-packages/starlette/_exception_handler.py", line 53, in wrapped_app
    await app(scope, receive, sender)
  File "/usr/local/lib/python3.11/site-packages/starlette/routing.py", line 72, in app
    response = await func(request)
               ^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/fastapi/routing.py", line 278, in app
    raw_response = await run_endpoint_function(
                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/fastapi/routing.py", line 191, in run_endpoint_function
    return await dependant.call(**values)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/app/powerproxy.py", line 361, in handle_request
    aoai_response = await aoai_target["endpoint_client"].send(
                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/httpx/_client.py", line 1661, in send
    response = await self._send_handling_auth(
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/httpx/_client.py", line 1689, in _send_handling_auth
    response = await self._send_handling_redirects(
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/httpx/_client.py", line 1726, in _send_handling_redirects
    response = await self._send_single_request(request)
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/httpx/_client.py", line 1763, in _send_single_request
    response = await transport.handle_async_request(request)
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/httpx/_transports/default.py", line 372, in handle_async_request
    with map_httpcore_exceptions():
  File "/usr/local/lib/python3.11/contextlib.py", line 158, in __exit__
    self.gen.throw(typ, value, traceback)
  File "/usr/local/lib/python3.11/site-packages/httpx/_transports/default.py", line 86, in map_httpcore_exceptions
    raise mapped_exc(message) from exc
httpx.PoolTimeout

AOAI errors not returned on streaming responses

PowerProxy returns "Internal Server Error" when the AOAI endpoint errors.

An easy to replicate example of this is to request a deployment that doesn't exist.

POST /openai/deployments/notreal/chat/completions?api-version=2023-07-01-preview

{
    "messages": [
        {
            "role": "system",
            "content": "You are an AI assistant"
        },
        {
            "role": "user",
            "content": "Hello!"
        }
    ],
    "stream": true
}

Expected:
Content-Type: application/json

{
    "error": {
        "code": "DeploymentNotFound",
        "message": "The API deployment for this resource does not exist. If you created the deployment within the last 5 minutes, please wait a moment and try again."
    }
}

Returned:
Content-Type: text/plain; charset=utf-8

Internal Server Error
Stack trace
ERROR:    Exception in ASGI application
Traceback (most recent call last):
  File "/usr/local/lib/python3.11/site-packages/uvicorn/protocols/http/httptools_impl.py", line 399, in run_asgi
    result = await app(  # type: ignore[func-returns-value]
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/fastapi/applications.py", line 1054, in __call__
    await super().__call__(scope, receive, send)
  File "/usr/local/lib/python3.11/site-packages/starlette/applications.py", line 123, in __call__
    await self.middleware_stack(scope, receive, send)
  File "/usr/local/lib/python3.11/site-packages/starlette/middleware/errors.py", line 186, in __call__
    raise exc
  File "/usr/local/lib/python3.11/site-packages/starlette/middleware/errors.py", line 164, in __call__
    await self.app(scope, receive, _send)
  File "/usr/local/lib/python3.11/site-packages/starlette/middleware/exceptions.py", line 65, in __call__
    await wrap_app_handling_exceptions(self.app, conn)(scope, receive, send)
  File "/usr/local/lib/python3.11/site-packages/starlette/_exception_handler.py", line 64, in wrapped_app
    raise exc
  File "/usr/local/lib/python3.11/site-packages/starlette/_exception_handler.py", line 53, in wrapped_app
    await app(scope, receive, sender)
  File "/usr/local/lib/python3.11/site-packages/starlette/routing.py", line 756, in __call__
    await self.middleware_stack(scope, receive, send)
  File "/usr/local/lib/python3.11/site-packages/starlette/routing.py", line 776, in app
    await route.handle(scope, receive, send)
  File "/usr/local/lib/python3.11/site-packages/starlette/routing.py", line 297, in handle
    await self.app(scope, receive, send)
  File "/usr/local/lib/python3.11/site-packages/starlette/routing.py", line 77, in app
    await wrap_app_handling_exceptions(app, request)(scope, receive, send)
  File "/usr/local/lib/python3.11/site-packages/starlette/_exception_handler.py", line 64, in wrapped_app
    raise exc
  File "/usr/local/lib/python3.11/site-packages/starlette/_exception_handler.py", line 53, in wrapped_app
    await app(scope, receive, sender)
  File "/usr/local/lib/python3.11/site-packages/starlette/routing.py", line 72, in app
    response = await func(request)
               ^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/fastapi/routing.py", line 278, in app
    raw_response = await run_endpoint_function(
                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/fastapi/routing.py", line 191, in run_endpoint_function
    return await dependant.call(**values)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/app/powerproxy.py", line 309, in handle_request
    f"Text: {aoai_response.text} "
             ^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/httpx/_models.py", line 576, in text
    content = self.content
              ^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/httpx/_models.py", line 570, in content
    raise ResponseNotRead()
httpx.ResponseNotRead: Attempted to access streaming response content, without having called `read()`.

Entra ID / Azure AD authentication wrongly detected

I wrongly receive the warning that Entra ID / Azure AD authentication is not set up.

Steps to reproduce

  • Set up endpoints with API key authentication
  • Run powerproxy via VSCode debug config
  • Make a request using the OpenAI Python SDK, with authentication via API key

Expected behavior

Response from underlying OpenAI model deployment

Actual behavior

Entra ID/Azure AD is wrongly identified as the authentication method.

openai.BadRequestError: Error code: 400 - {'error': "When Entra ID/Azure AD is used to authenticate, PowerProxy needs a client in itsconfiguration configured with 'uses_entra_id_auth: true', so PowerProxy can map the request to a client."}

Background

It seems that the OpenAI SDK additionally sends the API key via the Authorization header (link to code). This triggers the Entra ID/Azure AD check.

With OpenAI SDK version 1.35.1 and powerproxy version 0.11
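A possible fix is to check for the Azure-style `api-key` header before inspecting the Authorization header, since the OpenAI SDK may send the key in both places. This is a minimal sketch of that idea, not PowerProxy's actual detection code; the function name is illustrative.

```python
# Hypothetical sketch: prefer the 'api-key' header over 'Authorization'
# when deciding how a request authenticates. The OpenAI SDK may send the
# API key in BOTH headers, so checking 'api-key' first avoids
# misclassifying the request as Entra ID / Azure AD.

def detect_auth_method(headers: dict[str, str]) -> str:
    headers = {k.lower(): v for k, v in headers.items()}
    if "api-key" in headers:
        # Azure OpenAI clients put the key in the 'api-key' header.
        return "api_key"
    if headers.get("authorization", "").lower().startswith("bearer "):
        # Only treat a bearer token as Entra ID when no api-key is present.
        return "entra_id"
    return "none"
```

With this ordering, a request carrying both `api-key: <key>` and `Authorization: Bearer <key>` would be classified as API-key authentication rather than Entra ID.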

feature idea: client stream request -> proxy switch to batch -> AOAI -> proxy switch to stream response -> client

i'm going to start figuring this out, unless y'all are already working on such a feature and i can leave it to the experts? the idea is to avoid some of the issues we have in streaming latency by switching to one-shot at the proxy server.

our instance of AOAI breaks up streams into batches and runs each batch through the content filter (which i have no control over) - each small batch can take up to 2 minutes, so a 500-token streaming round-trip can take up to 20 minutes, whereas batch mode goes through the filter only once and usually finishes a whole request within a minute or two.

easy answer: "just don't use stream" - but unfortunately the various VS Code plugins we're trying don't allow for this configuration; they're hard-coded to use streaming mode. so, i'd like a configuration option (ideally at the client level) to hijack stream requests and send them as one-shots. i suspect the client expects the response as 'text/event-stream', so the proxy would need to convert the single JSON response back into that format, else the client would error on receiving plain JSON. maybe that last conversion back to a stream isn't necessary - i haven't started experimenting yet.

at any rate, i'd love a feature like this, or your thoughts on how to go about contributing to it. thanks!
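The conversion back to the streaming wire format could look roughly like this: the proxy rewrites `stream: true` to `stream: false`, makes one call to Azure OpenAI, then replays the single JSON answer to the client as one server-sent event followed by `[DONE]`. This is a sketch assuming the standard OpenAI chat-completions response schema, not a tested implementation.

```python
# Hypothetical sketch: turn a one-shot chat completion into the SSE lines
# a streaming client can parse. Field names follow the OpenAI chat
# completions schema ('chat.completion' vs. 'chat.completion.chunk').
import json

def batch_response_to_sse(completion: dict) -> str:
    """Replay a non-streaming completion as a single streaming chunk."""
    chunk = {
        "id": completion["id"],
        "object": "chat.completion.chunk",
        "created": completion["created"],
        "model": completion["model"],
        "choices": [
            {
                "index": c["index"],
                # the whole message becomes one delta
                "delta": {
                    "role": c["message"]["role"],
                    "content": c["message"]["content"],
                },
                "finish_reason": c["finish_reason"],
            }
            for c in completion["choices"]
        ],
    }
    return f"data: {json.dumps(chunk)}\n\ndata: [DONE]\n\n"
```

The proxy would send this body with `Content-Type: text/event-stream` so streaming clients parse it as a (very short) stream.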

List of Azure authorizations required for credential used to deploy

Hello,

this is a great job and we are testing PowerProxy for our AI apps deployed.

Do you have the limited list of authorizations (permissions) required for deploying Power Proxy to Azure?
Currently, an Owner role that can grant Contributor is not enough; I have to use an Owner role on the full subscription.

That would be a great addition to the documentation.

Best regards

[Q] Ability to use multiple deployments under same resource

Hey,

Just want to start by saying great work! We've tried many of the aoai load balancing implementations, and this is by far the most robust & customizable one we've seen yet.
With the Assistants API being stateful on a per-resource basis, the ability to load balance between multiple deployments in the same resource would be beneficial.
ie, a single AOAI resource with 200PTU gpt-4 as deployment "gpt-4-1", then 280k TPM PAYG gpt-4 as "gpt-4-2".

I'll be looking at adding this as a plugin to just rewrite the URL, but wondering if there's another way of achieving this.
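The URL-rewrite idea could be sketched as a small round-robin over deployment names within the same resource, rewriting the `/deployments/<name>/` segment of the request path. The class and method names below are illustrative and only loosely follow PowerProxy's plugin pattern.

```python
# Hypothetical sketch: round-robin across several deployments in the SAME
# Azure OpenAI resource (e.g. a PTU deployment plus a PAYG deployment) by
# rewriting the deployment segment of the request path before forwarding.
import itertools
import re

class DeploymentRoundRobin:
    def __init__(self, deployments: list[str]):
        # cycle() yields deployments in order, forever
        self._cycle = itertools.cycle(deployments)

    def rewrite_path(self, path: str) -> str:
        target = next(self._cycle)
        # replace whatever deployment name the client asked for
        return re.sub(r"(/deployments/)[^/]+", rf"\g<1>{target}", path)
```

For example, with `DeploymentRoundRobin(["gpt-4-1", "gpt-4-2"])`, successive requests to `/openai/deployments/gpt-4/chat/completions` would alternate between the two deployment names. A production version would also need to handle per-deployment rate limits and retries.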

Regards,
Cody

Error running locally

Attempting to run locally, hitting the following error:

AttributeError: 'int' object has no attribute 'startswith'

which is on line 90 in the dicts.py:

if path.startswith("/"):

From debugging, the path variable is evaluated as a 1 (int).

I'm not sure where this path value is coming from or why it's an int.

Trace:

cd /Users/n02/src/powerproxy-aoai/app ; /usr/bin/env /usr/local/bin/python3.11 /Users/n02/.vscode/extensions/ms-python.python-2024.4.1/python_files/lib/python/debugpy/adapter/../../debugpy/launcher 56169 -- powerproxy.py --config-file ../config/config.local.yaml
------------------------------------
PowerProxy for Azure OpenAI - v0.0.0
------------------------------------
Proxy runs at port              : 80
Clients identified by API Key   : Team1, Team2
ERROR:    Traceback (most recent call last):
  File "/usr/local/lib/python3.11/site-packages/starlette/routing.py", line 732, in lifespan
    async with self.lifespan_context(app) as maybe_state:
  File "/usr/local/Cellar/[email protected]/3.11.8/Frameworks/Python.framework/Versions/3.11/lib/python3.11/contextlib.py", line 210, in __aenter__
    return await anext(self.gen)
           ^^^^^^^^^^^^^^^^^^^^^
  File "powerproxy.py", line 65, in lifespan
    config.print()
  File "/Users/n02/src/powerproxy-aoai/app/helpers/config.py", line 49, in print
    f"{self['fixed_client'] if 'fixed_client' in self and self['fixed_client'] else '(not set)'}",
                               ^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/n02/src/powerproxy-aoai/app/helpers/config.py", line 32, in __getitem__
    return self.values_dict[key]
           ~~~~~~~~~~~~~~~~^^^^^
  File "/Users/n02/src/powerproxy-aoai/app/helpers/dicts.py", line 11, in __getitem__
    return self.get(key)
           ^^^^^^^^^^^^^
  File "/Users/n02/src/powerproxy-aoai/app/helpers/dicts.py", line 33, in get
    keys_from_path = QueryDict._get_keys_from_path(path, separator, escape_sequence)
                     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/n02/src/powerproxy-aoai/app/helpers/dicts.py", line 90, in _get_keys_from_path
    if path.startswith("/"):
       ^^^^^^^^^^^^^^^
AttributeError: 'int' object has no attribute 'startswith'

ERROR:    Application startup failed. Exiting.

Is this an issue with the config yaml and if so do you have an example for running locally?

Thanks
