ashleykleynhans / runpod-worker-oobabooga

RunPod Serverless Worker for Oobabooga Text Generation API for LLMs

License: GNU General Public License v3.0

Python 92.36% Shell 7.64%
llm oobabooga runpod-worker text-generation

runpod-worker-oobabooga's Introduction

oobabooga | RunPod Serverless Worker

This is the source code for a RunPod Serverless worker that uses Oobabooga Text Generation API for LLM text generation AI tasks.

Model

The worker uses TheBloke's Synthia-34B-v1.2-GPTQ model. Feel free to fork the repo and switch it to an alternate model.

Building the Docker image that will be used by the Serverless Worker

There are two options:

  1. Network Volume
  2. Standalone (without Network Volume)

RunPod API Endpoint

You can send requests to your RunPod API Endpoint using the /run or /runsync endpoints.

Requests sent to the /run endpoint are handled asynchronously and are non-blocking operations. The first response status will always be IN_QUEUE. You then need to send subsequent requests to the /status endpoint to get further status updates; the COMPLETED status will eventually be returned if your request is successful.

Requests sent to the /runsync endpoint are handled synchronously and are blocking operations. If a worker processes the request within 90 seconds, the result is returned in the response. If processing exceeds 90 seconds, you will need to poll the /status endpoint for status updates until you receive the COMPLETED status, which indicates that your request was successful.
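As a rough sketch, a synchronous request to the /runsync endpoint could look like the following. The endpoint ID, API key, and generation parameters are placeholders, and the exact `input` schema depends on the worker's handler:

```python
import json
import urllib.request

# Placeholders: substitute your own RunPod endpoint ID and API key.
ENDPOINT_ID = "YOUR_ENDPOINT_ID"
API_KEY = "YOUR_API_KEY"
RUNSYNC_URL = f"https://api.runpod.ai/v2/{ENDPOINT_ID}/runsync"

def build_request(prompt: str, max_new_tokens: int = 200) -> dict:
    """Wrap generation parameters in the `input` envelope RunPod expects."""
    return {"input": {"prompt": prompt, "max_new_tokens": max_new_tokens}}

def run_sync(prompt: str) -> dict:
    """POST a generation request to /runsync and return the parsed JSON reply."""
    req = urllib.request.Request(
        RUNSYNC_URL,
        data=json.dumps(build_request(prompt)).encode(),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {API_KEY}",
        },
    )
    with urllib.request.urlopen(req, timeout=120) as resp:
        return json.loads(resp.read())
```

A /run request would look identical apart from the URL; the difference is that its response only acknowledges the job, which you then poll via /status.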

Available Oobabooga APIs

Endpoint Status Codes

| Status | Description |
|--------|-------------|
| IN_QUEUE | Request is in the queue waiting to be picked up by a worker. You can call the /status endpoint to check for status updates. |
| IN_PROGRESS | Request is currently being processed by a worker. You can call the /status endpoint to check for status updates. |
| FAILED | The request failed, most likely due to encountering an error. |
| CANCELLED | The request was cancelled. This usually happens when you call the /cancel endpoint to cancel the request. |
| TIMED_OUT | The request timed out. This usually happens when your handler throws some kind of exception that does not return a valid response. |
| COMPLETED | The request completed successfully and the output is available in the output field of the response. |
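The statuses above split into in-flight states (IN_QUEUE, IN_PROGRESS) and terminal states. A minimal polling sketch, where `get_status` is a stand-in for a real /status request (an assumption, not the worker's code):

```python
import time

# Statuses that mean the job has finished and polling should stop.
TERMINAL_STATUSES = {"COMPLETED", "FAILED", "CANCELLED", "TIMED_OUT"}

def is_terminal(status: str) -> bool:
    """True once a job can no longer change state."""
    return status in TERMINAL_STATUSES

def poll(get_status, interval: float = 2.0, max_tries: int = 150) -> str:
    """Call `get_status()` (a callable returning the current status string)
    until a terminal status is reached or the retry budget is exhausted."""
    for _ in range(max_tries):
        status = get_status()
        if is_terminal(status):
            return status
        time.sleep(interval)
    raise TimeoutError("job did not reach a terminal status")
```

In real use, `get_status` would issue a GET to the /status endpoint with your job ID and return the `status` field of the JSON response.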

Serverless Handler

The serverless handler (rp_handler.py) is a Python script that handles the API requests to your Endpoint using the runpod Python library. It defines a function handler(event) that takes an API request (event), calls the appropriate Oobabooga Text Generation API with the input, and returns the output in the JSON response.
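A minimal sketch of what such a handler could look like, using only the standard library for the HTTP call. The local API URL and input schema here are assumptions for illustration, not the worker's actual code:

```python
import json
import urllib.request

# Assumed local Oobabooga text-generation endpoint inside the worker
# container; the real worker's URL and route may differ.
OOBABOOGA_API = "http://127.0.0.1:5000/api/v1/generate"

def handler(event):
    """RunPod-style handler: take the request's `input` payload, forward it
    to the local Oobabooga API, and return the JSON result (or an error)."""
    payload = event.get("input", {})
    if not payload:
        return {"error": "missing 'input' in request"}
    req = urllib.request.Request(
        OOBABOOGA_API,
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
    )
    try:
        with urllib.request.urlopen(req, timeout=300) as resp:
            return json.loads(resp.read())
    except OSError as exc:
        return {"error": str(exc)}

# In the real worker this function is registered with the runpod SDK:
#   import runpod
#   runpod.serverless.start({"handler": handler})
```

Any exception that escapes the handler (rather than being returned as an error object) is what typically produces the TIMED_OUT status described above.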

Acknowledgements

Additional Resources

Community and Contributing

Pull requests and issues on GitHub are welcome. Bug fixes and new features are encouraged.

You can contact me for help with deploying your Serverless worker on the RunPod Discord Server below; my username is ashleyk.


Appreciate my work?

Buy Me A Coffee

runpod-worker-oobabooga's People

Contributors

ashleykleynhans, zorbathut


runpod-worker-oobabooga's Issues

A PyTorch input issue when using the TheBloke/Synthia-70B-v1.2b-GPTQ model

shape '[1, 41, 64, 128]' is invalid for input of size 41984

```
2023-10-24 13:34:02.637 [error] ERROR | Error while running job 8ff4c77d-0d94-4c63-bb12-8d8e971e4242-e1: shape '[1, 41, 64, 128]' is invalid for input of size 41984
{
  "dt": "2023-10-24 11:34:02.637429",
  "endpointid": "",
  "level": "error",
  "message": "ERROR | Error while running job 8ff4c77d-0d94-4c63-bb12-8d8e971e4242-e1: shape '[1, 41, 64, 128]' is invalid for input of size 41984",
  "workerId": ""
}
```

Running on any non GPU server?

Do I understand correctly that this is supposed to deploy text-generation-webui on a machine without a GPU (e.g. an ordinary VPS such as DigitalOcean), and then allow you to use the text-generation-webui API linked to a serverless RunPod account?

Does this take args for the ooba?

Some models need specific arguments such as --loader ExLlama. Can we pass these through environment variables? It might also be a good idea to be able to specify the model via environment variables.

Service not ready yet. Retrying...

Hello,

First of all, thank you a lot for this contribution. I have followed all your steps with the network volume and everything is going well, except that when the actual oobabooga service is about to start it gives me this error:

INFO | Service not ready yet. Retrying...

Do you have any idea why this is happening? Thanks a lot!

API returns empty results

So I first did a /list request, which returned my models. I chose my custom model, which I had modified in the standalone Dockerfile before building. Then I copied the model name from the /list response and passed it in the args of a /load request, which returned all the settings just like the API doc examples. But when I try the /chat or /generate endpoints, my results are always empty:

/generate

{'delayTime': 6158, 'executionTime': 177, 'id': 'sync-3da22c16-2d1e-4df7-8b3e-be1e2515fb22', 'output': {'results': [{'text': ''}]}, 'status': 'COMPLETED'}

/chat

{'delayTime': 7277, 'executionTime': 185, 'id': 'sync-82cbf2ec-08f7-4844-a130-0dd514e29b80', 'output': {'results': [{'history': {'internal': [], 'visible': []}}]}, 'status': 'COMPLETED'}

I have tried looking at the logs to find out what the problem is, but the system and container logs are rather shallow. What should I do?
