Comments (7)
I've submitted a PR with a guide on self-hosting the OpenFunctions model. It includes instructions and example code for setting up a local server using FastAPI and uvicorn, and for configuring a client to interact with this server using the OpenAI package. I would appreciate a review and any feedback!
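For reference, here is a stripped-down sketch of the server side. This is not the exact code from the PR; it assumes the gorilla-llm/gorilla-openfunctions-v0 weights load with Hugging Face transformers, and it mirrors just enough of the OpenAI chat-completions response shape for the client to work:

import json, time, uuid

import torch
from fastapi import FastAPI
from pydantic import BaseModel
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_ID = "gorilla-llm/gorilla-openfunctions-v0"  # assumed model id
tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModelForCausalLM.from_pretrained(
    MODEL_ID, torch_dtype=torch.float16, device_map="auto"
)

app = FastAPI()

class ChatRequest(BaseModel):
    model: str
    messages: list
    functions: list = []
    temperature: float = 0.0

@app.post("/v1/chat/completions")
def chat_completions(req: ChatRequest):
    user_query = req.messages[-1]["content"]
    # Build the training-time prompt (same format as the get_prompt shared below).
    if req.functions:
        prompt = f"USER: <<question>> {user_query} <<function>> {json.dumps(req.functions)}\nASSISTANT: "
    else:
        prompt = f"USER: <<question>> {user_query}\nASSISTANT: "
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    output = model.generate(**inputs, max_new_tokens=128, do_sample=False)
    # Decode only the newly generated tokens, not the prompt.
    text = tokenizer.decode(output[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True)
    return {
        "id": f"chatcmpl-{uuid.uuid4().hex}",
        "object": "chat.completion",
        "created": int(time.time()),
        "model": req.model,
        "choices": [{"index": 0,
                     "message": {"role": "assistant", "content": text},
                     "finish_reason": "stop"}],
    }

# Run with: uvicorn server:app --host 0.0.0.0 --port 8000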

Additionally, I created this Colab notebook to quickly test the example server in action. To run the server, I used an A100 GPU, which provided good response latency. To remotely access the server running on the Colab instance from my local machine, I used ngrok to tunnel the server ports from the Colab instance to public URLs. If you want to try out the server, you can get an ngrok auth-token here.
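In case it is useful, the tunneling step boils down to something like this (a sketch using pyngrok; the token is a placeholder and the port assumes uvicorn's default 8000):

from pyngrok import ngrok

ngrok.set_auth_token("YOUR_NGROK_AUTH_TOKEN")  # placeholder; use your own token
tunnel = ngrok.connect(8000)                   # forward the uvicorn port
print(tunnel.public_url)  # use this as openai.api_base, e.g. https://xxxx.ngrok.io/v1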
Thanks @kcolemangt for raising this. Just updated the README, but basically this is the prompt we used to train the model, so using it should work. Let me know if you run into any issues.
import json

def get_prompt(user_query, functions=[]):
    # No functions supplied: plain question prompt
    if len(functions) == 0:
        return f"USER: <<question>> {user_query}\nASSISTANT: "
    # Serialize the function schemas into the training-time format
    functions_string = json.dumps(functions)
    return f"USER: <<question>> {user_query} <<function>> {functions_string}\nASSISTANT: "
I'll keep this open if you run into any issues.
Thank you, @ShishirPatil. This information helps, but could you share the full process for running the API server locally?
With that information we could update the example to use localhost instead of this line:
openai.api_base = "http://luigi.millennium.berkeley.edu:8000/v1"
@kcolemangt with the OpenAI Python package you just have to point it at the OpenAI-compatible API of your local model. For instance, if you are trying this on your local computer, run Ollama with LiteLLM. Then edit the line you mentioned:
openai.api_base = "http://localhost"
replacing "localhost" with whatever base URL LiteLLM tells you to use.
I think what @kcolemangt is looking for is how to run a server-like endpoint, not just one for local development. If not, then I am curious how they are running the endpoint at http://luigi.millennium.berkeley.edu:8000/v1.
Ollama and LiteLLM are fine for development, but they are not really suited to server deployment. I tried a model compiled with MLC (https://llm.mlc.ai/docs/deploy/rest.html#install-mlc-chat-package), running it through MLC's completions-compatible REST API. It does work, but it does not appear to support functions passed over the API; it just constructs a function call from the user input alone.
For example, when I run this code:
import openai

def get_gorilla_response(prompt="Call me an Uber ride type \"Plus\" in Berkeley at zipcode 94704 in 10 minutes",
                         model="gorilla-openfunctions-v0", functions=[]):
    openai.api_key = "EMPTY"
    openai.api_base = "http://192.168.30.27:8000/v1"
    try:
        completion = openai.ChatCompletion.create(
            model="gorilla-openfunctions-v1-q4f32_0",
            temperature=0.0,
            messages=[{"role": "user", "content": prompt}],
            functions=functions,
        )
        return completion.choices[0].message.content
    except Exception as e:
        print(e, model, prompt)

query = "Call me an Uber ride type \"Plus\" in Berkeley at zipcode 94704 in 10 minutes"
functions = [
    {
        "name": "Uber Carpool",
        "api_name": "uber.ride",
        "description": "Find suitable ride for customers given the location, type of ride, and the amount of time the customer is willing to wait as parameters",
        "parameters": [
            {"name": "loc", "description": "location of the starting place of the uber ride"},
            {"name": "type", "enum": ["plus", "comfort", "black"],
             "description": "types of uber ride user is ordering"},
            {"name": "time", "description": "the amount of time in minutes the customer is willing to wait"},
        ],
    }
]

resp = get_gorilla_response(query, functions=functions)
print(resp)
print("Done!")
The output is always: call_me_uber_ride_type("Plus", Berkeley, zipcode=94704, duration=10)
So if MLC is not going to work, I was wondering if there is another recommended way to spin up an API server on a dedicated GPU machine that I can self-host, like the one at Berkeley?
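One route I have been considering is vLLM's OpenAI-compatible server (untested sketch; the Hugging Face model id is an assumption, and the functions would need to be folded into the prompt with the get_prompt shared above, since a generic server does not know Gorilla's function format):

# Launch (shell):
#   python -m vllm.entrypoints.openai.api_server \
#       --model gorilla-llm/gorilla-openfunctions-v1 --port 8000
# Then point the existing client code at it:
import openai

openai.api_key = "EMPTY"
openai.api_base = "http://localhost:8000/v1"  # the vLLM server launched above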
I am also looking for this: having it run locally as a drop-in replacement for OpenAI function calling in existing applications. @ShishirPatil