
functionary's People

Contributors

anthonysanchezmk, eltociear, gunpal5, jeffrey-fong, jeffreymeetkai, jxnl, khai-meetkai, musab-mk, musabgultekin, nhoos, rgbkrk, sarahlmk, teknium1, unaidedelf8777, zaydzuhri

functionary's Issues

Support `stream=True`

Hi 👋! I'm really excited to work with the model you've trained.

I've been working on a project called chatlab, which is a wrapper around the OpenAI interface that supports function calling and interactivity within a Jupyter notebook. This is the general interface:

import chatlab
import random

def flip_a_coin():
    '''Returns heads or tails'''
    return random.choice(['heads', 'tails'])

chat = chatlab.Chat()
chat.register(flip_a_coin)

await chat("Please flip a coin for me")
𝑓 Ran `flip_a_coin`

Input:

{}

Output:

"tails"
It landed on tails!

I built it with stream=True by default. I wanted to go ahead and use it with functionary, but immediately ran into the endpoint not accepting the streaming payload. Can this be part of the roadmap? I can adapt to allow non-streaming, but I'd love to be able to use this in streaming mode too.
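
For reference, a minimal sketch of the streaming call I'd like to make, written in the same openai-package style as the examples further down this page; the base URL/port and model name here are just placeholders:

import openai

# Point the openai package at the local functionary server; any non-empty
# api_key works, since no real key is required.
openai.api_base = "http://localhost:8000/v1"
openai.api_key = "functionary"

# Request a streamed completion and print the deltas as they arrive.
response = openai.ChatCompletion.create(
    model="musabgultekin/functionary-7b-v1",
    messages=[{"role": "user", "content": "Please flip a coin for me"}],
    stream=True,
)
for chunk in response:
    delta = chunk["choices"][0]["delta"]
    if delta.get("content"):
        print(delta["content"], end="", flush=True)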

Default Values Not Respected

Hello, the model seems to ignore the default values of function parameters.
Default values are part of the JSON Schema spec.

Example:

Function:

  {
    "type": "function",
    "function": {
      "name": "add_absence",
      "description": "Add a leave or an absence for a given date.",
      "parameters": {
        "type": "object",
        "properties": {
          "absence_time": {
            "description": "The amount of time to report in percent for this absence for the day. Example 0, 25, 50, 75 or 100. ",
            "default": 100, <--------------------------------------------- default 100
            "enum": [
              0,
              25,
              50,
              75,
              100
            ]
          },
          
          ....

Prompt:
User: I was sick last friday
Assistant:

{ 'content': None,
    'function_call': { 'arguments': '{\n'
                                    '  "email": "[email protected]",\n'
                                    '  "date": "2023-12-08",\n'
                                    '  "absence_time": "75",\n' <-------------------------------------- problem
                                    '  "reason": "Sick Leave"\n'
                                    '}',
                       'name': 'add_absence'},
    'role': 'assistant'}

Maybe some examples including default values should be included during training?
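
As a stopgap, here is a minimal client-side sketch that back-fills schema defaults when the model omits a parameter entirely. Note it would not catch the hallucinated "75" above, where the parameter was filled with a wrong value rather than omitted; the function and schema names are just illustrative:

import json

def apply_schema_defaults(arguments_json: str, parameters_schema: dict) -> dict:
    # Parse the model's arguments and fill in any missing parameter that
    # declares a "default" in the JSON schema.
    args = json.loads(arguments_json)
    for name, prop in parameters_schema.get("properties", {}).items():
        if name not in args and "default" in prop:
            args[name] = prop["default"]
    return args

# Example with the add_absence schema above (truncated to the relevant field):
schema = {"type": "object", "properties": {"absence_time": {"default": 100}}}
print(apply_schema_defaults('{"date": "2023-12-08"}', schema))
# -> {'date': '2023-12-08', 'absence_time': 100}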

CUDA Out of memory

I'm trying to run this on an AWS g4dn.xlarge instance (trying with different parameters).

Any tips for me?

python3 server_vllm.py --model "meetkai/functionary-7b-v1.4" --host 0.0.0.0 --dtype float --gpu-memory-utilization .5 --block-size 8
INFO 11-26 12:46:09 llm_engine.py:72] Initializing an LLM engine with config: model='meetkai/functionary-7b-v1.4', tokenizer='meetkai/functionary-7b-v1.4', tokenizer_mode=auto, revision=None, tokenizer_revision=None, trust_remote_code=False, dtype=torch.float32, max_seq_len=32768, download_dir=None, load_format=auto, tensor_parallel_size=1, quantization=None, seed=0)
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
Traceback (most recent call last):
  File "/home/ubuntu/functionary/server_vllm.py", line 362, in <module>
    engine = AsyncLLMEngine.from_engine_args(engine_args)
  File "/home/ubuntu/.local/lib/python3.10/site-packages/vllm/engine/async_llm_engine.py", line 487, in from_engine_args
    engine = cls(engine_args.worker_use_ray,
  File "/home/ubuntu/.local/lib/python3.10/site-packages/vllm/engine/async_llm_engine.py", line 270, in __init__
    self.engine = self._init_engine(*args, **kwargs)
  File "/home/ubuntu/.local/lib/python3.10/site-packages/vllm/engine/async_llm_engine.py", line 306, in _init_engine
    return engine_class(*args, **kwargs)
  File "/home/ubuntu/.local/lib/python3.10/site-packages/vllm/engine/llm_engine.py", line 110, in __init__
    self._init_workers(distributed_init_method)
  File "/home/ubuntu/.local/lib/python3.10/site-packages/vllm/engine/llm_engine.py", line 142, in _init_workers
    self._run_workers(
  File "/home/ubuntu/.local/lib/python3.10/site-packages/vllm/engine/llm_engine.py", line 700, in _run_workers
    output = executor(*args, **kwargs)
  File "/home/ubuntu/.local/lib/python3.10/site-packages/vllm/worker/worker.py", line 70, in init_model
    self.model = get_model(self.model_config)
  File "/home/ubuntu/.local/lib/python3.10/site-packages/vllm/model_executor/model_loader.py", line 93, in get_model
    model = model_class(model_config.hf_config, quant_config)
  File "/home/ubuntu/.local/lib/python3.10/site-packages/vllm/model_executor/models/mistral.py", line 268, in __init__
    self.model = MistralModel(config, quant_config)
  File "/home/ubuntu/.local/lib/python3.10/site-packages/vllm/model_executor/models/mistral.py", line 226, in __init__
    self.layers = nn.ModuleList([
  File "/home/ubuntu/.local/lib/python3.10/site-packages/vllm/model_executor/models/mistral.py", line 227, in <listcomp>
    MistralDecoderLayer(config, quant_config)
  File "/home/ubuntu/.local/lib/python3.10/site-packages/vllm/model_executor/models/mistral.py", line 170, in __init__
    self.mlp = MistralMLP(
  File "/home/ubuntu/.local/lib/python3.10/site-packages/vllm/model_executor/models/mistral.py", line 62, in __init__
    self.gate_up_proj = ParallelLinear.column(hidden_size,
  File "/home/ubuntu/.local/lib/python3.10/site-packages/vllm/model_executor/layers/quantized_linear/__init__.py", line 17, in column
    return ColumnParallelLinear(*args, **kwargs)
  File "/home/ubuntu/.local/lib/python3.10/site-packages/vllm/model_executor/parallel_utils/layers.py", line 138, in __init__
    self.create_weights(params_dtype)
  File "/home/ubuntu/.local/lib/python3.10/site-packages/vllm/model_executor/parallel_utils/layers.py", line 150, in create_weights
    torch.empty(self.output_size_per_partition,
torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 448.00 MiB (GPU 0; 14.58 GiB total capacity; 13.91 GiB already allocated; 404.50 MiB free; 13.91 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation.  See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF

Missing "config.json" for Huggingface hosted model at /meetkai/functionary-7b-v2-GGUF

I'm trying to use functionary's server.py with --model /meetkai/functionary-7b-v2-GGUF, which throws an error. (I can't use vLLM due to limited GPU memory.)

python3 server.py --model /meetkai/functionary-7b-v2-GGUF

throws:

OSError: meetkai/functionary-7b-v2-GGUF does not appear to have a file named config.json. Checkout 'https://huggingface.co/meetkai/functionary-7b-v2-GGUF/main' for available files.

It looks like the config.json needs to be included at HuggingFace for this to work.

What is musabgultekin/functionary-7b-v1

Hey, I think your repo is interesting, but I have some questions.
In your README.md, you run python3 server.py --model "musabgultekin/functionary-7b-v1" to start the server, but what is musabgultekin/functionary-7b-v1 and how can I get it?

Running without Openapi / vLLM

Hello, due to GPU constraints I am trying to run the model using a C++ implementation - https://github.com/ggerganov/llama.cpp

This involves converting and quantizing to 4 bits as well as using the runner from llama.cpp.

What would a properly constructed prompt look like for this kind of "classic" run?

I tried just passing the functions and messages in the prompt like so, but it did not work (inference was not accurate):

messages=[{"role": "user", "content": "What is the weather for Istanbul?"}],

functions=[{
"name": "get_current_weather",
"description": "Get the current weather",
"parameters": {
"type": "object",
"properties": {
"location": {
"type": "string",
"description": "The city and state, e.g. San Francisco, CA"
},
},
"required": ["location"],
},
}]
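
For reference, a minimal sketch of building the raw prompt string with the repo's own prompt-template helpers (the same get_prompt_from_messages / get_stop_tokens_for_generation methods that appear in the GGUF example further down this page); the import path and helper name are assumptions about the repo's layout:

from transformers import AutoTokenizer
# Assumed import path; the prompt_template object is expected to expose the
# same methods used in the GGUF example later on this page.
from functionary.prompt_template import get_prompt_template_from_tokenizer

tokenizer = AutoTokenizer.from_pretrained("meetkai/functionary-7b-v1.4")
prompt_template = get_prompt_template_from_tokenizer(tokenizer)

messages = [{"role": "user", "content": "What is the weather for Istanbul?"}]
functions = [{
    "name": "get_current_weather",
    "description": "Get the current weather",
    "parameters": {
        "type": "object",
        "properties": {
            "location": {
                "type": "string",
                "description": "The city and state, e.g. San Francisco, CA",
            },
        },
        "required": ["location"],
    },
}]

# prompt_str is what the llama.cpp runner should receive verbatim; the stop
# tokens should be passed to it as stop sequences.
prompt_str = prompt_template.get_prompt_from_messages(messages, functions)
stop_tokens = prompt_template.get_stop_tokens_for_generation()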

Context Size

Hello, is a context size bigger than 4096 possible, or is this a hard limit from the original Llama 2 7B model?
Thank you

Release training dataset

Thanks for this project, it's a fantastic piece of work! I was wondering if you would consider releasing the dataset along with the training scripts? I am interested in creating a derivative dataset with some training examples (that don't use functions) added for a particular use-case without losing the function calling ability.

Internal Server Error

Hello!

I am setting up a small SQL LLM (functionary-small-v2.2 served through the vLLM server) with functionary. The LLM has two tools that should be run sequentially: a vector retrieval tool for similar SQL queries and the SQL query function itself. The code worked fine for the retrieval tool alone, but when I try to feed the results back into the model I get the following error:

Future exception was never retrieved
future: <Future finished exception=KeyError('add_predefined_fns')>
Traceback (most recent call last):
  File "/n/holyscratch01/iaifi_lab/agagliano/LLMs/functionary/functionary/vllm_monkey_patch/async_llm_engine.py", line 41, in _raise_exception_on_finish
    task.result()
  File "/n/holyscratch01/iaifi_lab/agagliano/LLMs/functionary/functionary/vllm_monkey_patch/async_llm_engine.py", line 429, in run_engine_loop
    has_requests_in_progress = await self.engine_step()
  File "/n/holyscratch01/iaifi_lab/agagliano/LLMs/functionary/functionary/vllm_monkey_patch/async_llm_engine.py", line 407, in engine_step
    request_outputs = await self.engine.step_async()
  File "/n/holyscratch01/iaifi_lab/agagliano/LLMs/functionary/functionary/vllm_monkey_patch/async_llm_engine.py", line 253, in step_async
    ) = prompt_template.grammar_sample(
  File "/n/holyscratch01/iaifi_lab/agagliano/LLMs/functionary/functionary/prompt_template/base_template.py", line 293, in grammar_sample
    if gen_state["stage"] == "function" and gen_state["add_predefined_fns"] is True:
KeyError: 'add_predefined_fns'
INFO 02-09 13:52:58 llm_engine.py:653] Avg prompt throughput: 0.0 tokens/s, Avg generation throughput: 22.4 tokens/s, Running: 1 reqs, Swapped: 0 reqs, Pending: 0 reqs, GPU KV cache usage: 15.4%, CPU KV cache usage: 0.0%
Exception in callback functools.partial(<function _raise_exception_on_finish at 0x1537ac45d2d0>, request_tracker=<functionary.vllm_monkey_patch.async_llm_engine.RequestTracker object at 0x1537ac420b80>)
handle: <Handle functools.partial(<function _raise_exception_on_finish at 0x1537ac45d2d0>, request_tracker=<functionary.vllm_monkey_patch.async_llm_engine.RequestTracker object at 0x1537ac420b80>)>
Traceback (most recent call last):
  File "/n/holyscratch01/iaifi_lab/agagliano/LLMs/functionary/functionary/vllm_monkey_patch/async_llm_engine.py", line 41, in _raise_exception_on_finish
    task.result()
  File "/n/holyscratch01/iaifi_lab/agagliano/LLMs/functionary/functionary/vllm_monkey_patch/async_llm_engine.py", line 429, in run_engine_loop
    has_requests_in_progress = await self.engine_step()
  File "/n/holyscratch01/iaifi_lab/agagliano/LLMs/functionary/functionary/vllm_monkey_patch/async_llm_engine.py", line 407, in engine_step
    request_outputs = await self.engine.step_async()
  File "/n/holyscratch01/iaifi_lab/agagliano/LLMs/functionary/functionary/vllm_monkey_patch/async_llm_engine.py", line 253, in step_async
    ) = prompt_template.grammar_sample(
  File "/n/holyscratch01/iaifi_lab/agagliano/LLMs/functionary/functionary/prompt_template/base_template.py", line 293, in grammar_sample
    if gen_state["stage"] == "function" and gen_state["add_predefined_fns"] is True:
KeyError: 'add_predefined_fns'

.
.
.

  File "/n/holyscratch01/iaifi_lab/agagliano/LLMs/functionary/functionary/vllm_monkey_patch/async_llm_engine.py", line 45, in _raise_exception_on_finish
    raise AsyncEngineDeadError(
functionary.vllm_monkey_patch.async_llm_engine.AsyncEngineDeadError: Task finished unexpectedly. This should never happen! Please open an issue on Github. See stack trace above for the actual cause.

The relevant code is below:

def execute_tool_call(message):
    #start with one for now
    func = message.tool_calls[0].function
    if func.name == "sql_get_similar_examples":
        data_question = json.loads(func.arguments)["data_question"]
        results = get_similar_examples(data_question)
    else:
        results = f"Error: function {message.tool_calls[0].name} does not exist"
    return results

while len(assistant_msg.tool_calls)>0:
    results = execute_tool_call(assistant_msg)
    messages.append({"role": "assistant", "name": assistant_msg.tool_calls[0].function.name, "content": assistant_msg.tool_calls[0].function.arguments})
    messages.append({"role": "function", "name": assistant_msg.tool_calls[0].function.name, "content": str(results)})

    print(messages[-1])

    assistant_msg = client.chat.completions.create(
        model="meetkai/functionary-small-v2.2",
        messages=messages,
        tools=tools,
        tool_choice="auto",
        #temperature=0.0,
    ).choices[0].message

Any thoughts on how to resolve the issue? Thanks!
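
Independently of the server-side KeyError above (which points at the grammar-sampling state), here is a minimal sketch of an OpenAI-style tools feedback loop for comparison, where the assistant turn keeps its tool_calls and each result goes back as a "tool" message; client, execute_tool_call, messages, and tools are the same names used in the snippet above, and whether this avoids the KeyError is untested here:

assistant_msg = client.chat.completions.create(
    model="meetkai/functionary-small-v2.2",
    messages=messages,
    tools=tools,
    tool_choice="auto",
).choices[0].message

while assistant_msg.tool_calls:
    call = assistant_msg.tool_calls[0]
    result = execute_tool_call(assistant_msg)
    # Echo the assistant turn with its tool_calls, then answer the call with
    # a "tool" message that references the call id.
    messages.append({
        "role": "assistant",
        "content": assistant_msg.content,
        "tool_calls": [call.model_dump()],
    })
    messages.append({
        "role": "tool",
        "tool_call_id": call.id,
        "name": call.function.name,
        "content": str(result),
    })
    assistant_msg = client.chat.completions.create(
        model="meetkai/functionary-small-v2.2",
        messages=messages,
        tools=tools,
        tool_choice="auto",
    ).choices[0].message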

Is there an article/paper describing the thoughts on this approach

Dear @musabgultekin and other authors :)

This repo is super awesome and I am currently benchmarking the functionary model on some medical benchmarks. Is there a paper related to this work, or another place where I can find the thoughts and considerations behind this framework? I am thinking about things like how functions are passed to the model, and why there are two messages with role == system.

Have you drawn inspiration from OpenAI, and if so, do you have any sources on how OpenAI has implemented functions? I haven't found any official sources, but your way of rendering functions as TypeScript definitions seems rather similar to OpenAI's.

Looking forward to your response!

Best regards,
Jens

Cuda 11 error

Hi there!
What can I do about this error? I have a Tesla P40 (Pascal arch) and it needs CUDA 11.

root@oc2:/ai/functionary# python3 server_vllm.py --model "/ai/models/functionary-small-v2.2.q8_0.gguf" --host 0.0.0.0 --port 5000
Traceback (most recent call last):
File "/ai/functionary/server_vllm.py", line 33, in <module>
from vllm.engine.arg_utils import AsyncEngineArgs
File "/usr/local/lib/python3.10/dist-packages/vllm/__init__.py", line 3, in <module>
from vllm.engine.arg_utils import AsyncEngineArgs, EngineArgs
File "/usr/local/lib/python3.10/dist-packages/vllm/engine/arg_utils.py", line 6, in <module>
from vllm.config import (CacheConfig, ModelConfig, ParallelConfig,
File "/usr/local/lib/python3.10/dist-packages/vllm/config.py", line 9, in <module>
from vllm.utils import get_cpu_memory, is_hip
File "/usr/local/lib/python3.10/dist-packages/vllm/utils.py", line 11, in <module>
from vllm._C import cuda_utils
ImportError: libcudart.so.12: cannot open shared object file: No such file or directory

root@oc2:/ai/functionary# nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2022 NVIDIA Corporation
Built on Wed_Sep_21_10:33:58_PDT_2022
Cuda compilation tools, release 11.8, V11.8.89
Build cuda_11.8.r11.8/compiler.31833905_0

thanks in advance

Discussions page

Hello, I would like to know where we can have some discussions related to this project that are not necessarily issues.
Do you have a Slack channel or Discord? Or could you maybe activate the GitHub Discussions page?

I would like to simply share some of my observations on using this model and share knowledge and practices with other people.

Thank you

Using v2 Model Calls Functions with Hallucinated Values

Hello, I have been testing the v1.4 GGUF version in 4-bit quantization for several days and it has worked amazingly well.

I have now migrated to v2, but I very often encounter hallucinated values when calling a function. Here is an example:

With v1.4 (that works well)

user: Show me my project
function_call: []
assistant: It seems that you don't have any projects assigned to you at the moment
user: Let's create a new project
assistant: Sure, I can help with that. Could you please provide me with the code, name, and client for the new project?

With v2:

user: Show me my project
function_call: []
assistant: It seems that you don't have any projects assigned to you at the moment
user: Let's create a new project
function_call: "P001", "Sample Project"
assistant: Your project "Sample Project" (Code: P001) has been successfully created

Why would v2 be so eager to call functions without prompting the user to provide the missing values?

V2 Model Less Precise

Hello,

I think that the Mistral v2 model is performing worse than the 1.4. I am not sure if this is related to the base model itself or due to training.

The context of my app is an assistant that uses the APIs of an activity report software in our company.

Here are some examples:

2.0 Model:

In this conversation I try to make the assistant infer the correct date from its context.
I even added the day of the week to the initial context, but it fails.

[ {'content': 'Client stdout is HTML capable', 'role': 'system'},
  {'content': 'Current date is: Saturday 2023-12-16', 'role': 'system'},
  {'content': 'Locale is en-GB', 'role': 'system'},
  { 'content': "Currently connected user is '[email protected]' Firstname "
               'Aleksandar Lastname KIRILOV',
    'role': 'system'},
  { 'content': 'This program helps user fill in their monthly activity '
               'reports. Each activity report is associated to a project that '
               'user has worked on. Activities can be reported in increments '
               'of 25% per day. A single day cannot have more than 100% of '
               'reported time - this includes activities and absences.',
    'role': 'system'},
  {'content': '    I was sick last friday', 'role': 'user'},
  { 'content': "Sure, let's add that as an absence for Friday, December 16th, "
               '2023. Please provide the percentage of time you were absent '
               'due to illness.',
    'role': 'assistant',
    'tool_calls': []},
  {'content': 'all day and yes illness', 'role': 'user'},
  { 'content': None,
    'role': 'assistant',
    'tool_calls': [ { 'function': { 'arguments': '{\n'
                                                 '  "email": '
                                                 '"[email protected]",\n'
                                                 '  "date": "2023-12-16",\n'
                                                 '  "absence_time": "100",\n'
                                                 '  "reason": "Maladie"\n'
                                                 '}',
                                    'name': 'add_absence'},
                      'id': 'call_CNeIHAyVRYjg22tJrmrK3PV6',
                      'type': 'function'}]},
  {'content': 'No no I mean last friday', 'role': 'user'},
  { 'content': 'Apologies for the confusion. Could you please provide the date '
               'of your absence?',
    'role': 'assistant',
    'tool_calls': []},
  {'content': '8th this month', 'role': 'user'},
  { 'content': None,
    'role': 'assistant',
    'tool_calls': [ { 'function': { 'arguments': '{\n'
                                                 '  "email": '
                                                 '"[email protected]",\n'
                                                 '  "date": "2023-12-08",\n'
                                                 '  "absence_time": "100",\n'
                                                 '  "reason": "Maladie"\n'
                                                 '}',
                                    'name': 'add_absence'},
                      'id': 'call_QsANdNZRWQAX29h5YwCKrrIt',
                      'type': 'function'}]}]

A second conversation in which things are even worse. First it thinks that today (Saturday) is actually Friday.
Then it incorrectly infers the date of "last Friday".

[ {'content': 'Client stdout is HTML capable', 'role': 'system'},
  {'content': 'Current date is: Saturday 2023-12-16', 'role': 'system'},
  {'content': 'Locale is en-GB', 'role': 'system'},
  { 'content': "Currently connected user is '[email protected]' Firstname "
               'Aleksandar Lastname KIRILOV',
    'role': 'system'},
  { 'content': 'This program helps user fill in their monthly activity '
               'reports. Each activity report is associated to a project that '
               'user has worked on. Activities can be reported in increments '
               'of 25% per day. A single day cannot have more than 100% of '
               'reported time - this includes activities and absences.',
    'role': 'system'},
  {'content': '    I was sick last friday', 'role': 'user'},
  { 'content': "Sure, let's add that as an absence for Friday, December 16th, "
               '2023. Please provide the percentage of time you were absent '
               'due to illness.',
    'role': 'assistant',
    'tool_calls': []},
  {'content': 'No I mean the Friday before', 'role': 'user'},
  { 'content': "Ah, got it! Let's add that as an absence for Friday, December "
               '9th, 2023. Please provide the percentage of time you were '
               'absent due to illness.',
    'role': 'assistant',
    'tool_calls': []}]

1.4 Model:

With the old model things are a lot smoother.
It first asks me to confirm the information and I tell it to infer everything.
It first proposes Friday (yesterday), then I correct it to "the Friday before" and it correctly infers a function call.

[ {'content': 'Client stdout is HTML capable', 'role': 'system'},
  {'content': 'Current date is: 2023-12-16', 'role': 'system'},
  {'content': 'Locale is en-GB', 'role': 'system'},
  { 'content': "Currently connected user is '[email protected]' Firstname "
               'Aleksandar Lastname KIRILOV',
    'role': 'system'},
  { 'content': 'This program helps user fill in their monthly activity '
               'reports. Each activity report is associated to a project that '
               'user has worked on. Activities can be reported in increments '
               'of 25% per day. A single day cannot have more than 100% of '
               'reported time - this includes activities and absences.',
    'role': 'system'},
  {'content': '    I was sick last friday', 'role': 'user'},
  { 'content': 'To add a sick leave for last Friday, please provide the '
               'following details:\n'
               '- Your email address\n'
               '- The date of the activity (in YYYY-MM-DD format)\n'
               '- Reason for the absence (Maladie)\n'
               '- The amount of time you were absent (in percentages, e.g., 50 '
               'for half a day or 100 for a full day)',
    'role': 'assistant'},
  {'content': 'You have all the needed information', 'role': 'user'},
  { 'content': None,
    'function_call': { 'arguments': '{\n'
                                    '  "email": "[email protected]",\n'
                                    '  "date": "2023-12-15",\n'
                                    '  "reason": "Maladie",\n'
                                    '  "absence_time": "100"\n'
                                    '}',
                       'name': 'add_absence'},
    'role': 'assistant'},
  {'content': 'No I meant the friday before', 'role': 'user'}]

{'role': 'assistant', 'content': None, 'function_call': {'name': 'add_absence', 'arguments': '{\n  "email": "[email protected]",\n  "date": "2023-12-08",\n  "reason": "Maladie",\n  "absence_time": "100"\n}'}}

Inconsistent prompting schema

The prompt example in the documentation is as follows:

system:
namespace weather {

// Get the current weather
type get_current_weather  = (_: {
// The city and state, e.g. San Francisco, CA
location: string,
// The temperature unit to use. Infer this from the users location.
format: "celsius" | "fahrenheit",
}) => any;

} // namespace weather
system:
A chat between a curious user and an artificial intelligence assistant. The assistant gives helpful, detailed, and polite answers to the user's questions. The assistant calls functions with appropriate input when necessary
user:
</s>What is the weather in Istanbul?</s>
assistant

However, when de-tokenizing the output of prepare_messages_for_inference I get:

 system:
// Supported function definitions that should be called when necessary.
namespace functions {

// Get the current weather
type get_current_weather = (_: {
// The city and state, e.g. San Francisco, CA
location: string,
// The temperature unit to use. Infer this from the users location.
format: "farehnheit" | "celsius",
}) => any;

} // namespace functions
 system:
A chat between a curious user and an artificial intelligence assistant. The assistant gives helpful, detailed, and polite answers to the user's questions. The assistant calls functions with appropriate input when necessary
 user:
</s>What is the weather in Istanbul?
 assistant

Crucially, there are spaces before the role names in the generated prompt; the example from the README doesn't include "Supported function definitions that should be called when necessary."; and the generated user message does not end with </s>. Can you confirm which of these prompting schemes was used in training?

Code used to generate the example:
tokenizer = LlamaTokenizer.from_pretrained("musabgultekin/functionary-7b-v1")
print(
    tokenizer.decode(
        prepare_messages_for_inference(
            tokenizer=tokenizer,
            messages=[
                ChatMessage(role="user", content="What is the weather in Istanbul?"),
            ],
            functions=[
                Function(
                    name="get_current_weather",
                    description="Get the current weather",
                    parameters={
                        "type": "object",
                        "properties": {
                            "location": {
                                "type": "string",
                                "description": "The city and state, e.g. San Francisco, CA",
                            },
                            "format": {
                                "type": "string",
                                "enum": ["farehnheit", "celsius"],
                                "description": "The temperature unit to use. Infer this from the users location.",
                            },
                        },
                        "required": ["location", "format"],
                    },
                )
            ],
        )[0]
    )
)

Response from oobabooga api not properly formatted

Apologies for bringing up what is more an Ooba issue, but I am getting what seems to be incorrectly formatted outputs while querying the model through the text-generation-webui api with text completion.

Input prompt:

<|from|>system
<|recipient|>all
<|content|>// Supported function definitions that should be called when necessary.
namespace functions {

// Get the price of a particular car model
type get_car_price = (_: {
// The name of the car model.
car_name: string,
}) => any;

// get the weather of a location
type get_weather = (_: {
// where to get weather.
location: string,
}) => any;

} // namespace functions
<|from|>system
<|recipient|>all
<|content|>
<|from|>user
<|recipient|>all
<|content|>who is the president of US
<|from|>assistant
<|recipient|>all
<|content|>Biden is the president of US<|stop|>
<|from|>user
<|recipient|>all
<|content|>is the car Song more expensive than car Tang?
<|from|>assistant
<|recipient|>all
<|content|>I will get the price of 2 cars and compare
<|from|>assistant
<|recipient|>get_car_price
<|content|>{"car_name": "Song"}
<|from|>assistant
<|recipient|>get_car_price
<|content|>{"car_name": "Tang"}<|stop|>
<|from|>get_car_price
<|recipient|>all
<|content|>{"price": {"price": "$25000"}}
<|from|>get_car_price
<|recipient|>all
<|content|>{"price": {"price": "$20000"}}
<|from|>assistant
<|recipient|>

Returned Completion:

 all
 The car Tang is more expensive than the car Song. assistant
 all
 The car Tang is more expensive than

I did wonder if it might be the quants, but I've tried it with this model https://huggingface.co/LoneStriker/functionary-small-v2.2-6.0bpw-h6-exl2 as well as this one https://huggingface.co/LoneStriker/functionary-medium-v2.2-3.5bpw-h6-exl2 as well as the 4bpw gguf from https://huggingface.co/meetkai/functionary-small-v2.2-GGUF.

Any idea what could be causing this issue? It would be great to get this working with text-generation-webui, as it's just nice to have everything in one place rather than having to rely on multiple different backends.

APIConnectionError

I am getting APIConnectionError: Error communicating with OpenAI: HTTPConnectionPool(host='localhost', port=5000): Max retries exceeded with url: /v1/chat/completions (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x00000229AB3CECA0>: Failed to establish a new connection: [WinError 10061] No connection could be made because the target machine actively refused it'))
Is there any way to fix it?

Is this project not dead now?

Hey Musab,

just wondering, since I have seen a bunch of PRs and activity in this repo in my email: are you working on this now? And is it one of your company's official projects that they are putting money into, or is it just your co-workers working on it with you for fun? Thanks!

vllm.engine.async_llm_engine.AsyncEngineDeadError: Task finished unexpectedly. This should never happen! Please open an issue on Github. See stack trace above for the actual cause.

has_requests_in_progress = await self.engine_step()
File "/root/miniconda3/envs/agent/lib/python3.10/site-packages/vllm/engine/async_llm_engine.py", line 338, in engine_step
request_outputs = await self.engine.step_async()
File "/root/miniconda3/envs/agent/lib/python3.10/site-packages/vllm/engine/async_llm_engine.py", line 191, in step_async
output = await self._run_workers_async(
File "/root/miniconda3/envs/agent/lib/python3.10/site-packages/vllm/engine/async_llm_engine.py", line 219, in _run_workers_async
all_outputs = await asyncio.gather(*coros)
File "/root/miniconda3/envs/agent/lib/python3.10/asyncio/tasks.py", line 650, in _wrap_awaitable
return (yield from awaitable.__await__())
ray.exceptions.RayTaskError(RuntimeError): ray::RayWorkerVllm.execute_method() (pid=954874, ip=10.19.23.192, actor_id=67ba2e9c25498a976914e72a01000000, repr=<vllm.engine.ray_utils.RayWorkerVllm object at 0x7f25b2710e80>)
File "/root/miniconda3/envs/agent/lib/python3.10/site-packages/vllm/engine/ray_utils.py", line 31, in execute_method
return executor(*args, **kwargs)
File "/root/miniconda3/envs/agent/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
return func(*args, **kwargs)
File "/root/miniconda3/envs/agent/lib/python3.10/site-packages/vllm/worker/worker.py", line 159, in execute_model
output = self.model_runner.execute_model(seq_group_metadata_list,
File "/root/miniconda3/envs/agent/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
return func(*args, **kwargs)
File "/root/miniconda3/envs/agent/lib/python3.10/site-packages/vllm/worker/model_runner.py", line 354, in execute_model
output = self.model.sample(
File "/root/miniconda3/envs/agent/lib/python3.10/site-packages/vllm/model_executor/models/llama.py", line 295, in sample
next_tokens = self.sampler(self.lm_head.weight, hidden_states,
File "/root/miniconda3/envs/agent/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/root/miniconda3/envs/agent/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
return forward_call(*args, **kwargs)
File "/root/miniconda3/envs/agent/lib/python3.10/site-packages/vllm/model_executor/layers/sampler.py", line 92, in forward
prompt_logprobs, sample_logprobs = _get_logprobs(
File "/root/miniconda3/envs/agent/lib/python3.10/site-packages/vllm/model_executor/layers/sampler.py", line 487, in _get_logprobs
top_logprobs, top_token_ids = torch.topk(logprobs,
RuntimeError: selected index k out of range

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
File "/root/miniconda3/envs/agent/lib/python3.10/site-packages/uvicorn/protocols/http/httptools_impl.py", line 426, in run_asgi
result = await app( # type: ignore[func-returns-value]
File "/root/miniconda3/envs/agent/lib/python3.10/site-packages/uvicorn/middleware/proxy_headers.py", line 84, in call
return await self.app(scope, receive, send)
File "/root/miniconda3/envs/agent/lib/python3.10/site-packages/fastapi/applications.py", line 1106, in call
await super().call(scope, receive, send)
File "/root/miniconda3/envs/agent/lib/python3.10/site-packages/starlette/applications.py", line 122, in call
await self.middleware_stack(scope, receive, send)
File "/root/miniconda3/envs/agent/lib/python3.10/site-packages/starlette/middleware/errors.py", line 184, in call
raise exc
File "/root/miniconda3/envs/agent/lib/python3.10/site-packages/starlette/middleware/errors.py", line 162, in call
await self.app(scope, receive, _send)
File "/root/miniconda3/envs/agent/lib/python3.10/site-packages/starlette/middleware/cors.py", line 83, in call
await self.app(scope, receive, send)
File "/root/miniconda3/envs/agent/lib/python3.10/site-packages/starlette/middleware/exceptions.py", line 79, in call
raise exc
File "/root/miniconda3/envs/agent/lib/python3.10/site-packages/starlette/middleware/exceptions.py", line 68, in call
await self.app(scope, receive, sender)
File "/root/miniconda3/envs/agent/lib/python3.10/site-packages/fastapi/middleware/asyncexitstack.py", line 20, in call
raise e
File "/root/miniconda3/envs/agent/lib/python3.10/site-packages/fastapi/middleware/asyncexitstack.py", line 17, in call
await self.app(scope, receive, send)
File "/root/miniconda3/envs/agent/lib/python3.10/site-packages/starlette/routing.py", line 718, in call
await route.handle(scope, receive, send)
File "/root/miniconda3/envs/agent/lib/python3.10/site-packages/starlette/routing.py", line 276, in handle
await self.app(scope, receive, send)
File "/root/miniconda3/envs/agent/lib/python3.10/site-packages/starlette/routing.py", line 66, in app
response = await func(request)
File "/root/miniconda3/envs/agent/lib/python3.10/site-packages/fastapi/routing.py", line 274, in app
raw_response = await run_endpoint_function(
File "/root/miniconda3/envs/agent/lib/python3.10/site-packages/fastapi/routing.py", line 191, in run_endpoint_function
return await dependant.call(**values)
File "/workspace/functionary/server_vllm.py", line 398, in create_chat_completion
async for res in result_generator:
File "/root/miniconda3/envs/agent/lib/python3.10/site-packages/vllm/engine/async_llm_engine.py", line 445, in generate
raise e
File "/root/miniconda3/envs/agent/lib/python3.10/site-packages/vllm/engine/async_llm_engine.py", line 439, in generate
async for request_output in stream:
File "/root/miniconda3/envs/agent/lib/python3.10/site-packages/vllm/engine/async_llm_engine.py", line 70, in anext
raise result
File "uvloop/cbhandles.pyx", line 63, in uvloop.loop.Handle._run
File "/root/miniconda3/envs/agent/lib/python3.10/site-packages/vllm/engine/async_llm_engine.py", line 37, in _raise_exception_on_finish
raise exc
File "/root/miniconda3/envs/agent/lib/python3.10/site-packages/vllm/engine/async_llm_engine.py", line 32, in _raise_exception_on_finish
raise AsyncEngineDeadError(
vllm.engine.async_llm_engine.AsyncEngineDeadError: Task finished unexpectedly. This should never happen! Please open an issue on Github. See stack trace above for the actual cause.

Unable to run Medium model

I've tried to use two A6000 with 48 GB of VRAM each.
The command that i was using was
python3 server_vllm.py --model "meetkai/functionary-medium-v2.2" --host 0.0.0.0 --tensor-parallel-size 2

But after downloading everything and loading the first part of the model, it gives the following error.

torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 256.00 MiB. GPU 1 has a total capacty of 47.54 GiB of which 215.88 MiB is free. Process 3034125 has 47.32 GiB memory in use. Of the allocated memory 46.89 GiB is allocated by PyTorch, and 9.18 MiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting max_split_size_mb to avoid fragmentation.  See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF

I'm not very familiar with vLLM, but it seems that despite the --tensor-parallel-size 2 flag it still loads the model only onto the first GPU. I've also tried to reduce the maximum used memory to 95% with --gpu-memory-utilization 0.95, but it didn't work.

Did you manage to run this model on a machine with these kinds of specs, or do I need a single board with at least 90 GB of VRAM to run it?

Work with CUDA 11 GPUs, such as the P40 (6.1 arch)?

Hi. Is there any way to run the API on a Pascal GPU (P40) as well?
It has TORCH_CUDA_ARCH_LIST=6.1 and is supported by torch. It can use float32.
Unfortunately I don't see a way to run functionary 2; the other APIs haven't integrated the functions yet.

I successfully installed functionary, then manually reinstalled vllm 0.2.7 with CUDA 11.8 and the other requirements.

[Screenshot from 2024-02-04 22-00-06]


Downloading https://download.pytorch.org/whl/cu118/xformers-0.0.23.post1%2Bcu118-cp310-cp310-manylinux2014_x86_64.whl (212.9 MB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 212.9/212.9 MB 5.8 MB/s eta 0:00:00
Requirement already satisfied: numpy in /opt/conda/envs/func/lib/python3.10/site-packages (from xformers==0.0.23.post1) (1.26.3)
Requirement already satisfied: torch==2.1.2 in /opt/conda/envs/func/lib/python3.10/site-packages (from xformers==0.0.23.post1) (2.1.2+cu118)
Requirement already satisfied: filelock in /opt/conda/envs/func/lib/python3.10/site-packages (from torch==2.1.2->xformers==0.0.23.post1) (3.13.1)
Requirement already satisfied: typing-extensions in /opt/conda/envs/func/lib/python3.10/site-packages (from torch==2.1.2->xformers==0.0.23.post1) (4.9.0)
Requirement already satisfied: sympy in /opt/conda/envs/func/lib/python3.10/site-packages (from torch==2.1.2->xformers==0.0.23.post1) (1.12)
Requirement already satisfied: networkx in /opt/conda/envs/func/lib/python3.10/site-packages (from torch==2.1.2->xformers==0.0.23.post1) (3.2.1)
Requirement already satisfied: jinja2 in /opt/conda/envs/func/lib/python3.10/site-packages (from torch==2.1.2->xformers==0.0.23.post1) (3.1.3)
Requirement already satisfied: fsspec in /opt/conda/envs/func/lib/python3.10/site-packages (from torch==2.1.2->xformers==0.0.23.post1) (2023.12.2)
Requirement already satisfied: triton==2.1.0 in /opt/conda/envs/func/lib/python3.10/site-packages (from torch==2.1.2->xformers==0.0.23.post1) (2.1.0)
Requirement already satisfied: MarkupSafe>=2.0 in /opt/conda/envs/func/lib/python3.10/site-packages (from jinja2->torch==2.1.2->xformers==0.0.23.post1) (2.1.5)
Requirement already satisfied: mpmath>=0.19 in /opt/conda/envs/func/lib/python3.10/site-packages (from sympy->torch==2.1.2->xformers==0.0.23.post1) (1.3.0)
Installing collected packages: xformers
Attempting uninstall: xformers
Found existing installation: xformers 0.0.23.post1
Uninstalling xformers-0.0.23.post1:
Successfully uninstalled xformers-0.0.23.post1
Successfully installed xformers-0.0.23.post1+cu118


INFO 02-04 18:59:20 server_vllm.py:542] args: Namespace(host='0.0.0.0', port=5000, allow_credentials=False, allowed_origins=['*'], allowed_methods=['*'], allowed_headers=['*'], served_model_name=None, grammar_sampling=True, model='meetkai/functionary-small-v2.2', tokenizer=None, revision=None, tokenizer_revision=None, tokenizer_mode='auto', trust_remote_code=False, download_dir='/ai/models', load_format='auto', dtype='float32', max_model_len=None, worker_use_ray=False, pipeline_parallel_size=1, tensor_parallel_size=2, max_parallel_loading_workers=None, block_size=16, seed=0, swap_space=4, gpu_memory_utilization=0.9, max_num_batched_tokens=None, max_num_seqs=256, max_paddings=256, disable_log_stats=False, quantization=None, enforce_eager=False, max_context_len_to_capture=8192, engine_use_ray=False, disable_log_requests=False, max_log_len=None)
2024-02-04 18:59:24,783 INFO worker.py:1724 -- Started a local Ray instance.
INFO 02-04 18:59:25 llm_engine.py:70] Initializing an LLM engine with config: model='meetkai/functionary-small-v2.2', tokenizer='meetkai/functionary-small-v2.2', tokenizer_mode=auto, revision=None, tokenizer_revision=None, trust_remote_code=False, dtype=torch.float32, max_seq_len=32768, download_dir='/ai/models', load_format=auto, tensor_parallel_size=2, quantization=None, enforce_eager=False, seed=0)
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
Traceback (most recent call last):
File "/ai/functionary/server_vllm.py", line 550, in
engine = AsyncLLMEngine.from_engine_args(engine_args)
File "/ai/functionary/functionary/vllm_monkey_patch/async_llm_engine.py", line 627, in from_engine_args
engine = cls(
File "/ai/functionary/functionary/vllm_monkey_patch/async_llm_engine.py", line 344, in init
self.engine = self._init_engine(*args, **kwargs)
File "/ai/functionary/functionary/vllm_monkey_patch/async_llm_engine.py", line 387, in _init_engine
return engine_class(*args, **kwargs)
File "/opt/conda/envs/func/lib/python3.10/site-packages/vllm/engine/llm_engine.py", line 114, in init
self._init_cache()
File "/opt/conda/envs/func/lib/python3.10/site-packages/vllm/engine/llm_engine.py", line 262, in _init_cache
num_blocks = self._run_workers(
File "/opt/conda/envs/func/lib/python3.10/site-packages/vllm/engine/llm_engine.py", line 795, in _run_workers
driver_worker_output = getattr(self.driver_worker,
File "/opt/conda/envs/func/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
return func(*args, **kwargs)
File "/opt/conda/envs/func/lib/python3.10/site-packages/vllm/worker/worker.py", line 96, in profile_num_available_blocks
self.model_runner.profile_run()
File "/opt/conda/envs/func/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
return func(*args, **kwargs)
File "/opt/conda/envs/func/lib/python3.10/site-packages/vllm/worker/model_runner.py", line 494, in profile_run
self.execute_model(seqs, kv_caches)
File "/opt/conda/envs/func/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
return func(*args, **kwargs)
File "/opt/conda/envs/func/lib/python3.10/site-packages/vllm/worker/model_runner.py", line 453, in execute_model
hidden_states = model_executable(
File "/opt/conda/envs/func/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/opt/conda/envs/func/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
return forward_call(*args, **kwargs)
File "/opt/conda/envs/func/lib/python3.10/site-packages/vllm/model_executor/models/mistral.py", line 282, in forward
hidden_states = self.model(input_ids, positions, kv_caches,
File "/opt/conda/envs/func/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/opt/conda/envs/func/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
return forward_call(*args, **kwargs)
File "/opt/conda/envs/func/lib/python3.10/site-packages/vllm/model_executor/models/mistral.py", line 250, in forward
hidden_states, residual = layer(
File "/opt/conda/envs/func/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/opt/conda/envs/func/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
return forward_call(*args, **kwargs)
File "/opt/conda/envs/func/lib/python3.10/site-packages/vllm/model_executor/models/mistral.py", line 203, in forward
hidden_states = self.self_attn(
File "/opt/conda/envs/func/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/opt/conda/envs/func/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
return forward_call(*args, **kwargs)
File "/opt/conda/envs/func/lib/python3.10/site-packages/vllm/model_executor/models/mistral.py", line 153, in forward
attn_output = self.attn(q, k, v, k_cache, v_cache, input_metadata)
File "/opt/conda/envs/func/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/opt/conda/envs/func/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
return forward_call(*args, **kwargs)
File "/opt/conda/envs/func/lib/python3.10/site-packages/vllm/model_executor/layers/attention.py", line 146, in forward
out = xops.memory_efficient_attention_forward(
File "/opt/conda/envs/func/lib/python3.10/site-packages/xformers/ops/fmha/init.py", line 244, in memory_efficient_attention_forward
return _memory_efficient_attention_forward(
File "/opt/conda/envs/func/lib/python3.10/site-packages/xformers/ops/fmha/init.py", line 341, in memory_efficient_attention_forward
out, *
= op.apply(inp, needs_gradient=False)
File "/opt/conda/envs/func/lib/python3.10/site-packages/xformers/ops/fmha/cutlass.py", line 243, in apply
cls.apply_bmhk(
File "/opt/conda/envs/func/lib/python3.10/site-packages/xformers/ops/fmha/cutlass.py", line 266, in apply_bmhk
out, lse, rng_seed, rng_offset = cls.OPERATOR(
File "/opt/conda/envs/func/lib/python3.10/site-packages/torch/_ops.py", line 692, in call
return self._op(*args, **kwargs or {})
RuntimeError: CUDA error: no kernel image is available for execution on the device
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
Compile with TORCH_USE_CUDA_DSA to enable device-side assertions.

/ai/functionary# python3 server_vllm.py --model "meetkai/functionary-small-v2.2" --tensor-parallel-size 2 --host 0.0.0.0 --port 5000 --dtype auto --download-dir /ai/models

ValueError: Bfloat16 is only supported on GPUs with compute capability of at least 8.0. Your Tesla P40 GPU has compute capability 6.1.

openai.api_key required?

Hi, great work! I am a bit confused by the readme: is openai.api_key needed to run your model? Thanks!

Non Contaminated Packing into trl

Hi, I'm trying to merge your custom non-contaminated packing implementation into trl, and I'm facing some issues in getting it to work. Is it possible to use the packedDataset class alone without having to merge all your code into trl?

Please look at my PR and sample Colab notebook (I know the implementation isn't right without flash attention, but I can't get a running training loop).

The error I get is:

[/usr/local/lib/python3.10/dist-packages/transformers/trainer.py](https://localhost:8080/#) in compute_loss(self, model, inputs, return_outputs)
   2807         else:
   2808             if isinstance(outputs, dict) and "loss" not in outputs:
-> 2809                 raise ValueError(
   2810                     "The model did not return a loss from the inputs, only the following keys: "
   2811                     f"{','.join(outputs.keys())}. For reference, the inputs it received are {','.join(inputs.keys())}."

ValueError: The model did not return a loss from the inputs, only the following keys: logits. For reference, the inputs it received are input_ids,attention_mask.
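
For what it's worth, that particular ValueError usually just means the collated batch has no "labels" key, so the causal-LM forward pass returns logits only. A minimal sketch of the usual fix, separate from the packing logic itself (the function name is just illustrative):

def add_causal_lm_labels(batch):
    # batch["input_ids"] and batch["attention_mask"] are torch tensors.
    # Copy input_ids to labels and mask padding with -100 (the ignore index
    # of CrossEntropyLoss); the model shifts labels internally.
    labels = batch["input_ids"].clone()
    labels[batch["attention_mask"] == 0] = -100
    batch["labels"] = labels
    return batch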

Models compatibility

I am using https://github.com/abetlen/llama-cpp-python, and after changing the model from 1.0 to 1.4 I get some high-level errors like:
Something went wrong: function ' execute_services' does not exist or
Something went wrong: function ' : execute_services' does not exist (when using YAML as configuration in the prompt).

This is the related log:

llama-cpp-python    | root ::= object 
llama-cpp-python    | object ::= [{] ws object_11 [}] ws 
llama-cpp-python    | value ::= object | array | string | number | value_6 ws 
llama-cpp-python    | array ::= [[] ws array_15 []] ws 
llama-cpp-python    | string ::= ["] string_18 ["] ws 
llama-cpp-python    | number ::= number_19 number_25 number_29 ws 
llama-cpp-python    | value_6 ::= [t] [r] [u] [e] | [f] [a] [l] [s] [e] | [n] [u] [l] [l] 
llama-cpp-python    | ws ::= ws_31 
llama-cpp-python    | object_8 ::= string [:] ws value object_10 
llama-cpp-python    | object_9 ::= [,] ws string [:] ws value 
llama-cpp-python    | object_10 ::= object_9 object_10 | 
llama-cpp-python    | object_11 ::= object_8 | 
llama-cpp-python    | array_12 ::= value array_14 
llama-cpp-python    | array_13 ::= [,] ws value 
llama-cpp-python    | array_14 ::= array_13 array_14 | 
llama-cpp-python    | array_15 ::= array_12 | 
llama-cpp-python    | string_16 ::= [^"\] | [\] string_17 
llama-cpp-python    | string_17 ::= ["\/bfnrt] | [u] [0-9a-fA-F] [0-9a-fA-F] [0-9a-fA-F] [0-9a-fA-F] 
llama-cpp-python    | string_18 ::= string_16 string_18 | 
llama-cpp-python    | number_19 ::= number_20 number_21 
llama-cpp-python    | number_20 ::= [-] | 
llama-cpp-python    | number_21 ::= [0-9] | [1-9] number_22 
llama-cpp-python    | number_22 ::= [0-9] number_22 | 
llama-cpp-python    | number_23 ::= [.] number_24 
llama-cpp-python    | number_24 ::= [0-9] number_24 | [0-9] 
llama-cpp-python    | number_25 ::= number_23 | 
llama-cpp-python    | number_26 ::= [eE] number_27 number_28 
llama-cpp-python    | number_27 ::= [-+] | 
llama-cpp-python    | number_28 ::= [0-9] number_28 | [0-9] 
llama-cpp-python    | number_29 ::= number_26 | 
llama-cpp-python    | ws_30 ::= [ <U+0009><U+000A>] ws 
llama-cpp-python    | ws_31 ::= ws_30 | 
llama-cpp-python    | system:
llama-cpp-python    | // Supported function definitions that should be called when necessary.
llama-cpp-python    | namespace functions {
llama-cpp-python    | 
llama-cpp-python    |   // Use this function to execute service of devices in Home Assistant.
llama-cpp-python    |   type execute_services = (_: {
llama-cpp-python    |     // 
llama-cpp-python    |     list?: Array<{
llama-cpp-python    |         domain: string,
llama-cpp-python    |         service: string,
llama-cpp-python    |         service_data: {
llama-cpp-python    |           entity_id: string,
llama-cpp-python    |         },
llama-cpp-python    |       }>,
llama-cpp-python    |   }) => any;
llama-cpp-python    | 
llama-cpp-python    | } // namespace functions
llama-cpp-python    | 
llama-cpp-python    | system:
llama-cpp-python    | A chat between a curious user and an artificial intelligence assistant. The assistant gives helpful, detailed, and polite answers to the user's questions. The assistant calls functions with appropriate input when necessary
[...]
llama-cpp-python    | user:
llama-cpp-python    | </s>turn on kanapa light</s>
llama-cpp-python    | assistant: execute_services:
llama-cpp-python    | 
llama-cpp-python    | {
llama-cpp-python    |   "list": [
llama-cpp-python    |     {
llama-cpp-python    |       "domain": "light",
llama-cpp-python    |       "service": "turn_on",
llama-cpp-python    |       "service_data": {
llama-cpp-python    |         "entity_id": "light.kanapa"
llama-cpp-python    |       }
llama-cpp-python    |     }
llama-cpp-python    |   ]
llama-cpp-python    | }
llama-cpp-python    | 
llama-cpp-python    | 

The server is overloaded or not ready yet.

Hi! I tried to deploy it on the server, but calling the code in Python after deployment doesn't return successfully. What could be the reason for this?

env

new env
install from this repo

code

import openai

openai.api_base = "http://localhost:6123/v1"
openai.api_key = "functionary" # We just need to set this something other than None, so it works with openai package. No API key is required.

response = openai.ChatCompletion.create(
model="musabgultekin/functionary-7b-v1",
messages=[{"role": "user", "content": "What is the weather for Istanbul?"}],
functions=[{
"name": "get_current_weather",
"description": "Get the current weather",
"parameters": {
"type": "object",
"properties": {
"location": {
"type": "string",
"description": "The city and state, e.g. San Francisco, CA"
},
},
"required": ["location"],
},
}]
)
print(response)

Server

$ python3 server.py --model "musabgultekin/functionary-7b-v1"
/anaconda3/envs/llD/lib/python3.10/site-packages/pydantic/_internal/_config.py:269: UserWarning: Valid config keys have changed in V2:

  • 'underscore_attrs_are_private' has been removed
    warnings.warn(message, UserWarning)
    Loading checkpoint shards: 100%|██████████████████████████████████████████████████████████| 2/2 [00:11<00:00, 5.92s/it]
    INFO: Started server process [119451]
    INFO: Waiting for application startup.
    INFO: Application startup complete.
    INFO: Uvicorn running on http://0.0.0.0:6123 (Press CTRL+C to quit)
    INFO: 127.0.0.1:53557 - "GET / HTTP/1.1" 404 Not Found
    INFO: 127.0.0.1:53557 - "GET /favicon.ico HTTP/1.1" 404 Not Found
    INFO: 127.0.0.1:53559 - "GET / HTTP/1.1" 404 Not Found
    INFO: 127.0.0.1:53559 - "GET /v1 HTTP/1.1" 404 Not Found

Thank you!

Conflict between positional encoding and data packing

@musabgultekin Thank you for your excellent work. I have a question: can you estimate the impact of data packing on model performance compared to not using data packing? I think such data packing conflicts with the model's architecture, specifically the rotary embedding.
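
For context, one common way packing is reconciled with rotary embeddings (not necessarily what this repo does) is to restart position_ids for every packed sequence and block attention across sequence boundaries, so each sequence sees the same positions it would see unpacked. A tiny sketch of the position_ids part:

import torch

def packed_position_ids(seq_lengths):
    # e.g. [3, 2] -> tensor([0, 1, 2, 0, 1]); each packed sequence starts
    # again at position 0, so rotary embeddings see unpacked positions.
    return torch.cat([torch.arange(n) for n in seq_lengths])

print(packed_position_ids([3, 2]))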

Possibly a trainer with all your needed features already made.

@musabgultekin I was going through my email and saw what y'all have been up to. I figured I would let you know about axolotl, since you seem to be basically building the same tooling from scratch. If you choose to use it, all you will have to do is include your custom prompt-formatting strategy for the function calls; it handles everything else, including only calculating loss on assistant responses and using target tensors. It also has bin packing implemented, which in my experiments speeds up training by at least 3x.

Just figured I would let you know; maybe it will save you some work.

https://github.com/OpenAccess-AI-Collective/axolotl

Functionary Has A new Home

Hello everyone,

Due to recent developments, the Functionary project has found a new home. We'll be continuing our work and collaboration at FunctionaryProject/Functionary. We appreciate all the support and contributions here and invite everyone to join us at the new location.

Thank you for your understanding and support!

Best,
The Functionary Project

Hello, I had some questions if you've got a sec

I'm with Alignment Lab AI; we're currently operating a pile of open-sourced projects. I was reaching out because training a model to do this has been one of my personal goals for a while now. You mentioned you wanted to train a 13B variant, and I believe I have the tools on hand to generate a dataset of arbitrary size for your needs. Would you have some spare time to discuss some of this with me in the next few days?

Unable to Inject Function Result

Hello,

I am running the model using the newly provided GGUF tutorial for llama.cpp.

The model properly infers a function call, but when I inject the function result back into the prompt it just repeats the function call.
Am I doing something wrong?

Sending prompt
[ {'content': 'Current date is 2023-11-28.', 'role': 'system'},
  { 'content': 'Currently connected user is [email protected]',
    'role': 'system'},
  { 'content': 'I wonder if it is going to rain today here in Paris.',
    'role': 'user'},
  {'role': 'assistant'}]
stop_token_ids:  [32002, 32004]
{'role': 'assistant', 'content': None, 'function_call': {'name': 'get_current_weather', 'arguments': '{"location": "Paris, France"}'}}

This is my input that simulates a function result:

function name: get_current_weather
function return :{"temperature": "28 C", "rain": "0%"}      

Sending prompt
[ {'content': 'Current date is 2023-11-28.', 'role': 'system'},
  { 'content': 'Currently connected user is [email protected]',
    'role': 'system'},
  { 'content': 'I wonder if it is going to rain today here in Paris.',
    'role': 'user'},
  {'role': 'assistant'},
  { 'content': None,
    'function_call': { 'arguments': '{"location": "Paris, France"}',
                       'name': 'get_current_weather'},
    'role': 'assistant'},
  { 'content': '{"temperature": "28 C", "rain": "0%"}',
    'name': 'get_current_weather',
    'role': 'function'}]
stop_token_ids:  [32002, 32004]
Llama.generate: prefix-match hit
{'role': 'assistant', 'content': 'get_current_weather:\n{"location": "Paris, France"}'}
user: 

My code looks like so:

... omitted code from the readme that loads the model and prepares initial prompt etc ...

# Create the prompt to use for inference
while(True):
    print("Sending prompt")
    pp.pprint(messages)

    prompt_str = prompt_template.get_prompt_from_messages(messages, functions)
    token_ids = tokenizer.encode(prompt_str)

    gen_tokens = []
    # Get list of stop_tokens 
    stop_token_ids = [tokenizer.encode(token)[-1] for token in prompt_template.get_stop_tokens_for_generation()]
    print("stop_token_ids: ", stop_token_ids)

    # We use function generate (instead of __call__) so we can pass in list of token_ids
    for token_id in llm.generate(token_ids, temp=0):
        if token_id in stop_token_ids:
            break
        gen_tokens.append(token_id)

    llm_output = tokenizer.decode(gen_tokens)

    # parse the message from llm_output
    result = prompt_template.parse_assistant_response(llm_output)

    print(result)
    messages.append(result)

    if "function_call" in result:
        f_name = input("function name: ")
        f_result = input("function return :")
        messages.append({"role": "function", "name": f_name, "content": f_result})
    else:
        user_input = input("user: ")
        messages.append({"role": "user", "content": user_input})

It doesn't call the function

I'm running it with this command

python3 server.py --model "meetkai/functionary-7b-v1.1"

then I'm running your chatlab example, and I get this output:

USER: what is the price of the car named Tang?
ASSISTANT:  get_car_price:
{"car_name": "Tang"}

That is, it gets back JSON with content instead of a function_call.

Am I doing anything wrong?
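
If the server hands back the raw completion as plain content like above, one client-side workaround is to parse the name:\n{json} shape yourself until the server-side parsing is sorted out. This is only a small, hypothetical sketch, not a fix for the server:

import json

def parse_raw_function_call(content):
    """Turn raw output such as 'get_car_price:' + newline + JSON into an OpenAI-style function_call dict."""
    name, _, raw_args = content.partition(":\n")
    try:
        json.loads(raw_args)   # make sure the arguments are valid JSON
    except json.JSONDecodeError:
        return None            # not a function call, treat it as normal content
    return {"name": name.strip(), "arguments": raw_args.strip()}

print(parse_raw_function_call('get_car_price:\n{"car_name": "Tang"}'))
# {'name': 'get_car_price', 'arguments': '{"car_name": "Tang"}'}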

Datasets

Hey all! 👋

I've been using functionary with an on-demand notebook model that allows for creating and running Jupyter notebook cells. On the small version of the model I've noticed some strange behaviors that I've had to correct. Maybe this is better with the medium model, but I'm curious if you'd be open to some datasets for computational work. What format would you like for future revisions of the model?

Support function_call parameter

Hello 👋,
Thank you for this awesome repo!

It looks like the function_call parameter is not implemented (or the model hasn't been fine-tuned with it).
Example:

import openai

openai.api_key = "" # We just need to set this empty so it works with openai package. No API key is required.
openai.api_base = "http://localhost:8000/v1"

openai.ChatCompletion.create(
    model="musabgultekin/functionary-7b-v1",
    messages=[{"role": "user", "content": "What is the weather for Istanbul?"}],
    function_call="none",
    functions=[{
        "name": "get_current_weather",
        "description": "Get the current weather",
        "parameters": {
            "type": "object",
            "properties": {
                "location": {
                    "type": "string",
                    "description": "The city and state, e.g. San Francisco, CA"
                },
            },
            "required": ["location"],
        },
    }]
)

Gives the following result:

{
  "id": "b8d5d347-27ac-43bd-b1ea-49aa99eb4bb6",
  "object": "chat.completion",
  "created": 1691280426.4675407,
  "choices": [
    {
      "message": {
        "role": "assistant",
        "content": null,
        "name": null,
        "function_call": {
          "name": "get_current_weather",
          "arguments": "{\n  \"location\": \"Istanbul\"\n}"
        }
      },
      "finish_reason": "stop",
      "index": 0
    }
  ]
}

There is a similar issue with function_call={ "name": "a_function_name" } in a different use case (forcing a specific function call); a sketch of that request follows below.

Is this an issue you are considering tackling?
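
For reference, the "forcing" variant mentioned above would look like the request below on the client side; whether the server or the fine-tune actually honors it is exactly what this issue is asking. The sketch simply mirrors the OpenAI-style request shape already used in this thread.

import openai

openai.api_key = ""  # no real key needed for the local server
openai.api_base = "http://localhost:8000/v1"

weather_function = {
    "name": "get_current_weather",
    "description": "Get the current weather",
    "parameters": {
        "type": "object",
        "properties": {
            "location": {
                "type": "string",
                "description": "The city and state, e.g. San Francisco, CA",
            },
        },
        "required": ["location"],
    },
}

# function_call={"name": ...} asks the model to call that specific function,
# while function_call="none" asks it not to call any function at all.
response = openai.ChatCompletion.create(
    model="musabgultekin/functionary-7b-v1",
    messages=[{"role": "user", "content": "What is the weather for Istanbul?"}],
    function_call={"name": "get_current_weather"},
    functions=[weather_function],
)
print(response)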

template files for LocalAI?

Hello!

I'm trying to run the GGUF version of the medium model, functionary-medium-v2.2-GGUF, using LocalAI. Unfortunately, I do not know how to write the template files needed to use this model with LocalAI. LocalAI typically requires two (or more) files, typically chat-block.tmpl and chat.tmpl, that work together with the model.yaml and the GGUF file, for example here.

It does not seem to work with the typical model settings, like:

the "chat-block.tmpl" (this is the template that chat-trained models use, adapted for LocalAI):

<|im_start|>{{if eq .RoleName "assistant"}}assistant{{else if eq .RoleName "system"}}system{{else if eq .RoleName "user"}}user{{end}}
{{if .Content}}{{.Content}}{{end}}
<|im_end|>

and the "chat.tmpl":

{{.Input}}
<|im_start|>assistant

And I am also not sure of the "stop words" to be used, like:

stopwords:
- "user|"
- "assistant|"
- "system|"
- "<|im_end|>"
- "<|im_start|>"
template:
  chat: chat
  chat_message: chat-block

Any help getting this running on LocalAI would be great!

Some weights of LlamaForCausalLM were not initialized from the model checkpoint at musabgultekin/functionary-7b-v1 and are newly initialized: ['model.layers.17.self_attn.rotary_emb.inv_freq', 'model.layers.30.self_attn.rotary_emb.inv_freq',

Hello, I'm trying to run with this command:
python3 server.py --model "musabgultekin/functionary-7b-v1"

and I get this:

Some weights of LlamaForCausalLM were not initialized from the model checkpoint at musabgultekin/functionary-7b-v1 and are newly initialized: ['model.layers.17.self_attn.rotary_emb.inv_freq', 'model.layers.30.self_attn.rotary_emb.inv_freq', 'model.layers.2.self_attn.rotary_emb.inv_freq', 'model.layers.31.self_attn.rotary_emb.inv_freq', 'model.layers.19.self_attn.rotary_emb.inv_freq', 'model.layers.22.self_attn.rotary_emb.inv_freq', 'model.layers.6.self_attn.rotary_emb.inv_freq', 'model.layers.24.self_attn.rotary_emb.inv_freq', 'model.layers.4.self_attn.rotary_emb.inv_freq', 'model.layers.20.self_attn.rotary_emb.inv_freq', 'model.layers.25.self_attn.rotary_emb.inv_freq', 'model.layers.3.self_attn.rotary_emb.inv_freq', 'model.layers.13.self_attn.rotary_emb.inv_freq', 'model.layers.23.self_attn.rotary_emb.inv_freq', 'model.layers.10.self_attn.rotary_emb.inv_freq', 'model.layers.16.self_attn.rotary_emb.inv_freq', 'model.layers.28.self_attn.rotary_emb.inv_freq', 'model.layers.1.self_attn.rotary_emb.inv_freq', 'model.layers.21.self_attn.rotary_emb.inv_freq', 'model.layers.5.self_attn.rotary_emb.inv_freq', 'model.layers.11.self_attn.rotary_emb.inv_freq', 'model.layers.0.self_attn.rotary_emb.inv_freq', 'model.layers.18.self_attn.rotary_emb.inv_freq', 'model.layers.12.self_attn.rotary_emb.inv_freq', 'model.layers.9.self_attn.rotary_emb.inv_freq', 'model.layers.27.self_attn.rotary_emb.inv_freq', 'model.layers.29.self_attn.rotary_emb.inv_freq', 'model.layers.14.self_attn.rotary_emb.inv_freq', 'model.layers.7.self_attn.rotary_emb.inv_freq', 'model.layers.26.self_attn.rotary_emb.inv_freq', 'model.layers.15.self_attn.rotary_emb.inv_freq', 'model.layers.8.self_attn.rotary_emb.inv_freq']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.

Thanks for any help!

Modal Server Deploy Error

Trying to deploy to modal and running into an error:

functionary on main [!] via 🐍 3.11.1 took 6s
➜ modal run modal_server.py

Traceback (most recent call last):

  /Users/kai/.pyenv/versions/3.11.1/bin/modal:8 in <module>
       7     sys.argv[0] = re.sub(r'(-script\.pyw|\.exe)?$', '', sys.argv[0])
    ❱  8     sys.exit(main())
       9

  /Users/kai/.pyenv/versions/3.11.1/lib/python3.11/site-packages/modal/__main__.py:9 in main
       8     setup_rich_traceback()
    ❱  9     entrypoint_cli()
      10

  /Users/kai/.pyenv/versions/3.11.1/lib/python3.11/site-packages/click/core.py:1157 in __call__
    1156         """Alias for :meth:`main`."""
  ❱ 1157         return self.main(*args, **kwargs)
    1158

  /Users/kai/.pyenv/versions/3.11.1/lib/python3.11/site-packages/typer/core.py:778 in main
     777     ) -> Any:
  ❱  778         return _main(
     779             self,

  /Users/kai/.pyenv/versions/3.11.1/lib/python3.11/site-packages/typer/core.py:216 in _main
     215                 with self.make_context(prog_name, args, **extra) as ctx:
  ❱  216                     rv = self.invoke(ctx)
     217                     if not standalone_mode:

  /Users/kai/.pyenv/versions/3.11.1/lib/python3.11/site-packages/click/core.py:1688 in invoke
    1687                     with sub_ctx:
  ❱ 1688                         return _process_result(sub_ctx.command.invoke(sub_ctx))
    1689

  /Users/kai/.pyenv/versions/3.11.1/lib/python3.11/site-packages/click/core.py:1682 in invoke
    1681                 with ctx:
  ❱ 1682                     cmd_name, cmd, args = self.resolve_command(ctx, args)
    1683                     assert cmd is not None

  /Users/kai/.pyenv/versions/3.11.1/lib/python3.11/site-packages/click/core.py:1729 in resolve_command
    1728         # Get the command
  ❱ 1729         cmd = self.get_command(ctx, cmd_name)
    1730

  /Users/kai/.pyenv/versions/3.11.1/lib/python3.11/site-packages/modal/cli/run.py:170 in get_command
     169                 tag = function_or_entrypoint.info.get_tag()
  ❱  170                 click_command = _get_click_command_for_function(stub, tag)
     171

  /Users/kai/.pyenv/versions/3.11.1/lib/python3.11/site-packages/modal/cli/run.py:130 in _get_click_command_for_function
     129
  ❱  130     with_click_options = _add_click_options(f, signature)
     131     return click.command(with_click_options)

  /Users/kai/.pyenv/versions/3.11.1/lib/python3.11/site-packages/modal/cli/run.py:74 in _add_click_options
      73             if parser is None:
  ❱   74                 raise NoParserAvailable(repr(param_type))
      75             kwargs: Any = {

NoParserAvailable: typing.List[functionary.openai_types.ChatMessage]

Side note: I'm on a Mac, so I can't use vLLM. I'd love to be able to help with the chatlab-related documentation to help people prototype and build on top of this.

Challenges with llama.cpp

When attempting to use functionary with the llama.cpp server, the server ignores the function.

./server -m /models/functionary-7b-v2.q4_0.gguf --n-gpu-layers 35 --host 0.0.0.0 --port 11434

Using the sample from the Readme:

from openai import OpenAI

client = OpenAI(base_url="http://localhost:11434/v1", api_key="functionary")

s = "Please use a function to find the weather for Kansas City?"

response = client.chat.completions.create(
    model="meetkai/functionary-7b-v2",
    messages=[{"role": "user",
            "content": s}
    ],
    tools=[{
            "type": "function",
            "function": {
                "name": "get_current_weather",
                "description": "Get the current weather",
                "parameters": {
                    "type": "object",
                    "properties": {
                        "location": {
                            "type": "string",
                            "description": "The city and state, e.g. San Francisco, CA"
                        }
                    },
                    "required": ["location"]
                }
            }
        }],
    tool_choice="auto"
)

print(response)

However, the response ignores the function and does not respond with a function call.

ChatCompletion(id='chatcmpl-ZturN7aCaiEp9VoYPaPkkLGUX7IDWNEs', choices=[Choice(finish_reason='stop', index=0, message=ChatCompletionMessage(content="To find the weather for Kansas City, you can use a weather API like OpenWeatherMap or Weatherstack. Here's an example using the OpenWeatherMap API in Python:\n\n```python\nimport requests\n\n# Replace 'YOUR_API_KEY' with your actual API key\napi_key = 'YOUR_API_KEY'\ncity = 'Kansas City'\nbase_url = 'http://api.openweathermap.org/data/2.5/weather'\n\nparams = {\n 'q': city,\n 'appid': api_key,\n 'units': 'metric' # You can change this to 'imperial' for Fahrenheit\n}\n\nresponse = requests.get(base_url, params=params)\ndata = response.json()\n\nif response.status_code == 200:\n temperature = data['main']['temp']\n weather_description = data['weather'][0]['description']\n print(f'The temperature in {city} is {temperature}ยฐC. The weather is {weather_description}.')\nelse:\n print(f'Failed to retrieve weather data for {city}. Status code: {response.status_code}')\n```\n\nMake sure to sign up for an API key from OpenWeatherMap and replace 'YOUR_API_KEY' with your actual API key. You can also change the units to 'imperial' if you want the temperature in Fahrenheit.\n\nRemember to handle errors and edge cases appropriately in a production environment. This is just a basic example to get you started.\n\nPlease note that you need to comply with the terms of service of the weather API you choose to use.\n\nKeep in mind that the code above is just a basic example and may require additional error handling and security measures for a production environment.\n\n", role='assistant', function_call=None, tool_calls=None))], created=1702586988, model='meetkai/functionary-7b-v2', object='chat.completion', system_fingerprint=None, usage=CompletionUsage(completion_tokens=427, prompt_tokens=40, total_tokens=467))

I don't think llama.cpp's server supports the full OpenAI API; it is likely ignoring anything outside of the message contents.

I also tried using llama-cpp-python.

python3 -m llama_cpp.server --model /models/functionary-7b-v2.q4_0.gguf --n_gpu_layers 35 --host 0.0.0.0 --port 11434

I ran the same script as above, which is better in that it results in a tool_call, but the model behaves strangely.

ChatCompletion(id='chatcmpl-19fdd8fd-82f2-4ae7-b002-9f0b362c39f8', choices=[Choice(finish_reason='tool_calls', index=0, message=ChatCompletionMessage(content=None, role='assistant', function_call=FunctionCall(arguments='{\n "location": "Kansas City"\n}\n\n\n[... hundreds of repeated \n omitted ...]', name=''), tool_calls=[ChatCompletionMessageToolCall(id='', function=Function(arguments='{\n "location": "Kansas City"\n}\n\n\n[... hundreds of repeated \n omitted ...]', name=''), type='function')]))], created=1702589814, model='meetkai/functionary-7b-v2', object='chat.completion', system_fingerprint=None, usage=CompletionUsage(completion_tokens=1831, prompt_tokens=217, total_tokens=2048))

I am not sure whether llama-cpp-python supports v2 of Functionary or not.

Is there another way to test functionary? vLLM is not an option due to GPU memory requirements.
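
Until the underlying generation issue is sorted out, a defensive client-side cleanup of the arguments string can at least make the llama-cpp-python result usable. This is only a hypothetical sketch of stripping the trailing-newline padding seen in the output above, not a fix for the model or the server:

import json

def parse_tool_arguments(raw_arguments):
    """Strip the trailing-newline padding, then decode the JSON arguments."""
    return json.loads(raw_arguments.strip())

# Example using the shape of the arguments string from the llama-cpp-python response above:
raw = '{\n  "location": "Kansas City"\n}' + "\n" * 500
print(parse_tool_arguments(raw)["location"])  # -> Kansas City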

Evaluations

Thanks for the library!

It would be interesting to evaluate functionary with, e.g., AgentBench. Are there any plans for that?
