Giter Site home page Giter Site logo

shishirpatil / gorilla Goto Github PK

View Code? Open in Web Editor NEW
10.1K 99.0 773.0 188.05 MB

Gorilla: An API store for LLMs

Home Page: https://gorilla.cs.berkeley.edu/

License: Apache License 2.0

Python 95.38% Shell 0.22% C++ 0.13% JavaScript 3.44% Rust 0.51% Scheme 0.32%
api llm api-documentation chatgpt gpt-4-api claude-api openai-api openai-functions

gorilla's People

Contributors

amiraflak avatar aryanvichare avatar benjaminhuo avatar charliejcj avatar dangeo773 avatar danielfleischer avatar danielskry avatar eitanturok avatar eltociear avatar fanjia-yan avatar hannesgith avatar huanzhimao avatar jasonzhu1313 avatar joedevon avatar kaiwen129 avatar meenakshi-mittal avatar morganmcg1 avatar mzamini92 avatar noppapon avatar rajveer43 avatar ramanv0 avatar ricklamers avatar royh02 avatar saikolasani avatar shawnharmsen avatar shishirpatil avatar tanmaydoesai avatar tianjunz avatar uponthesky avatar viniciuslazzari avatar

Stargazers

 avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar

Watchers

 avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar

gorilla's Issues

[bug] Hosted Gorilla: <Issue>

Exception: Error communicating with OpenAI: HTTPConnectionPool(host='34.132.127.197', port=8000): Max retries exceeded with url: /v1/chat/completions (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7f912081fb50>: Failed to establish a new connection: [Errno 111] Connection refused'))
Failed model: gorilla-7b-hf-v0, for prompt: I would like to translate from English to Chinese

Train with mpt 8k

Is the feature request related to a problem?

Would it be expensive to train with mpt 8k? Can you provide an mpt 8k model?

Describe the solution you'd like
When I run gorilla, I want to see an 8k context window.

Prefer to keep Apache 2 licensing.

Additional context
Add any other context or screenshots about the feature request here.

https://huggingface.co/mosaicml/mpt-7b-8k

[bug] Hosted Gorilla: <Issue>

Exception: Invalid response object from API: '{"object":"error","message":"","code":50001}' (HTTP response code was 400)
Failed model: gorilla-7b-hf-v1, for prompt: I would like to translate 'I feel very good today.' from English to Chinese

deploying to replicate

Describe the solution you'd like
I would love to see a model of Gorilla hosted to Replicate, it would be nice to be able to utilize their API and hosting.
Additional context
Had a blast playing with the colab

[bug] Hosted Gorilla: <Issue>

Exception: Invalid response object from API: '{"object":"error","message":"","code":50001}' (HTTP response code was 400)
Failed model: gorilla-7b-hf-v1, for prompt: What is the ...

How to run this project?

Describe the issue

I saw the scene described in the video, which seems to be running on the command line and obtaining API access methods through dialogue. But I didn't find where to run it to get such results. Do I need to train first or do I need to run a specific Python file? Please advise..

[feature] Run gorilla locally without GPUs 🦍

Today, Gorilla end-points run on UC Berkeley hosted servers 🐻 When you try our colab, or our chat completion API, or the CLI tool, it hits our GPUs for inference. A popular ask among our users is to run Gorilla locally on Macbooks/Linux/WSL.

Describe the solution you'd like:
Have the model(s) running locally on MPS/CPU/GPU and listening to a port. All the current gorilla end-points can then just hit localhost to get the response to any given prompt.

Additional context:
Here is an application that would immediately use it: https://github.com/gorilla-llm/gorilla-cli
Given, we have LLaMA models, these should be plug-and-play: ggerganov/llama.cpp and karpathy/llama2.c
Also relevant: https://huggingface.co/TheBloke/gorilla-7B-GPTQ

Update 1: If you happen to have an RTX, or V100 or A100 or H100, you can use Gorilla today without any latency hit. The goal of this enhancement is to help those who may not have access to and greatest GPUs.

The provided response file test results are not consistent with the paper[bug] Hosted Gorilla: <Issue>

Describe the bug

We use the file /eval/eval-data/responses/torchhub/response_torchhub_Gorilla_FT_0_shot.jsonl and then use the code /eval/eval-scripts/ast_eval_th.py to calculate the metrics The final calculated result is Final Functionality accuracy: 75.80 Final hallucination: 16.12, which is the same as the final Functionality accuracy of zero-shot of torchhub published in Table1 of the paper. 59.13 Final hallucination: 6.98 is a big difference

To Reproduce
Steps to reproduce the behavior:

  1. We use the file /eval/eval-data/responses/torchhub/response_torchhub_Gorilla_FT_0_shot.jsonl and then use the code /eval/eval-scripts/ast_eval_th.py to calculate the metrics

Screenshots
None

Proposed Solution
None

Additional context
We would like to know why there is a large discrepancy with the original published results, whether it is because an update was made or we compared the wrong table.

load-8bit flag doesn't work

Describe the issue
When I use the --load-8bit flag it's returning a load_compress_model that's not imported anywhere (and for that reason -I guess- it's failing?).

Any ideas on how to go about this issue? I've searched for this obj in the code itself and in hugging face's API but couldn't find it, so I'm kind of clueless on what to do.

I'm running this on a one GPU machine. It's an old T420 with archlinux.

Thanks!

[bug] Hosted Gorilla: <Issue>

Exception: Invalid response object from API: '{"object":"error","message":"This model's maximum context length is 2048 tokens. However, you requested 2302 tokens (1790 in the messages, 512 in the completion). Please reduce the length of the messages or completion.","code":40303}' (HTTP response code was 400)

Is there any way to just cut the completion / request to the first 2048 tokens?

GPT4 cutoff date is September 2021 - how did this impact evals?

Any new API info would not be in GPT4 training.

How much impact do you think this has with respect to relative performance between GPT4 and Gorilla?

Did you do any eval on APIs that existed prior to 09/21 versus those after?

I reviewed the paper but could not find any discussion on this. https://arxiv.org/abs/2305.15334

To be clear, I am not saying this invalidates the ideas, which I think were a fantastic contribution to OS LLMs, but rather that it would be good to understand the precise reason for the superior performance.

License?

Hello, thanks for making your work available! Have you chosen a license yet?

[bug] Hosted Gorilla: <Issue>

Exception: Error communicating with OpenAI: HTTPConnectionPool(host='34.132.127.197', port=8000): Max retries exceeded with url: /v1/chat/completions (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7f8f4f57fc10>: Failed to establish a new connection: [Errno 111] Connection refused'))
Failed model: gorilla-7b-hf-v0, for prompt: I would like to translate from English to Chinese

eval resutls

Hi, thanks for your excellent work.

I ran the eval-scrip

python ast_eval_th.py --api_dataset ../../data/api/torchhub_api.jsonl --apibench ../../data/apibench/torchhub_eval.json --llm_responses ../eval-data/responses/torchhub/response_torchhub_Gorilla_FT_0_shot.jsonl

and get the results:

Final Functionality accuracy:  0.7580645161290323
Final hallucination:  0.16129032258064516

I find these results are inconsistent with the results reported in the paper.

image

I would like to ask where I got it wrong.

Thanks.

De-duplicate APIBench eval data (?)

The evaluation data for APIBench is duplicated between data/apibench/*_eval.json and eval/eval-data/questions/. I think the only difference is formatting. Maybe we should just keep the eval/eval-data/responses and have data/apibench for only data used to train the model.

Initially we made two copies with the following rationale:
apibench should have all the data self-contained, which the community is using to train/benchmark their LLMs.
eval/ would have the eval data in a format that would be easy to eyeball and understand what is going on.

Maybe this is one of those few cases where it might be ok to have the same data twice in the repository in different formats?

Starting this issue in case anyone has comments on this.

what are the document retrievers mentioned in your paper?

Hi!

thanks for the wonderful work! During reading your paper, I'm confused about the document retrievers mentioned in your paper. You mentioned several of them, such as gpt and oracle. I cannot find more specific reference or hyperlinks in your paper. I'm wondering where can I find websites or illustrations of these retrievers?

Thank you.

Gorilla Self-Hosted

Hi,

is it also possible to self host Gorilla with an API that is compatible with the OpenAI chat completion API?
So essentially the same as depicted in the Colab?

[feature] FOOM detection.

This seems like the sort of project that could accidentally produce a self-improving superhuman system. Does anyone on the project have an understanding of AI Alignment? Are there efforts to measure the potential for systems built with gorilla to FOOM?

[bug] Hosted Gorilla: <Issue>

Exception: Error communicating with OpenAI: HTTPConnectionPool(host='34.132.127.197', port=8000): Max retries exceeded with url: /v1/chat/completions (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7f0ec18dabf0>: Failed to establish a new connection: [Errno 111] Connection refused'))
Failed model: gorilla-7b-hf-v0, for prompt: I would like to translate from English to Chinese

The bm25 and gpt-index scripts ?

          For the different retrievers, we use bm25 (https://en.wikipedia.org/wiki/Okapi_BM25), gpt-index simply uses `Davinci v1` from OpenAI to embed all the documents and do simple cosine similarity match during the inference time. For oracle, we just provided the golden truth answer to Gorilla. Hope this helps and let me know if there are any further questions!

Originally posted by @tianjunz in #21 (comment)

Would you be willing to release the bm25 and gpt-index scripts to help the community reproduce the experimental results?

Leveraging Llama 2

I don’t see any existing discussion about leveraging Meta’s new Llama 2 model. Curious if you guys have any plans in the making for using this new base model in gorilla.

[bug] Hosted Gorilla: <Issue>

Exception: Error communicating with OpenAI: HTTPConnectionPool(host='34.132.127.197', port=8000): Max retries exceeded with url: /v1/chat/completions (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7f455ea077f0>: Failed to establish a new connection: [Errno 111] Connection refused'))
Failed model: gorilla-7b-hf-v0, for prompt: I would like to translate from English to Chinese

RuntimeError: Internal: src/sentencepiece_processor.cc(1101) [model_proto->ParseFromArray(serialized.data(), serialized.size())]

When applying these deltas to these base weights I get the following error:

$ python apply_delta.py --base-model-path ../../llama-7b-hf/ --target-model-path ../../gorilla-7b-hf-v0/ --delta-path ../../gorilla-7b-hf-delta-v0/
Loading the delta weights from ../../gorilla-7b-hf-delta-v0/
Traceback (most recent call last):
  File "/home/paperspace/projects/gorilla/gorilla/inference/apply_delta.py", line 167, in <module>
    apply_delta(args.base_model_path, args.target_model_path, args.delta_path)
  File "/home/paperspace/projects/gorilla/gorilla/inference/apply_delta.py", line 129, in apply_delta
    delta_tokenizer = AutoTokenizer.from_pretrained(delta_path, use_fast=False)
  File "/home/paperspace/.local/lib/python3.9/site-packages/transformers/models/auto/tokenization_auto.py", line 702, in from_pretrained
    return tokenizer_class.from_pretrained(pretrained_model_name_or_path, *inputs, **kwargs)
  File "/home/paperspace/.local/lib/python3.9/site-packages/transformers/tokenization_utils_base.py", line 1811, in from_pretrained
    return cls._from_pretrained(
  File "/home/paperspace/.local/lib/python3.9/site-packages/transformers/tokenization_utils_base.py", line 1965, in _from_pretrained
    tokenizer = cls(*init_inputs, **init_kwargs)
  File "/home/paperspace/.local/lib/python3.9/site-packages/transformers/models/llama/tokenization_llama.py", line 96, in __init__
    self.sp_model.Load(vocab_file)
  File "/home/paperspace/.local/lib/python3.9/site-packages/sentencepiece/__init__.py", line 905, in Load
    return self.LoadFromFile(model_file)
  File "/home/paperspace/.local/lib/python3.9/site-packages/sentencepiece/__init__.py", line 310, in LoadFromFile
    return _sentencepiece.SentencePieceProcessor_LoadFromFile(self, arg)
RuntimeError: Internal: src/sentencepiece_processor.cc(1101) [model_proto->ParseFromArray(serialized.data(), serialized.size())] 

Specs:

$ nvidia-smi
Thu Jun  1 17:50:22 2023       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 515.105.01   Driver Version: 515.105.01   CUDA Version: 11.7     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  Quadro M4000        Off  | 00000000:00:05.0  On |                  N/A |
| 46%   32C    P8    16W / 120W |    189MiB /  8192MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|    0   N/A  N/A      1532      G   /usr/lib/xorg/Xorg                121MiB |
|    0   N/A  N/A      2011      G   /usr/bin/gnome-shell               59MiB |
|    0   N/A  N/A      2571      G   ...bexec/gnome-initial-setup        2MiB |
+-----------------------------------------------------------------------------+
$ LC_ALL=C lspci -v | grep -EA10 "3D|VGA" | grep 'prefetchable' 
	Memory at f4000000 (32-bit, prefetchable) [size=8M]
	Memory at f3000000 (32-bit, non-prefetchable) [size=16M]
	Memory at e0000000 (64-bit, prefetchable) [size=256M]
	Memory at f0000000 (64-bit, prefetchable) [size=32M]
$ free -h
              total        used        free      shared  buff/cache   available
Mem:           29Gi       1.2Gi       5.6Gi        13Mi        22Gi        27Gi
Swap:            0B          0B          0B

Augmenting additional API to the Gorilla-LLm

I hope you are doing well, a great thanks for this work.
Is it possible to add additional APIs(private APIs) to Gorilla? We have a large database of APIs and we need to add them to Gorilla, How can we do this? Should we fine-tune the Gorilla LLM? or something like this?

[bug] Hosted Gorilla: <Issue>

Exception: Invalid response object from API: '{"object":"error","message":"NETWORK ERROR DUE TO HIGH TRAFFIC. PLEASE REGENERATE OR REFRESH THIS PAGE.\n\n(CUDA error: uncorrectable ECC error encountered\nCUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.\nFor debugging consider passing CUDA_LAUNCH_BLOCKING=1.\nCompile with TORCH_USE_CUDA_DSA to enable device-side assertions.\n)","code":50001}' (HTTP response code was 400)
Failed model: gorilla-7b-hf-v1, for prompt: I would like to translate 'I feel very good today.' from English to Chinese

Encountered 1 file(s) that may not have been copied correctly on Windows

I encounter this problem downloading model weights. Seems weights larger than 4 GB are not correctly handled on Windows. Do you upload the models from windows system?

root@4bd793bb2ded:/workspace/gorilla# git lfs install
Updated git hooks.
Git LFS initialized.

root@4bd793bb2ded:/workspace/gorilla# git clone https://huggingface.co/gorilla-llm/gorilla-mpt-7b-hf-v0
Cloning into 'gorilla-mpt-7b-hf-v0'...
remote: Enumerating objects: 35, done.
remote: Counting objects: 100% (35/35), done.
remote: Compressing objects: 100% (34/34), done.
remote: Total 35 (delta 5), reused 0 (delta 0), pack-reused 0
Unpacking objects: 100% (35/35), 621.68 KiB | 1.84 MiB/s, done.
Filtering content: 100% (2/2), 4.38 GiB | 57.36 MiB/s, done.
Encountered 1 file(s) that may not have been copied correctly on Windows:
        pytorch_model-00001-of-00002.bin

See: `git lfs help smudge` for more details.
root@4bd793bb2ded:/workspace/gorilla/gorilla-mpt-7b-hf-v0# ls -al
total 12989212
drwxr-xr-x 3 root root       4096 Jun  7 00:17 .
drwxr-xr-x 8 root root        161 Jun  7 00:16 ..
drwxr-xr-x 9 root root        174 Jun  7 00:18 .git
-rw-r--r-- 1 root root       1477 Jun  7 00:16 .gitattributes
-rw-r--r-- 1 root root       2068 Jun  7 00:16 README.md
-rw-r--r-- 1 root root       1752 Jun  7 00:16 adapt_tokenizer.py
-rw-r--r-- 1 root root      16818 Jun  7 00:16 attention.py
-rw-r--r-- 1 root root       2493 Jun  7 00:16 blocks.py
-rw-r--r-- 1 root root       1284 Jun  7 00:16 config.json
-rw-r--r-- 1 root root       9080 Jun  7 00:16 configuration_mpt.py
-rw-r--r-- 1 root root      28182 Jun  7 00:16 flash_attn_triton.py
-rw-r--r-- 1 root root        112 Jun  7 00:16 generation_config.json
-rw-r--r-- 1 root root      27219 Jun  7 00:16 hf_prefixlm_converter.py
-rw-r--r-- 1 root root       3639 Jun  7 00:16 meta_init_context.py
-rw-r--r-- 1 root root      17406 Jun  7 00:16 modeling_mpt.py
-rw-r--r-- 1 root root       2563 Jun  7 00:16 norm.py
-rw-r--r-- 1 root root      12558 Jun  7 00:16 param_init_fns.py
-rw-r--r-- 1 root root 9943040275 Jun  7 00:18 pytorch_model-00001-of-00002.bin
-rw-r--r-- 1 root root 3355599187 Jun  7 00:17 pytorch_model-00002-of-00002.bin
-rw-r--r-- 1 root root      16023 Jun  7 00:16 pytorch_model.bin.index.json
-rw-r--r-- 1 root root        129 Jun  7 00:16 special_tokens_map.json
-rw-r--r-- 1 root root    2113738 Jun  7 00:16 tokenizer.json
-rw-r--r-- 1 root root        264 Jun  7 00:16 tokenizer_config.json

[bug] Testing Gorilla: <Issue>

Exception: Error communicating with OpenAI: HTTPConnectionPool(host='34.132.127.197', port=8000): Max retries exceeded with url: /v1/chat/completions (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7bba974da140>: Failed to establish a new connection: [Errno 111] Connection refused'))
Failed model: gorilla-7b-tf-v0, for prompt: I want to build a robot that can detecting objects in an image

[bug] Hosted Gorilla: <Issue>

Exception: Error communicating with OpenAI: HTTPConnectionPool(host='34.132.127.197', port=8000): Max retries exceeded with url: /v1/chat/completions (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7fa8bdf53c40>: Failed to establish a new connection: [Errno 111] Connection refused'))
Failed model: gorilla-7b-hf-v0, for prompt: I would like to translate from English to Chinese

The returned results show garbled content?

The running command used is:
python3 serve/gorilla_cli.py --model-path model/gorilla-7b-th-v0/

But the returned results show garbled content
image

How did this problem arise and how should it be resolved?

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.