vicgalle / gpt-j-api Goto Github PK

View Code? Open in Web Editor NEW

336.0 9.0 56.0 111 KB

API for the GPT-J language model 🦜. Including a FastAPI backend and a streamlit frontend

Home Page: http://api.vicgalle.net:8000/

License: MIT License

Python 100.00%

gpt-3 gpt gpt-j fastapi language-model api text-generation zero-shot

gpt-j-api's Introduction

gpt-j-api 🦜

An API to interact with the GPT-J language model and variants! You can use and test the model in two different ways:

Streamlit web app at http://api.vicgalle.net:8000/
The proper API, documented at http://api.vicgalle.net:5000/docs

Open API endpoints 🔓

These are the endpoints of the public API and require no authentication. Click on each to see the parameters!

GPT-J text generation 🤖

generate : POST /generate/

Zero-shot text classification (multilingual) 🌍

classify : POST /classify/

Using the API 🔥

Python:

import requests
context = "In a shocking finding, scientist discovered a herd of unicorns living in a remote, previously unexplored valley, in the Andes Mountains. Even more surprising to the researchers was the fact that the unicorns spoke perfect English."
payload = {
    "context": context,
    "token_max_length": 512,
    "temperature": 1.0,
    "top_p": 0.9,
}
response = requests.post("http://api.vicgalle.net:5000/generate", params=payload).json()
print(response)

Python (zero-shot classification):

import requests
payload = { 
    "sequence" : "The movie started slow, but in the end was absolutely amazing!", 
    "labels" : "positive,neutral,negative"}
response = requests.post("http://api.vicgalle.net:5000/classify", params=payload).json()
print(response)

Bash:

curl -X 'POST' \
  'http://api.vicgalle.net:5000/generate?context=In%20a%20shocking%20finding%2C%20scientists%20discovered%20a%20herd%20of%20unicorns%20living%20in%20a%20remote%2C%20previously%20unexplored%20valley%2C%20in%20the%20Andes%20Mountains.%20Even%20more%20surprising%20to%20the%20researchers%20was%20the%20fact%20that%20the%20unicorns%20spoke%20perfect%20English.&token_max_length=512&temperature=1&top_p=0.9' \
  -H 'accept: application/json' \
  -d ''

Deployment of the API server

Just SSH into a TPU VM. This code was tested on both the v2-8 and v3-8 variants.

First, install the requirements and get the weights:

python3 -m pip install -r requirements.txt
wget https://the-eye.eu/public/AI/GPT-J-6B/step_383500_slim.tar.zstd
sudo apt install zstd
tar -I zstd -xf step_383500_slim.tar.zstd
rm step_383500_slim.tar.zstd

And just run

python3 serve.py

Then, you can go to http://localhost:5000/docs and use the API!

Deploy the streamlit dashboard

Just run

python3 -m streamlit run streamlit_app.py --server.port 8000

Contact

If you have a request, I'll be happy to help you at vgallegoalcala at gmail dot com

Sponsors 🦄

Special thanks to the following people, who sponsor this project! <3

Aspie96

Acknowledgements ✨

Many thanks to the support of the TPU Research Cloud, https://sites.research.google/trc/

gpt-j-api's People

Contributors

Stargazers

Watchers

Forkers

willtejeda sd411 thegreenjedi myusernamee trendingtechnology budelius nitinn77 anon3232 abhiram11 jbainpro aryagm rickasricky harish-garg aiwizzard heath123 atlaspilotpuppy kakzaki edenweb1 jesseasamoa standardgalactic tmusvit vishnoor zlapp teeppiphat jleacox sunilkgrao stvnksslr unnuio telefonica davidcdcb simplicitylinux rittee reloadbrain mudiageo morispolanco ryananan sciai-ai vyomkeshj praneybehl hafiz703 mp1962 elijahahianyo nahomhmichael ericechemane sycomix accessitech grouloo d43pan piousbox changchuming chengzhe-feng kp-forks wdshin bhattyuvraj wrkzdev qwang-big

gpt-j-api's Issues

privacy

is any data being saved when using an api?

The api is very slow

Summary
The api takes a really long time to respond

Steps to reproduce
Code snippet:

When performing any call to the api it can take up to a minute

Expected behavior:

Usually it is only a few second wait

Actual behavior:

Very slow

Is this a regression?
That is, did this use to work the way you expected in the past?
yes

On what hardware is the model running?

On what hardware is the model running? On which cloud providers server? And what is the approximate monthly cost (per-person and about 5 calls per minute) for running a service like this?

Thanks,

ValueError: cannot reshape array of size 1 into shape (0,8)

The website is down.

The website doesn't work when you click the link. This also means the Api doesn't work now. Please fix this. Thanks!

Few-shot sentence classification

This can be helpful to define the call

https://fastapi.tiangolo.com/advanced/dataclasses/

Raw text...

This is probably a very stupid question but whenever I run GPT-J I always get the full output:

{'model': 'GPT-J-6B', 'compute_time': 1.2492187023162842, 'text': ' \n(and you\'ll be a slave)\n\n**_"I\'m not a robot, I\'m a human being."_**\n\n**_"I\'m not a robot, I\'m a human being."_**\n\n', 'prompt': 'AI will take over the world ', 'token_max_length': 50, 'temperature': 0.09, 'top_p': 0.9, 'stop_sequence': None}

What parameter do I need to change so it only outputs the generated text?

(and you'll be a slave) I'm not a robot, I'm a human being. I'm not a robot, I'm a human being.

Logprobs?

Is it possible to retrieve logprobs?

No stop token

There is no stop token to inform the model when to stop and move on to the next example

Generate multiple sequences

Thanks for providing this.

Would it be possible to generate multiple sequences with one prompt?

Obviously the amount must be limited.

Transformers' library support

We shall try a few things:

GPT-J from transformers, comparing speed
GPT-Neo and GPT-2 from transformers, so users can also run these models locally
Add to the API the option to choose from these models

api seems offline

When I try to access the API I get the following error: ERR_CONNECTION_TIMED_OUT. But when I try to connect to it using a different IP address it does work. Am I IP banned?

Api Down.

Semantic search

Other than classification and completion, Is it possible to implement semantic search endpoint like they did with GPT-3?

How to use with Javascript instead of python?

Hi, do you have a JS sample?

Thanks :)

Minimum System Requirements

what are minimum system requirements?

is there api to fine tuning this model?

API Still Down

API Down

The API is down. Cannot access the Streamlit dashboard or the API. Can this be looked into?

hi,your server down,can u restart?

API Server Endpoint Down Again

The API Endpoint seems to be down again. ConnectionRefusedError: [Errno 111] Connection refused

Is there a way to get it working on a GPU server?

"Sorry, the public API is limited to around 20 queries per every 30 minutes."

There seems to be a limit on the API now...

AssertionError

For installation cat requirements.txt | xargs -n 1 pip3 install

I don't run intro previous errors with latest
Name: jax Version: 0.3.11
Name: jaxlib Version: 0.3.11
suggestion to update requirements.txt

jax and jaxlib Building from source

was run into this error before getting experience using streamlit run but this is not python3 serve.py and now I get the same error

WARNING:absl:No GPU/TPU found, falling back to CPU. (Set TF_CPP_MIN_LOG_LEVEL=0 and rerun for more info.)
Traceback (most recent call last):
  File "/Users/macos/Desktop/AI/serve.py", line 43, in <module>
    devices = np.array(jax.devices()).reshape(mesh_shape)
ValueError: cannot reshape array of size 1 into shape (0,8)

How to make the api public?

Hey, I was able to get serve.py running with the instructions you gave. But now I want to make the api public and connect it to a domain name so it can be publicly accessed (without needing a connection to the vm). How can I achieve this?

I want to do the same thing you did with "http://api.vicgalle.net:5000/generate" and "http://api.vicgalle.net:5000/docs".

Thanks,

Alternative to Google TPU VM?

Hello,

I would like to run a local instance of GPT-J, but avoid using Google.

I have little to no experience in machine learning and its requirements, are there other solutions I could use? (What are the requirements for a machine in order to run GPT-J?)

Thank you very much!

Unable to round-trip http request to upstream: dial tcp 34.90.75.65:5000: i/o timeout

Zero-shot NER endpoint

Streamlit frontend

With some prompts so people can experiment easily

Version support for Huggingface GPT-J 6B

GPT-J Huggingface and streamlit style like by project-code py

from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("EleutherAI/gpt-j-6B")

model = AutoModelForCausalLM.from_pretrained("EleutherAI/gpt-j-6B")

New endpoints to the API

For example, we could make it easy to serve some specific tasks, such as

#11
#23
#40
Dialog between two persons, plus some kind of memory buffer

Illegal Instruction

When installing like described in the readme (fresh conda env, python=3.8, ubuntu) I'll get a illegal instruction immediately after running python serve.py

(gpt-j-api) […]@[…]:/opt/GPT/gpt-j-api$ python -q -X faulthandler serve.py
Fatal Python error: Illegal instruction

Current thread 0x00007f358d7861c0 (most recent call first):
  File "<frozen importlib._bootstrap>", line 219 in _call_with_frames_removed
  File "<frozen importlib._bootstrap_external>", line 1166 in create_module
  File "<frozen importlib._bootstrap>", line 556 in module_from_spec
  File "<frozen importlib._bootstrap>", line 657 in _load_unlocked
  File "<frozen importlib._bootstrap>", line 975 in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 991 in _find_and_load
  File "<frozen importlib._bootstrap>", line 219 in _call_with_frames_removed
  File "<frozen importlib._bootstrap>", line 1042 in _handle_fromlist
  File "/home/korny/miniconda3/envs/gpt-j-api/lib/python3.8/site-packages/jaxlib/xla_client.py", lin
e 31 in <module>
  File "<frozen importlib._bootstrap>", line 975 in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 991 in _find_and_load
  File "<frozen importlib._bootstrap>", line 219 in _call_with_frames_removed
  File "<frozen importlib._bootstrap>", line 1042 in _handle_fromlist
  File "/home/korny/miniconda3/envs/gpt-j-api/lib/python3.8/site-packages/jax/lib/__init__.py", line 58 in <module>
  File "<frozen importlib._bootstrap>", line 219 in _call_with_frames_removed
  File "<frozen importlib._bootstrap_external>", line 843 in exec_module
  File "<frozen importlib._bootstrap>", line 671 in _load_unlocked
  File "<frozen importlib._bootstrap>", line 975 in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 991 in _find_and_load
  File "<frozen importlib._bootstrap>", line 219 in _call_with_frames_removed
  File "<frozen importlib._bootstrap>", line 1042 in _handle_fromlist
  File "/home/korny/miniconda3/envs/gpt-j-api/lib/python3.8/site-packages/jax/config.py", line 26 in <module>
  File "<frozen importlib._bootstrap>", line 219 in _call_with_frames_removed
  File "<frozen importlib._bootstrap_external>", line 843 in exec_module
  File "<frozen importlib._bootstrap>", line 671 in _load_unlocked
  File "<frozen importlib._bootstrap>", line 975 in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 991 in _find_and_load
  File "/home/korny/miniconda3/envs/gpt-j-api/lib/python3.8/site-packages/jax/__init__.py", line 33 in <module>
  File "<frozen importlib._bootstrap>", line 219 in _call_with_frames_removed
  File "<frozen importlib._bootstrap_external>", line 843 in exec_module
  File "<frozen importlib._bootstrap>", line 671 in _load_unlocked
  File "<frozen importlib._bootstrap>", line 975 in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 991 in _find_and_load
  File "serve.py", line 3 in <module>
Illegal instruction (core dumped)

EDIT

running this on CPU only, I tried installing jax[CPU] - same resut

Improve documentation

API in fastapi docs
API in gitbook

Github Action for checking the API

With https://docs.github.com/es/actions/reference/events-that-trigger-workflows#pull_request_review
the workflow can be triggered every few minutes

Is there a way to speed up inference?

Hello, I am currently working on a project where I need quick inference. It needn't be real-time, but something around 7-10 sec would be great. Is there a way to speed up the inference using the API?

The model does not seem to be a problem as compute_time is around 8sec, but by the time the request arrives it takes around 20 seconds (over 30 on some occasions). Is there a way to make the request a bit faster?

Thanks,

Usage

I'm using this to host a Discord chatbot, and though I have slowmode on the channel there's still a lot of usage, and often the API is being used as fast as it can generate completions. Will this harm the experience for others? Should I limit it more? (thanks for making this free but I don't want to take advantage of that too much if it's bad for others)

I miss vic's api :,)

API server endpoint down.

Your API endpoint seems to be down. (Err, connection refused). It seems like it is not running anymore for now.

Latency with TPU VM

Got things running on Google Clouds, really happy :). Was hoping for a little but of a speed increase, but computation time is the same and latency on the request seems to be the main delay. Did you experiment with firewalls and ports to improve things?

API VM?

Hi I wanted to host my own version of the api, where is the public one hosted? is it on a google cloud TPU VM? The ones ive seen here https://cloud.google.com/tpu/pricing are very expensive :D Is a TPU VM needed and the model won't be able to run on a normal GPU VM?

Thanks!

422 mistake

i'm getting 422 error when trying to fetch from my node server

do you know what might be the reason?

I've made an extensions using this api

https://chrome.google.com/webstore/detail/type-j/femdhcgkiiagklmickakfoogeehbjnbh

You can check it out here

First i was very hyped up and it felt fun, like I was talking to a machine, but then I lost my enthusiasm and now I feel like it's totally useless xD

I'm just leaving a link here for you to appreciate you, it became real thanks for you posting this api

feel free to delete the issue as it's out of scope

if you got ideas on how to make it commercially succesful - i'll be happy to partner up

peace

vicgalle / gpt-j-api Goto Github PK

gpt-j-api's Introduction

gpt-j-api 🦜

Open API endpoints 🔓

GPT-J text generation 🤖

Zero-shot text classification (multilingual) 🌍

Using the API 🔥

Deployment of the API server

Deploy the streamlit dashboard

Contact

Sponsors 🦄

Acknowledgements ✨

gpt-j-api's People

Contributors

Stargazers

Watchers

Forkers

gpt-j-api's Issues

EDIT

Recommend Projects

Recommend Topics

Recommend Org