llama2-flask-api's Introduction

Llama 2 Flask API

This is a simple HTTP API for the Llama 2 LLM. It is compatible with the ChatGPT API, so you should be able to use it with any application that supports the ChatGPT API, by changing the API URL to http://localhost:5000/chat.

Usage

After installing Llama 2 from the official repo, use app.py instead of example_chat_completion.py with the official example command, or run:

$ ./run_api.sh

After that, you will have a Llama 2 API running at http://localhost:5000/chat

To allow access from the public, use the command:

$ ./run_api.sh 0.0.0.0

You can also set other command line arguments:

$ ./run_api.sh [HOST PORT MODEL MAX_SEQ_LEN MAX_BATCH_SIZE NPROC_PER_NODE]

Notes

This is a very simple implementation and doesn't support all the same features as the ChatGPT API (token usage calculation, streaming, function calling etc.)

If the API is called with streaming=true, it will respond with a single event with the whole response.

Recommend Projects

openmindx / llama2-flask-api Goto Github PK

llama2-flask-api's Introduction

Llama 2 Flask API

Usage

Notes

llama2-flask-api's People

Contributors

Recommend Projects

React

Vue.js

Typescript

TensorFlow

Django

Laravel

D3

Recommend Topics

javascript

web

server

Machine learning

Visualization

Game

Recommend Org

Facebook

Microsoft

Google

Alibaba

D3

Tencent