Topic: vllm Goto Github

Some thing interesting about vllm

👇 Here are 61 public repositories matching this topic...

agnostiqhq / tutorials_covalent_pycon_2024

vllm,

Organization: agnostiqhq

Home Page: https://www.covalent.xyz/

agents ai autonomous-agents chatgpt covalent gpu hpc huggingface large-language-models llama

asprenger / ray_vllm_inference

vllm,A simple service that integrates vLLM with Ray Serve for fast and scalable LLM serving.

User: asprenger

inference llm llm-serving llmops mlops model-serving pytorch ray transformer vllm

atfortes / awesome-llm-reasoning

vllm,Reasoning in Large Language Models: Papers and Resources, including Chain-of-Thought, Instruction-Tuning and Multimodality.

User: atfortes

language-models reasoning prompt question-answering in-context-learning chatgpt chain-of-thought prompt-engineering cot awesome

blib-la / ask-poddy

vllm,Ask Poddy: Run Open Source LLMs and Embeddings as OpenAI-Compatible Serverless Endpoints (Tutorial)

Organization: blib-la

ai embedding endpoint infinity llm nextjs openai rag runpod serverless vllm worker

vllm,🔒 Enterprise-grade API gateway that helps you monitor and impose cost or rate limits per API key. Get fine-grained access control and monitoring per user, application, or environment. Supports OpenAI, Azure OpenAI, Anthropic, vLLM, and open-source LLMs.

Organization: bricks-cloud

Home Page: https://trybricks.ai/

golang llm openai ai anthropic azure gpt postgresql rest-api ycombinator api docker privacy security artificial-intelligence generative-ai open-source self-hosted vllm

climatik-project / climatik-project

vllm,Carbon Limiting Auto Tuning for Kubernetes

Organization: climatik-project

keda kepler kubernetes llm llm-inference power-capping kserve vllm kubernetes-operator green-computing

deftruth / awesome-llm-inference

vllm,📖A curated list of Awesome LLM Inference Paper with codes, TensorRT-LLM, vLLM, streaming-llm, AWQ, SmoothQuant, WINT8/4, Continuous Batching, FlashAttention, PagedAttention etc.

User: deftruth

Home Page: https://github.com/DefTruth/Awesome-LLM-Inference

flash-attention flash-attention-2 paged-attention tensorrt-llm vllm awesome-llm sora llm llm-inference llms

esmailza / llama2-vllm-langchain-knowledge-graph

vllm,Preserving entities through the integration of knowledge graphs, Llama 2, vLLM, and LangChain.

User: esmailza

distributed information-extraction knowledge-graph langchain llama2 named-entity-recognition python summarization vllm

evilpsycho / open-llm-benchmark

vllm,Evaluate open-source language models on Agent, formatted output, command following, long text, multilingual, coding, and custom task capabilities. 开源语言模型在Agent，格式化输出，指令追随，长文本，多语言，代码，自定义任务的能力基准测试。

User: evilpsycho

evaluation-framework huggingface large-language-models llamacpp llm-agent llms-benchmarking openai vllm

france-travail / benchmark_llm_serving

vllm,A library to benchmark LLMs via their API exposure. For now, it is vLLM oriented

Organization: france-travail

benchmark llm llm-serving vllm

france-travail / happy_vllm

vllm,A REST API for vLLM, production ready

Organization: france-travail

Home Page: https://france-travail.github.io/happy_vllm/

api-rest llm llm-serving production vllm

gameofdimension / vllm-cn

vllm,演示 vllm 对中文大语言模型的神奇效果

User: gameofdimension

vllm

gotzmann / booster

vllm,Booster - open platform for serving LLM models

User: gotzmann

llm chatgpt gpt llama openai llama-cpp ggml exllama vllm llamacpp

gusanmaz / echosight

vllm,EchoSight is a tool that helps visually impaired individuals by audibly describing images taken with a Raspberry Pi Camera or inputted via image path or URL across different operating systems.

User: gusanmaz

coqui-tts llm llms raspberry-pi replicate replicate-api seamlessm4t visual-audio visual-audio-navigation vllm

hcd233 / aris-ai-model-server

vllm,An OpenAI Compatible API which integrates LLM, Embedding and Reranker. 一个集成 LLM、Embedding 和 Reranker 的 OpenAI 兼容 API

User: hcd233

ai awq embedding fastapi gptq llm mlx openai-compatible-api rag reranker

ineil77 / vllm-code-harness

vllm,Run code inference-only benchmarks quickly using vLLM

User: ineil77

code-generation nlp-machine-learning transformers vllm

ivangabriele / docker-functionary

vllm,Ready-to-deploy Docker image for Functionary LLM served as an OpenAI-Compatible API.

User: ivangabriele

ai docker docker-hub docker-image functions large-language-models llama2 llm openai openai-api

ivangabriele / docker-llm

vllm,Pre-loaded LLMs served as an OpenAI-Compatible API via Docker images.

User: ivangabriele

llm llms api docker openai openai-api openorca orca server vicuna

jasonacox / tinyllm

vllm,Setup and run a local LLM and Chatbot using consumer grade hardware.

User: jasonacox

artificial-intelligence chatbot large-language-models llama-cpp-python llm openai vllm rag retrieval-augmented-generation

joydeb28 / llm-lab

vllm,LLM, Fine Tuning, Llama 2, Gemma, Mixtral, vLLM, LangChain, RAG, ChromaDB, FAISS

User: joydeb28

chromadb faiss finetune-llm gemma genai langchain llama2 llm mixtral nlp openllm rag vllm

julep-ai / standard-chatml

vllm,Standardized spec and vendor-specific transforms for ChatML

Organization: julep-ai

Home Page: https://standard-chatml.org/

anthropic chatml gemini-api litellm openai standard-chatml vllm

kyegomez / simpleunet

vllm,An simple implementation of Unet because all the implementations i've seen are wayy tooo complicated.

User: kyegomez

Home Page: https://discord.gg/qUtxnK2NMf

artificial-intelligence biomedical biomedical-image-processing computer-vision gpt4 image image-classification image-segmentation unet vllm

lklivingstone / sih_2023

vllm,A Large Language Model based tool for generating human like responses to natural language inputs for network not connected over internet.

User: lklivingstone

Home Page: https://scotts-tots.netlify.app/

django linux llm reactjs scylladb vllm

vllm,This repository demonstrates LLM execution on CPUs using packages like llamafile, emphasizing low-latency, high-throughput, and cost-effective benefits for inference and serving.

User: mddunlap924

deepspeed large-language-models llamacpp llamafile llm-inference llm-serving llms vllm

meta-llama / llama-recipes

vllm,Scripts for fine-tuning Meta Llama3 with composable FSDP & PEFT methods to cover single/multi-node GPUs. Supports default & custom datasets for applications such as summarization and Q&A. Supporting a number of candid inference solutions such as HF TGI, VLLM for local or cloud deployment. Demo apps to showcase Meta Llama3 for WhatsApp & Messenger.

Organization: meta-llama

ai finetuning langchain llama llama2 llm machine-learning python pytorch vllm

microsoft / vidur

vllm,A large-scale simulation framework for LLM inference

Organization: microsoft

inference llm simulation transformer vllm

navinkumarmnk / ai-learning-platform

vllm,AI-Learning-Platform, a LLM-RAG pipeline which behaves like a guide and able to solve doubts. Deployed on-premise IBM ppc64le architecture. vLLM for model inference & Qdrant with Langchain for RAG Pipeline. Server written in django, postgres & cassandra as the sql & nosql databases.

User: navinkumarmnk

Home Page: https://megnav.com/portfolio/ai-learning-platform

cassandra django langchain llm postgresql ppc64le qdrant ray-distributed vllm

nbasyl / dora

vllm,Official implementation of "DoRA: Weight-Decomposed Low-Rank Adaptation"

User: nbasyl

Home Page: https://arxiv.org/abs/2402.09353

dora lora peft-fine-tuning-llm vllm

netease-media / grps

vllm,【深度学习模型部署框架】支持tensorflow/torch/tensorrt/vllm以及更多nn框架，支持dynamic batching、streaming模式，可限制、可拓展、高性能。帮助用户快速地将模型部署到线上，并通过HTTP/RPC接口方式提供服务。

Organization: netease-media

Home Page: https://zhuanlan.zhihu.com/p/707491462

tensorflow tensorrt torch vllm dynamic-batching serving triton-inference-server

opencsgs / llm-inference

vllm,llm-inference is a platform for publishing and managing llm inference, providing a wide range of out-of-the-box features for model deployment, such as UI, RESTful API, auto-scaling, computing resource management, monitoring, and more.

Organization: opencsgs

deepspeed llama-cpp llm-inference ray transformer vllm

openrlhf / openrlhf

vllm,An Easy-to-use, Scalable and High-performance RLHF Framework (70B+ PPO Full Tuning & Iterative DPO & LoRA & Mixtral)

Organization: openrlhf

Home Page: https://openrlhf.readthedocs.io/

deepspeed transformers vllm large-language-models raylib reinforcement-learning-from-human-feedback reinforcement-learning

phospho-app / fastassert

vllm,Dockerized LLM inference server with constrained output (JSON mode), built on top of vLLM and outlines. Faster, cheaper and without rate limits. Compare the quality and latency to your current LLM API provider.

Organization: phospho-app

Home Page: https://phospho.ai

llm llm-inference docker outlines vllm

prometheus-eval / prometheus-eval

vllm,Evaluate your LLM's response with Prometheus and GPT4 💯

Organization: prometheus-eval

evaluation litellm llm llmops python vllm gpt4 llm-as-a-judge llm-as-evaluator

runpod-workers / worker-vllm

vllm,The RunPod worker template for serving our large language model endpoints. Powered by vLLM.

Organization: runpod-workers

language-model llm runpod vllm

sasha0552 / vllm-ci

vllm,CI scripts designed to build a Pascal-compatible version of vLLM.

User: sasha0552

nvidia vllm

timesurgelabs / promptproxy

vllm,Call many AIs from a single API.

Organization: timesurgelabs

ai docker huggingface llama llama2 llm openai openai-api openai-api-proxy vllm

trainy-ai / llm-atc

vllm,Fine-tuning and serving LLMs on any cloud

Organization: trainy-ai

Home Page: https://llm-atc.readthedocs.io/en/latest/

finetuning llama2 llms vllm

varunshenoy / super-json-mode

vllm,Low latency JSON generation using LLMs ⚡️

User: varunshenoy

huggingface-transformers llm openai vllm

wangcx18 / llm-vscode-inference-server

vllm,An endpoint server for efficiently serving quantized open-source LLMs for code.

User: wangcx18

llm vscode-extension llm-inference vllm

xorbitsai / inference

vllm,Replace OpenAI GPT with another LLM in your app by changing a single line of code. Xinference gives you the freedom to use any LLM you need. With Xinference, you're empowered to run inference with any open-source language models, speech recognition models, and multimodal models, whether in the cloud, on-premises, or even on your laptop.

Organization: xorbitsai

Home Page: https://inference.readthedocs.io

ggml pytorch chatglm deployment flan-t5 llm wizardlm artificial-intelligence machine-learning whisper

yoziru / nextjs-vllm-ui

vllm,Fully-featured, beautiful web interface for vLLM - built with NextJS.

User: yoziru

nextjs ui vllm openai-api tailwindcss typescript llm-ui vllm-ui llm-webui webui

yy0649 / ice-pixiu

vllm,ICE-PIXIU：A Cross-Language Financial Megamodeling Framework

User: yy0649

Home Page: https://github.com/topics/nlp

nlp internlm large-language-models llama pixiu vllm

zrzrzrzrzrzrzr / lm-fly

vllm,大模型推理框架加速，让 LLM 飞起来

User: zrzrzrzrzrzrzr

llm llm-inference mlx openvino tensorrt-llm tgi vllm