Topic: llm-inference Goto Github

Some thing interesting about llm-inference

👇 Here are 450 public repositories matching this topic...

anarchy-ai / llm-vm

llm-inference,irresponsible innovation. Try now at https://chat.dev/

artificial-intelligence deep-learning distillation distillation-model llm llm-agent llm-inference llm-local llm-training machine-learning

b4rtaz / distributed-llama

llm-inference,Tensor parallelism is all you need. Run LLMs on an AI cluster at home using any device. Distribute the workload, divide RAM usage, and increase inference speed.

User: b4rtaz

distributed-computing llama2 llm llm-inference neural-network llms open-llm distributed-llm llama3

beam-cloud / beta9

llm-inference,The Open Serverless GPU Cloud

Organization: beam-cloud

Home Page: https://docs.beam.cloud

gpu ml-platform cuda fine-tuning generative-ai large-language-models llm distributed-computing llm-inference self-hosted

bentoml / bentoml

llm-inference,The easiest way to serve AI apps and models - Build reliable Inference APIs, LLM apps, Multi-model chains, RAG service, and much more!

Organization: bentoml

Home Page: https://bentoml.com

model-serving mlops llmops generative-ai llm-inference model-inference-service inference-platform deep-learning llm-serving machine-learning

bentoml / openllm

llm-inference,Run any open-source LLMs, such as Llama 3.1, Gemma, as OpenAI compatible API endpoint in the cloud.

Organization: bentoml

Home Page: https://bentoml.com

llm llmops model-inference falcon fine-tuning stablelm llm-serving llama mpt vicuna

character-ai / prompt-poet

llm-inference,Streamlines and simplifies prompt design for both developers and non-technical users with a low code approach.

Organization: character-ai

Home Page: https://pypi.org/project/prompt-poet/

prompt-engineering llm llm-inference prompt prompt-design prompt-tuning prompting

databricks / dbrx

llm-inference,Code examples and resources for DBRX, a large language model developed by Databricks

Organization: databricks

Home Page: https://www.databricks.com/

databricks gen-ai generative-ai llm llm-inference llm-training mosaic-ai

deftruth / awesome-llm-inference

llm-inference,📖A curated list of Awesome LLM Inference Paper with codes, TensorRT-LLM, vLLM, streaming-llm, AWQ, SmoothQuant, WINT8/4, Continuous Batching, FlashAttention, PagedAttention etc.

User: deftruth

Home Page: https://github.com/DefTruth/Awesome-LLM-Inference

flash-attention flash-attention-2 paged-attention tensorrt-llm vllm awesome-llm sora llm llm-inference llms

dstackai / dstack

llm-inference,A lightweight alternative to Kubernetes for AI, simplifying container orchestration on any cloud or on-premises and accelerating AI development, training, and deployment.

Organization: dstackai

Home Page: https://dstack.ai/docs

aws azure cloud fine-tuning gcp gpu kubernetes llm-inference llm-training llmops llms machine-learning orchestration python training

eastriverlee / llm.swift

llm-inference,LLM.swift is a simple and readable library that allows you to interact with large language models locally with ease for macOS, iOS, watchOS, tvOS, and visionOS.

User: eastriverlee

ios llm llm-inference macos swift tvos visionos watchos gguf

eulersearch / embedding_studio

llm-inference, Embedding Studio is a framework which allows you transform your Vector Database into a feature-rich Search Engine.

Organization: eulersearch

Home Page: https://embeddingstud.io/

embeddings embeddings-similarity fine-tuning llm-inference query-parser search-algorithm search-engine semantic-similarity unstructured-data unstructured-search

fasterdecoding / medusa

llm-inference,Medusa: Simple Framework for Accelerating LLM Generation with Multiple Decoding Heads

Organization: fasterdecoding

Home Page: https://sites.google.com/view/medusa-llm

llm llm-inference

feifeibear / long-context-attention

llm-inference,Sequence Parallel Attention for Long Context LLM Model Training and Inference

User: feifeibear

attention-is-all-you-need llm-inference llm-training pytorch ring-attention deepspeed-ulysses

flagai-open / aquila2

llm-inference,The official repo of Aquila2 series proposed by BAAI, including pretrained & chat large language models.

Organization: flagai-open

llm llm-inference llm-training

flashinfer-ai / flashinfer

llm-inference,FlashInfer: Kernel Library for LLM Serving

Organization: flashinfer-ai

Home Page: https://flashinfer.ai

flash-attention gpu large-large-models cuda pytorch tvm llm-inference

foldl / chatllm.cpp

llm-inference,Pure C++ implementation of several models for real-time chatting on your computer (CPU)

User: foldl

llm llm-inference

ghimiresunil / llm-powerhouse-a-curated-guide-for-large-language-models-with-custom-training-and-inferencing

llm-inference,LLM-PowerHouse: Unleash LLMs' potential through curated tutorials, best practices, and ready-to-use code for custom training and inferencing.

User: ghimiresunil

bert huggingface large-language-models llm-inference llm-training open-source open-source-llm transformers llm-tutorials

hpcaitech / swiftinfer

llm-inference,Efficient AI Inference & Serving

Organization: hpcaitech

Home Page: https://hpc-ai.com/

artificial-intelligence deep-learning gpt inference llama llama2 llm-inference llm-serving

inferflow / inferflow

llm-inference,Inferflow is an efficient and highly configurable inference engine for large language models (LLMs).

User: inferflow

llama2 llamacpp llm-inference model-quantization multi-gpu-inference mixture-of-experts moe gemma falcon minicpm

intel / intel-extension-for-transformers

llm-inference,⚡ Build your chatbot within minutes on your favorite device; offer SOTA compression techniques for LLMs; run LLMs efficiently on Intel Platforms⚡

Organization: intel

large-language-model chatbot 4-bits llm-inference llm-cpu chatpdf streamingllm intel-optimized-llamacpp speculative-decoding neural-chat

intel / neural-speed

llm-inference,An innovative library for efficient LLM inference via low-bit quantization

Organization: intel

Home Page: https://github.com/intel/neural-speed

cpu fp8 gaudi2 gpu int4 int8 llm-inference low-bit sparsity fp4

internlm / lmdeploy

llm-inference,LMDeploy is a toolkit for compressing, deploying, and serving LLMs.

Organization: internlm

Home Page: https://lmdeploy.readthedocs.io/en/latest/

cuda-kernels deepspeed fastertransformer llm-inference turbomind internlm llama llm codellama llama2 llama3

kenza-ai / sagify

llm-inference,LLMs and Machine Learning done easily

Organization: kenza-ai

Home Page: https://kenza-ai.github.io/sagify/

sagemaker generative-ai large-language-model large-language-models llm llm-inference llmops openai ai-gateway anthropic

lean-dojo / leancopilot

llm-inference,LLMs as Copilots for Theorem Proving in Lean

Organization: lean-dojo

Home Page: https://leandojo.org

lean llm-inference machine-learning theorem-proving lean4 formal-mathematics

lightning-ai / litgpt

llm-inference,20+ high-performance LLMs with recipes to pretrain, finetune and deploy at scale.

Organization: lightning-ai

Home Page: https://lightning.ai

ai artificial-intelligence deep-learning large-language-models llm llm-inference llms

liguodongiot / llm-action

llm-inference,本项目旨在分享大模型相关技术原理以及实战经验。

User: liguodongiot

Home Page: https://www.zhihu.com/column/c_1456193767213043713

llm llm-inference llm-serving llm-training llmops

liltom-eth / llama2-webui

llm-inference,Run any Llama 2 locally with gradio UI on GPU or CPU from anywhere (Linux/Windows/Mac). Use `llama2-wrapper` as your local llama2 backend for Generative Agents/Apps.

User: liltom-eth

llama-2 llama2 llm llm-inference

microsoft / aici

llm-inference,AICI: Prompts as (Wasm) Programs

Organization: microsoft

ai rust wasm wasmtime inference language-model llm llm-framework llm-inference llm-serving

microsoft / autogen

llm-inference,A programming framework for agentic AI. Discord: https://aka.ms/autogen-dc. Roadmap: https://aka.ms/autogen-roadmap

Organization: microsoft

Home Page: https://microsoft.github.io/autogen/

agent-based-framework agent-oriented-programming chat chat-application chatbot chatgpt gpt gpt-35-turbo gpt-4 llm-agent

mistralai / mistral-inference

llm-inference,Official inference library for Mistral models

Organization: mistralai

Home Page: https://mistral.ai/

llm llm-inference mistralai

morpheuslord / hackbot

llm-inference,AI-powered cybersecurity chatbot designed to provide helpful and accurate answers to your cybersecurity-related queries and also do code analysis and scan analysis.

User: morpheuslord

ai chatbot cli-chat-app llama2 automation cybersecurity cybersecurity-education cybersecurity-tools llama2-7b llamacpp

mukel / llama3.java

llm-inference,Practical Llama 3 inference in Java

User: mukel

java llama llama3 llm llm-inference llms

neuralmagic / deepsparse

llm-inference,Sparsity-aware deep learning inference runtime for CPUs

Organization: neuralmagic

Home Page: https://neuralmagic.com/deepsparse/

machinelearning onnx inference computer-vision object-detection pruning quantization pretrained-models nlp cpus

nomic-ai / gpt4all

llm-inference,GPT4All: Run Local LLMs on Any Device. Open-source and available for commercial use.

Organization: nomic-ai

Home Page: https://nomic.ai/gpt4all

llm-inference ai-chat

nvidia / generativeaiexamples

llm-inference,Generative AI reference workflows optimized for accelerated infrastructure and microservice architecture.

Organization: nvidia

gpu-acceleration large-language-models llm llm-inference microservice nemo rag retrieval-augmented-generation tensorrt triton-inference-server

openvinotoolkit / openvino

llm-inference,OpenVINO™ is an open-source toolkit for optimizing and deploying AI inference

Organization: openvinotoolkit

Home Page: https://docs.openvino.ai

inference deep-learning openvino ai computer-vision diffusion-models generative-ai llm-inference natural-language-processing nlp

predibase / lorax

llm-inference,Multi-LoRA inference server that scales to 1000s of fine-tuned LLMs

Organization: predibase

Home Page: https://loraexchange.ai

fine-tuning gpt llama llm llm-inference llm-serving llmops lora model-serving pytorch transformers

preternatural-explore / mlx-swift-chat

llm-inference,A multi-platform SwiftUI frontend for running local LLMs with Apple's MLX framework.

Organization: preternatural-explore

ios llm-inference macos mlx swiftui mlx-swift

promptslab / llmtuner

llm-inference,FineTune LLMs in few lines of code (Text2Text, Text2Speech, Speech2Text)

Organization: promptslab

fine-tuning fine-tuning-llm finetune finetune-llm finetuning llm llm-framework llm-inference llm-training llmops

ray-project / ray-educational-materials

llm-inference,This is suite of the hands-on training materials that shows how to scale CV, NLP, time-series forecasting workloads with Ray.

Organization: ray-project

deep-learning distributed-machine-learning ray-distributed ray-tune ray ray-train ray-data ray-serve generative-ai llm

ray-project / ray-llm

llm-inference,RayLLM - LLMs on Ray

Organization: ray-project

Home Page: https://aviary.anyscale.com

distributed-systems large-language-models ray serving transformers llm llm-inference llm-serving llmops

rizerphe / local-llm-function-calling

llm-inference,A tool for generating function arguments and choosing what function to call with local LLMs

User: rizerphe

Home Page: https://local-llm-function-calling.readthedocs.io/

chatgpt-functions huggingface-transformers json-schema llm llm-inference openai-function-call openai-functions

rohan-paul / llm-finetuning-large-language-models

llm-inference,LLM (Large Language Model) FineTuning

User: rohan-paul

gpt-3 gpt3-turbo large-language-models llama2 llm llm-finetuning llm-inference llm-serving llm-training mistral-7b

run-ai / genv

llm-inference,GPU environment and cluster management with LLM support

Organization: run-ai

Home Page: https://www.genv.dev

gpu docker gpus nvidia-gpu bash data-science deep-learning jupyter-notebook jupyterlab-extension vscode

safeailab / eagle

llm-inference,Official Implementation of EAGLE-1 and EAGLE-2

Organization: safeailab

Home Page: https://arxiv.org/pdf/2406.16858

large-language-models llm-inference speculative-decoding

sjtu-ipads / powerinfer

llm-inference,High-speed Large Language Model Serving on PCs with Consumer-grade GPUs

Organization: sjtu-ipads

falcon large-language-models llama llm llm-inference local-inference bamboo-7b

stoyan-stoyanov / llmflows

llm-inference,LLMFlows - Simple, Explicit and Transparent LLM Apps

User: stoyan-stoyanov

Home Page: https://llmflows.readthedocs.io

ai llm llm-inference llmops llms machine-learning openai prompt-engineering question-answering vector-database

llm-inference,Superduper: Bring AI to your database! Integrate AI models and workflows with your database to implement custom AI applications, without moving your data. Including streaming inference, scalable model hosting, training and vector search.

Organization: superduper-io

Home Page: https://superduper.io

ai chatbot data database distributed-ml inference llm-inference llm-serving llmops ml mlops mongodb pretrained-models python pytorch rag semantic-search torch transformers vector-search

ugorsahin / talkingheads

llm-inference,A library to communicate with ChatGPT, Claude, Copilot, Gemini, HuggingChat, and Pi

User: ugorsahin

chatgpt chatgpt-api selenium browser-automation free python undetected-chromedriver huggingchat llm-inference claude

vectorch-ai / scalellm

llm-inference,A high-performance inference system for large language models, designed for production environments.

Organization: vectorch-ai

Home Page: https://docs.vectorch.com/

cuda inference llm llm-inference model production serving speculative transformer efficiency