pst2016 / db-gpt

This project forked from eosphoros-ai/db-gpt

Revolutionizing Database Interactions with Private LLM Technology

Home Page: https://db-gpt.readthedocs.io

License: MIT License

Shell 0.66% JavaScript 0.01% Python 71.08% HTML 27.92% Mako 0.04% Batchfile 0.09% Dockerfile 0.21%

DB-GPT: Revolutionizing Database Interactions with Private LLM Technology

What is DB-GPT?

DB-GPT is an experimental open-source project that uses locally deployed GPT-style large language models to interact with your data and environment. With this solution, you can be assured that there is no risk of data leakage and that your data remains 100% private and secure.

DB-GPT Youtube Video

Demo

Run on an RTX 4090 GPU.

Chat Excel

Chat Plugin

LLM Management

FastChat && vLLM

Trace

Chat Knowledge

Install

Docker, Linux, macOS, Windows

Usage Tutorial

Features

We have released several key features so far; they are listed below to illustrate the project's current capabilities:

  • Private KBQA & data processing

    The DB-GPT project provides a range of features for building knowledge bases and for efficiently storing and retrieving both structured and unstructured data. These include built-in support for uploading files in multiple formats, plug-in integration for custom data extraction, and unified vector storage and retrieval for managing large volumes of information.

  • Multiple data sources & visualization

    The DB-GPT project enables natural language interaction with a variety of data sources, including Excel files, databases, and data warehouses. Users can query and retrieve information from these sources conversationally and obtain insights from the results. DB-GPT can also generate analysis reports that summarize and interpret the data.

  • Multi-Agents & Plugins

    DB-GPT supports custom plug-ins for performing tasks and natively supports the Auto-GPT plug-in model; its agent protocol follows the Agent Protocol standard.

  • Fine-tuning Text2SQL

    DB-GPT-Hub is a lightweight, automated fine-tuning framework built around large language models, Text2SQL datasets, and fine-tuning methods such as LoRA, QLoRA, and P-Tuning, making Text2SQL fine-tuning as streamlined as an assembly line.

  • Multi-LLM support

    DB-GPT supports dozens of large language models, including both open-source models and API-based proxy models, such as LLaMA/LLaMA2, Baichuan, ChatGLM, Wenxin, Tongyi, and Zhipu.

  • Privacy and security

    Data privacy and security are ensured through techniques such as privately deployed large models and desensitization of data sent to proxy models.

  • Supported data sources

DataSource     Supported  Notes
MySQL          Yes
PostgreSQL     Yes
Spark          Yes
DuckDB         Yes
SQLite         Yes
MSSQL          Yes
ClickHouse     Yes
Oracle         No         TODO
Redis          No         TODO
MongoDB        No         TODO
HBase          No         TODO
Doris          No         TODO
DB2            No         TODO
Couchbase      No         TODO
Elasticsearch  No         TODO
OceanBase      No         TODO
TiDB           No         TODO
StarRocks      No         TODO
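The Fine-tuning Text2SQL feature listed above relies on parameter-efficient methods such as LoRA. In LoRA, a frozen weight matrix W of shape d × k receives a trainable low-rank update BA, where B is d × r and A is r × k, so only r(d + k) parameters are trained instead of d·k. The Python sketch below is a generic illustration of that arithmetic, not code from DB-GPT-Hub; the 5120 × 5120 matrix (the hidden size of 13B-class LLaMA models) and rank 8 are assumed example values.

```python
def lora_trainable_params(d: int, k: int, r: int) -> int:
    """Parameters trained by LoRA for one d x k weight: B is d x r and A is r x k."""
    return d * r + r * k


def full_params(d: int, k: int) -> int:
    """Parameters updated when the same d x k weight is fully fine-tuned."""
    return d * k


if __name__ == "__main__":
    # Assumed example values: one square 5120 x 5120 projection, LoRA rank 8.
    d = k = 5120
    r = 8
    lora = lora_trainable_params(d, k, r)
    full = full_params(d, k)
    print(f"LoRA trains {lora:,} of {full:,} parameters ({lora / full:.2%})")
```

QLoRA keeps the same low-rank update but additionally stores the frozen base weights in 4-bit NF4 to reduce memory; this sketch does not model quantization.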

Introduction

The overall architecture of DB-GPT is shown in the architecture diagram.

The core capabilities mainly consist of the following parts:

  1. Multi-Models: Supports multiple LLMs, such as LLaMA/LLaMA2, CodeLLaMA, ChatGLM, QWen, and Vicuna, as well as proxy models such as ChatGPT, Baichuan, Tongyi, and Wenxin.
  2. Knowledge-Based QA: Performs high-quality question answering over local documents and data, such as PDF, Word, and Excel files.
  3. Embedding: Provides unified vector storage and indexing; data is embedded as vectors and stored in vector databases, enabling content similarity search.
  4. Multi-Datasources: Connects different modules and data sources to enable data flow and interaction between them.
  5. Multi-Agents: Provides agent and plugin mechanisms that let users customize and extend the system's behavior.
  6. Privacy & Security: Ensures there is no risk of data leakage and that your data remains 100% private and secure.
  7. Text2SQL: Enhances Text-to-SQL performance by applying Supervised Fine-Tuning (SFT) to large language models.
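The Embedding capability above stores data as vectors and answers queries by content similarity. A minimal sketch of that retrieval step is shown below, assuming cosine similarity as the metric and hand-written three-dimensional vectors in place of a real embedding model. InMemoryVectorStore and the document IDs are hypothetical and are not DB-GPT APIs; in a real deployment a vector database such as Chroma or Milvus plays this role.

```python
import math
from typing import Dict, List, Tuple


def cosine_similarity(a: List[float], b: List[float]) -> float:
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


class InMemoryVectorStore:
    """Hypothetical toy stand-in for a vector database: keeps (doc_id, vector) pairs in a dict."""

    def __init__(self) -> None:
        self._vectors: Dict[str, List[float]] = {}

    def add(self, doc_id: str, vector: List[float]) -> None:
        self._vectors[doc_id] = vector

    def search(self, query: List[float], top_k: int = 3) -> List[Tuple[str, float]]:
        """Return the top_k (doc_id, score) pairs ranked by cosine similarity to the query."""
        scored = [(doc_id, cosine_similarity(query, vec)) for doc_id, vec in self._vectors.items()]
        scored.sort(key=lambda item: item[1], reverse=True)
        return scored[:top_k]


if __name__ == "__main__":
    # Hand-made vectors stand in for embeddings that a real model would produce.
    store = InMemoryVectorStore()
    store.add("sales_report.pdf#p1", [0.9, 0.1, 0.0])
    store.add("hr_policy.docx#p3", [0.0, 0.2, 0.9])
    store.add("sales_q3.xlsx#sheet1", [0.8, 0.3, 0.1])
    print(store.search([1.0, 0.2, 0.0], top_k=2))
```

A brute-force scan like this is fine for a handful of chunks; vector databases add approximate indexes so the same query stays fast over large collections.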

RAG-IN-Action

Submodules

  • DB-GPT-Hub: improves Text-to-SQL performance by applying Supervised Fine-Tuning (SFT) to large language models.
  • DB-GPT-Plugins: DB-GPT plugins that can run Auto-GPT plugins directly.
  • DB-GPT-Web: chat UI for DB-GPT.

Image

🌐 AutoDL Image

Language Switching

To switch the interface language, set the LANGUAGE parameter in the .env configuration file. Supported values are en (English, the default) and zh (Chinese); more languages will be added later.
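For illustration, a .env file selecting Chinese would contain the line LANGUAGE=zh. The Python sketch below shows one way such a setting could be read with an English fallback. parse_env_file and resolve_language are hypothetical helpers, and falling back to en for unsupported values is an assumption; DB-GPT's own configuration loader may parse the file and handle invalid values differently.

```python
from pathlib import Path
from typing import Dict

# Values taken from the text above; the fallback rule is an assumption of this sketch.
SUPPORTED_LANGUAGES = {"en", "zh"}
DEFAULT_LANGUAGE = "en"


def parse_env_file(text: str) -> Dict[str, str]:
    """Hypothetical parser for simple KEY=VALUE lines, skipping blanks and # comments."""
    values: Dict[str, str] = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def resolve_language(env: Dict[str, str]) -> str:
    """Return LANGUAGE if it is supported, otherwise the English default."""
    lang = env.get("LANGUAGE", DEFAULT_LANGUAGE).strip().lower()
    return lang if lang in SUPPORTED_LANGUAGES else DEFAULT_LANGUAGE


if __name__ == "__main__":
    env_path = Path(".env")
    env = parse_env_file(env_path.read_text(encoding="utf-8")) if env_path.exists() else {}
    print(resolve_language(env))
```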

Contribution

  • Please run the black formatter (black .) before submitting code. See the contributing guidelines for how to contribute.

Roadmap

KBQA RAG optimization

  • Multi Documents

    • PDF
    • Excel, CSV
    • Word
    • Text
    • Markdown
    • Code
    • Images
  • RAG

  • Graph Database

    • Neo4j Graph
    • Nebula Graph
  • Multi-Vector Database

    • Chroma
    • Milvus
    • Weaviate
    • PGVector
    • Elasticsearch
    • ClickHouse
    • Faiss
  • Testing and Evaluation Capability Building

    • Knowledge QA datasets
    • Question collection (easy, medium, hard)
    • Scoring mechanism
    • Testing and evaluation using Excel + DB datasets

Multi Datasource Support

  • Multi Datasource Support
    • MySQL
    • PostgreSQL
    • Spark
    • DuckDB
    • SQLite
    • MSSQL
    • ClickHouse
    • Oracle
    • Redis
    • MongoDB
    • HBase
    • Doris
    • DB2
    • Couchbase
    • Elasticsearch
    • OceanBase
    • TiDB
    • StarRocks

Multi-Models and vLLM

Agents Market and Plugins

  • Multi-agent framework
  • Custom plugin development
  • Plugin market
  • Integration with CoT
  • Enrich plugin sample library
  • Support for AutoGPT protocol
  • Integration of multi-agents and visualization capabilities, defining a new LLM+Vis standard

Cost and Observability

Text2SQL Finetune

  • Supported LLMs

    • LLaMA
    • LLaMA-2
    • BLOOM
    • BLOOMZ
    • Falcon
    • Baichuan
    • Baichuan2
    • InternLM
    • Qwen
    • XVERSE
    • ChatGLM2
  • SFT Accuracy

As of October 10, 2023, a 13-billion-parameter open-source model fine-tuned with this project achieves higher execution accuracy on the Spider evaluation dataset than GPT-4.

Model                            Execution Accuracy  Reference
GPT-4                            0.762               numbersstation-eval-res
ChatGPT                          0.728               numbersstation-eval-res
CodeLlama-13b-Instruct-hf_lora   0.789               SFT with this project (LoRA), trained on the Spider train set only, evaluated with this project's method
CodeLlama-13b-Instruct-hf_qlora  0.774               SFT with this project (QLoRA, 4-bit NF4), trained on the Spider train set only, evaluated with this project's method
wizardcoder                      0.610               text-to-sql-wizardcoder
CodeLlama-13b-Instruct-hf        0.556               evaluated with this project's default parameters
llama2_13b_hf_lora_best          0.744               SFT with this project, trained on the Spider train set only, evaluated with this project's method
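Execution accuracy, the metric in the table above, counts a predicted query as correct when executing it returns the same result as executing the gold query. The sketch below is a simplified version of that idea using Python's built-in sqlite3 module: it compares result rows as unordered multisets and treats queries that fail to run as wrong. The singer table and the query pairs are made-up examples, and the official Spider evaluation handles cases such as ORDER BY more carefully than this.

```python
import sqlite3
from collections import Counter
from typing import Iterable, Tuple


def results_match(conn: sqlite3.Connection, predicted_sql: str, gold_sql: str) -> bool:
    """True if both queries return the same multiset of rows (row order is ignored)."""
    try:
        predicted_rows = conn.execute(predicted_sql).fetchall()
    except sqlite3.Error:
        return False  # a prediction that fails to execute counts as incorrect
    gold_rows = conn.execute(gold_sql).fetchall()
    return Counter(predicted_rows) == Counter(gold_rows)


def execution_accuracy(conn: sqlite3.Connection, pairs: Iterable[Tuple[str, str]]) -> float:
    """Fraction of (predicted, gold) pairs whose execution results match."""
    pairs = list(pairs)
    if not pairs:
        return 0.0
    correct = sum(results_match(conn, pred, gold) for pred, gold in pairs)
    return correct / len(pairs)


if __name__ == "__main__":
    # Hypothetical toy database and query pairs for illustration only.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE singer (name TEXT, age INTEGER)")
    conn.executemany("INSERT INTO singer VALUES (?, ?)", [("Ann", 31), ("Bo", 25), ("Cy", 40)])
    pairs = [
        ("SELECT name FROM singer WHERE age > 30", "SELECT name FROM singer WHERE age >= 31"),
        ("SELECT COUNT(*) FROM singer", "SELECT COUNT(name) FROM singer"),
        ("SELECT name FROM singer WHERE age > 20", "SELECT name FROM singer WHERE age > 30"),
        ("SELECT nme FROM singer", "SELECT name FROM singer"),
    ]
    print(execution_accuracy(conn, pairs))
```

Because the metric compares results rather than SQL text, differently written but equivalent queries (the first two pairs) count as correct, while a query with a typo in a column name (the last pair) counts as wrong.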

More information about Text2SQL fine-tuning

License

The MIT License (MIT)

Contact Information

We are building a community. If you have any ideas for it, feel free to contact us.
