Unicorn

Unicorn is a semantic-search-enhanced Telegram chatbot. It uses OCR, pre-trained language models and FAISS to retrieve relevant passages from documents and generate answers to questions.
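To illustrate the retrieval idea, here is a minimal sketch of retrieve-then-answer. It is not Unicorn's code: Unicorn uses a pre-trained embedding model and a FAISS index, while this sketch swaps both for standard-library stand-ins (a bag-of-words vector and brute-force cosine similarity, similar in spirit to an exact "flat" FAISS index). The functions `embed`, `retrieve` and `build_prompt` are hypothetical.

```python
import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    """Toy stand-in for a sentence-embedding model: a bag-of-words count vector."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(count * b[token] for token, count in a.items() if token in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def retrieve(question: str, passages: list[str], k: int = 2) -> list[str]:
    """Return the k passages most similar to the question (exhaustive search)."""
    q = embed(question)
    ranked = sorted(passages, key=lambda p: cosine(q, embed(p)), reverse=True)
    return ranked[:k]


def build_prompt(question: str, passages: list[str], k: int = 2) -> str:
    """Hypothetical prompt layout: retrieved context first, then the question for the LLM."""
    context = "\n".join(retrieve(question, passages, k))
    return f"Context:\n{context}\n\nQuestion: {question}\nAnswer:"


if __name__ == "__main__":
    docs = [
        "The invoice is due on March 3.",
        "Our office is in Milan.",
        "Payment is accepted by bank transfer.",
    ]
    print(build_prompt("When is the invoice due?", docs, k=1))
```

In a real pipeline the passages would come from OCR'd PDF pages, and the prompt would be passed to the LLM to generate the answer.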

Installation

Clone the Unicorn repository:

git clone https://github.com/marcodsn/unicorn.git

Move into the cloned directory and install the required packages:

cd unicorn
pip install -r requirements.txt

Usage

Unicorn can answer questions using factual data retrieved from PDF documents. It can also answer questions without any documents, but the answers will be less accurate.

To start the bot, simply run:

python main.py <token>

Replace <token> with your Telegram bot token. You can get a token by talking to @BotFather on Telegram.

FAQ

  • Why is Unicorn so slow? By default, Unicorn uses the Metharme-13b model by PygmalionAI (merged by TehVenom) as its LLM, which has to run in 4-bit precision to fit on the average consumer GPU. If you prefer speed over accuracy, you can manually switch to a smaller LLM (see models/LLMs/metharme.py).
  • Why is Unicorn inaccurate? Unicorn is still in its early stages of development and for this reason it is not flawless. If you find any bugs, please report them in the Issues section of this repository.
  • What are the VRAM requirements for Unicorn? At the moment, Unicorn requires just over 8GB of VRAM to run both the embedding model and the LLM that generates answers. You can reduce this by manually switching to a smaller LLM (in models/LLMs). Document analysis is also VRAM-intensive at the moment, so you will need more VRAM if you plan to use that feature.
  • Does Unicorn support multiple languages? Unicorn works best with English, but it can be used with other languages as well (although the results will be less accurate).
  • Can I modify Unicorn's personality? Yes, you can! You can change the persona by modifying the character variable in main.py.
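As a rough check on the memory figures above, weight memory is approximately parameter count × bits per weight ÷ 8 bytes. This is a weights-only estimate: it ignores activations, the KV cache, CUDA runtime overhead and the embedding model, so real usage is higher. The function name `weight_memory_gb` is hypothetical.

```python
def weight_memory_gb(n_params: float, bits_per_weight: int) -> float:
    """Approximate memory in decimal GB needed to hold the model weights alone."""
    bytes_total = n_params * bits_per_weight / 8  # 8 bits per byte
    return bytes_total / 1e9


if __name__ == "__main__":
    # A 13B-parameter model (such as Metharme-13b) at different precisions.
    for bits in (16, 8, 4):
        print(f"13B params @ {bits}-bit: {weight_memory_gb(13e9, bits):.1f} GB")
```

At 16-bit the weights alone need about 26 GB, while 4-bit brings them down to about 6.5 GB. Adding the embedding model and runtime overhead on top of that is consistent with the "just over 8GB" figure.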

Roadmap

  • Move to a better OCR model
  • Add support for multiple languages
  • Add support for more document types
  • Add support to use multiple documents at once
  • Allow different characters (personas) for every user
  • Add support for a default, always-loaded document (e.g. with information about the company/organization)
  • Add more easily accessible settings

... And more to come!

License

Unicorn is licensed under the MIT License. See the LICENSE file for more information.
