This project forked from benthecoder/classgpt

ChatGPT for lecture slides

Home Page: https://benneo.super.site/

License: MIT License

ClassGPT

ChatGPT for my lecture slides

Built with Streamlit, powered by LlamaIndex and LangChain.

Uses the latest ChatGPT API from OpenAI.

Inspired by AthensGPT

App Demo

demo.mp4

How this works

  1. Parse the PDF with pypdf
  2. Build the index with LlamaIndex's GPTSimpleVectorIndex
  3. Store the indexes and files on S3
  4. Query the index
    • uses the latest ChatGPT model, gpt-3.5-turbo
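
The flow above can be pictured with a toy, standard-library-only sketch. It is not how LlamaIndex works internally: plain strings stand in for parsed PDF pages, word overlap stands in for embedding similarity, and a JSON string stands in for an object stored on S3. All function names here are hypothetical.

```python
import json

def build_index(pages):
    """Toy index: map each page number to its sorted set of lowercase words."""
    return {str(i): sorted(set(page.lower().split())) for i, page in enumerate(pages)}

def query_index(index, pages, question):
    """Return the page that shares the most words with the question."""
    question_words = set(question.lower().split())
    best_key = max(index, key=lambda k: len(question_words & set(index[k])))
    return pages[int(best_key)]

# Stand-ins for text extracted from two lecture slides.
pages = [
    "Gradient descent updates weights using the gradient of the loss",
    "A binary search tree keeps keys in sorted order",
]

index = build_index(pages)

# Stand-in for saving the index to storage and loading it back.
stored = json.dumps(index)
restored = json.loads(stored)

# The retrieved page is the context that would accompany the question
# in a prompt to the chat model (the model call itself is omitted here).
context = query_index(restored, pages, "how does gradient descent update weights")
print(context)
```

In the real app the retrieved text and the question are sent to gpt-3.5-turbo; the sketch stops at retrieval.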

Usage

Configuration and secrets

  1. Configure AWS (quickstart)
    aws configure
  2. Create an S3 bucket named "classgpt"
  3. Rename .env.local.example to .env and add your OpenAI credentials
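
Before starting the app, it can help to confirm that the .env file actually contains the credentials. The sketch below is a minimal check using only the standard library. The variable name OPENAI_API_KEY is an assumption (the contents of .env.local.example are not shown here), and parse_env and missing_keys are hypothetical helpers.

```python
def parse_env(text):
    """Parse KEY=VALUE lines, skipping blank lines and # comments."""
    values = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values

def missing_keys(env, required):
    """Return the required keys that are absent or empty."""
    return [key for key in required if not env.get(key)]

# Example .env contents; OPENAI_API_KEY is an assumed variable name.
sample = '# secrets\nOPENAI_API_KEY="sk-test"\nEMPTY=\n'
env = parse_env(sample)
print(missing_keys(env, ["OPENAI_API_KEY"]))
```

To check a real file, pass the result of `open(".env").read()` to `parse_env`.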

Locally

  1. Create a Python environment
    conda create -n classgpt python=3.9
    conda activate classgpt
  2. Install dependencies
    pip install -r requirements.txt
  3. Run the Streamlit app
    cd app/
    streamlit run 01_❓_Ask.py

Docker

Alternatively, you can use Docker:

    docker compose up

Then open up a new tab and navigate to http://localhost:8501/

FAQ

Tokens

Tokens can be thought of as pieces of words. Before the API processes the prompts, the input is broken down into tokens. These tokens are not cut up exactly where the words start or end - tokens can include trailing spaces and even sub-words. Here are some helpful rules of thumb for understanding tokens in terms of lengths:

  • 1 token ~= 4 chars in English
  • 1 token ~= ¾ words
  • 100 tokens ~= 75 words
  • 1-2 sentences ~= 30 tokens
  • 1 paragraph ~= 100 tokens
  • 1,500 words ~= 2048 tokens
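
These rules of thumb can be turned into a rough estimator. The sketch below averages the character-based and word-based rules. It is only an approximation for English text, not a real tokenizer, and estimate_tokens is a hypothetical helper name.

```python
def estimate_tokens(text):
    """Rough token estimate for English text from the rules of thumb."""
    by_chars = len(text) / 4              # 1 token ~= 4 characters
    by_words = len(text.split()) / 0.75   # 1 token ~= 3/4 of a word
    return round((by_chars + by_words) / 2)

slide_text = "Gradient descent updates the weights in the direction that reduces the loss."
print(estimate_tokens(slide_text))
```

For exact counts, run the text through an actual tokenizer instead.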

Try the OpenAI Tokenizer tool

Source

Embeddings

An embedding is a vector (list) of floating point numbers. The distance between two vectors measures their relatedness. Small distances suggest high relatedness and large distances suggest low relatedness.
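
As a small illustration of "distance measures relatedness", the sketch below computes cosine distance between short made-up vectors. Cosine distance is one common choice and is an assumption here, since the text above does not name a metric. Real embeddings have many more dimensions than these three-element examples.

```python
import math

def cosine_distance(a, b):
    """Return 1 - cosine similarity; smaller means more related."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine distance is undefined for a zero vector")
    return 1.0 - dot / (norm_a * norm_b)

# Made-up vectors: two related slides and one unrelated slide.
slide_a = [0.9, 0.1, 0.0]
slide_b = [0.8, 0.2, 0.1]
unrelated = [0.0, 0.1, 0.9]

print(cosine_distance(slide_a, slide_b) < cosine_distance(slide_a, unrelated))
```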

For text-embedding-ada-002, the cost is $0.0004 / 1K tokens, or roughly 3,000 pages per dollar.

Models

For the gpt-3.5-turbo model (ChatGPT API), the cost is $0.002 / 1K tokens.

For the text-davinci-003 model, the cost is $0.02 / 1K tokens.
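
The prices quoted above for embeddings and models can be combined into a quick cost estimate. The sketch below hard-codes those figures as a snapshot, so check current pricing before relying on them. The helper names are hypothetical, and the 800 tokens per page used in the example is an assumption chosen because it roughly reproduces the "3000 pages/dollar" figure.

```python
# Prices in USD per 1K tokens, copied from the figures quoted above.
PRICE_PER_1K_TOKENS = {
    "text-embedding-ada-002": 0.0004,
    "gpt-3.5-turbo": 0.002,
    "text-davinci-003": 0.02,
}

def estimate_cost(model, tokens):
    """Estimated cost in USD for processing `tokens` tokens with `model`."""
    return tokens / 1000 * PRICE_PER_1K_TOKENS[model]

def pages_per_dollar(price_per_1k, tokens_per_page):
    """How many pages one dollar covers at a given price and page size."""
    tokens_per_dollar = 1000 / price_per_1k
    return tokens_per_dollar / tokens_per_page

# Cost of a 2,048-token request on each completion model.
print(estimate_cost("gpt-3.5-turbo", 2048))
print(estimate_cost("text-davinci-003", 2048))

# Assumed page size of 800 tokens (not stated in the text above).
print(pages_per_dollar(PRICE_PER_1K_TOKENS["text-embedding-ada-002"], 800))
```

At these prices, gpt-3.5-turbo is ten times cheaper per token than text-davinci-003.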
