oneryalcin / vectorsearch-applications

This project forked from webermatias/vectorsearch-applications

Vector Search applications repo for Uplimit course

Welcome to Vector Search Applications with LLMs

This is the course repository for Vector Search Applications with LLMs, taught by Chris Sanchez with assistance from Matias Weber. The course is designed to teach search and discovery industry best practices, culminating in a demo Retrieval Augmented Generation (RAG) application. Along the way students will learn all of the components of a RAG system, including data preprocessing, embedding creation, vector database selection, indexing, retrieval systems, reranking, retrieval evaluation, question answering through an LLM, and UI implementation through Streamlit.
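
As a rough orientation, the embed, index, retrieve, and prompt-building stages of a RAG system can be sketched in a few lines of plain Python. This is a toy illustration only, not the course's implementation: the bag-of-words `embed` function stands in for a real embedding model, the in-memory list stands in for a vector database, and `build_prompt` only assembles the text that would be sent to an LLM. All function names and the sample documents are hypothetical.

```python
import math
import re
from collections import Counter

# Hypothetical sample corpus, used only for illustration.
DOCS = [
    "Weaviate stores dense vectors for semantic search.",
    "Streamlit builds simple web user interfaces in Python.",
    "Rerankers reorder retrieved passages by relevance.",
]

def embed(text: str) -> Counter:
    """Toy 'embedding': a term-count vector (stand-in for a real model)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def retrieve(query: str, index: list[tuple[str, Counter]], k: int = 2) -> list[str]:
    """Return the k documents whose vectors are most similar to the query."""
    q = embed(query)
    ranked = sorted(index, key=lambda item: cosine(q, item[1]), reverse=True)
    return [doc for doc, _ in ranked[:k]]

def build_prompt(query: str, passages: list[str]) -> str:
    """Assemble retrieved context and the question into one LLM prompt string."""
    context = "\n".join(f"- {p}" for p in passages)
    return f"Answer using only this context:\n{context}\n\nQuestion: {query}"

if __name__ == "__main__":
    index = [(doc, embed(doc)) for doc in DOCS]  # "indexing" step
    question = "How does semantic search use vectors?"
    print(build_prompt(question, retrieve(question, index, k=1)))
```

A real system would swap each stand-in for a production component (an embedding model, a vector database, a reranker, and an LLM call), but the data flow between the stages stays the same.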

Prerequisites - Technical Experience

Students are expected to have the following technical skills prior to enrolling. Students who do not meet these prerequisites will likely have an overly challenging learning experience:

  • Minimum of one year of experience coding in Python. Skills should include object-oriented programming (OOP), dictionary and list comprehensions, lambda functions, setting up virtual environments, and comfort with git version control.
  • Professional or academic experience working with search engines.
  • Ability to comfortably navigate the command line, including familiarity with Docker.
  • Nice to have but not strictly required:
    • experience fine-tuning an ML model
    • familiarity with the Streamlit API
    • familiarity with making inference calls to a Generative LLM (OpenAI or Llama-2)

Prerequisites - Administrative

  1. Students will need access to their own compute environment, whether local or remote. There are no hard requirements for RAM or CPU processing power, but in general the more punch the better.
  2. Students will need accounts with the following organizations:
    • Either an OpenAI account (RECOMMENDED) or a HuggingFace account. Students have the option of either using a paid LLM service (OpenAI) or using the open source meta-llama/Llama-2-7b-chat-hf model. Students choosing the latter option will first need to register with Meta to request access to the Llama-2 model.
    • An account with weaviate.io. The current iteration of this course will use Weaviate as a sparse and dense vector database. Weaviate offers free cloud instance cluster resources for 21 days (as of November 2023). Students are advised to NOT CREATE a Weaviate cloud cluster until the course officially starts.
    • A standard GitHub account in order to fork this repo, clone a copy, and submit commits to the fork as needed throughout the course.

Setup

  1. Fork this course repo (see the Fork button in the upper right-hand corner of the repo web page).
  2. Clone a copy of the forked repo into the dev environment of your choice. Navigate into the cloned vectorsearch-applications directory.
  3. Create a Python virtual environment using your library of choice. Here's an example using conda:
      conda create --name impactenv -y python=3.10
  4. Once the environment is created, activate it and install the dependencies:
      conda activate impactenv
      pip install -r requirements.txt
  5. Last but not least, create a .env text file in your cloned repo. At a minimum, add the following environment variables:
      OPENAI_API_KEY="your OpenAI account API Key"
      HF_TOKEN="your HuggingFace account token"  <--- Optional: not required if using OpenAI
      WEAVIATE_API_KEY="your Weaviate cluster API Key"   <--- you will get this on Day One of the course
      WEAVIATE_ENDPOINT="your Weaviate cluster endpoint"  <--- you will get this on Day One of the course
  6. If you've made it this far, you are ready to start the course. Enjoy the process!
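
Once the .env file exists, it can help to confirm that the variables load before starting the course material. The sketch below is one hedged way to do that using only the Python standard library. The helper names `load_env_file` and `missing_keys` are hypothetical and not part of the course code, and the course itself may load the file differently (for example with the `python-dotenv` package). It assumes one `KEY=value` pair per line, with optional quotes, and that the `<---` notes from the listing above were not copied into the file.

```python
import os
from pathlib import Path

# Variable names taken from the setup step above.
REQUIRED_KEYS = ("WEAVIATE_API_KEY", "WEAVIATE_ENDPOINT")
# At least one LLM credential is needed: OpenAI (recommended) or HuggingFace.
LLM_KEYS = ("OPENAI_API_KEY", "HF_TOKEN")

def load_env_file(path: str = ".env") -> dict[str, str]:
    """Parse KEY=value lines from a .env file into a dict (hypothetical helper)."""
    values: dict[str, str] = {}
    env_path = Path(path)
    if not env_path.exists():
        return values
    for raw in env_path.read_text(encoding="utf-8").splitlines():
        line = raw.strip()
        # Skip blank lines, comments, and anything that is not an assignment.
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip("\"'")
    return values

def missing_keys(values: dict[str, str]) -> list[str]:
    """Return the names of settings that are absent or empty."""
    missing = [key for key in REQUIRED_KEYS if not values.get(key)]
    if not any(values.get(key) for key in LLM_KEYS):
        missing.append(" or ".join(LLM_KEYS))
    return missing

if __name__ == "__main__":
    settings = load_env_file()
    os.environ.update(settings)  # expose the values to libraries that read env vars
    problems = missing_keys(settings)
    if problems:
        print("Missing: " + ", ".join(problems))
    else:
        print("All expected variables are set.")
```

Before Day One of the course, the two Weaviate variables are expected to show up as missing, since the cluster does not exist yet.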
