farouqzaib / fast-search

A vector database implemented in Go, with support for full-text and vector search and fault tolerance via Raft.

Go 100.00%
full-text-search hnsw vector-database embeddings-similarity nearest-neighbor-search search-engine

fast-search's Introduction

xdb: A distributed, lightweight vector database built from *scratch.

What it does:

  • full-text search using proximity ranking
  • semantic search via HNSW + cosine distance
  • integrated basic text embedding service (Python HTTP API around a sentence transformer)
  • Reciprocal Rank Fusion for merging full-text and semantic search results
  • in-memory serving + disk persistence
  • fault tolerance with segment replication using Raft
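Reciprocal Rank Fusion scores each document by summing 1/(k + rank) over every result list it appears in, so documents ranked well by both full-text and semantic search rise to the top. The Go sketch below illustrates the technique; the function name ReciprocalRankFusion and the constant k = 60 (a value commonly used for RRF) are assumptions, not necessarily what fast-search uses.

```go
package main

import (
	"fmt"
	"sort"
)

// rrfK is the RRF smoothing constant. 60 is a common default; this value is an
// assumption, and fast-search may use a different one.
const rrfK = 60.0

// ReciprocalRankFusion merges ranked lists of document IDs (best first) into a
// single list ordered by the summed score 1/(k + rank), using 1-based ranks.
func ReciprocalRankFusion(lists ...[]string) []string {
	scores := make(map[string]float64)
	for _, list := range lists {
		for i, id := range list {
			scores[id] += 1.0 / (rrfK + float64(i+1))
		}
	}

	ids := make([]string, 0, len(scores))
	for id := range scores {
		ids = append(ids, id)
	}
	// Highest fused score first; ties broken by ID so the output is deterministic.
	sort.Slice(ids, func(a, b int) bool {
		if scores[ids[a]] != scores[ids[b]] {
			return scores[ids[a]] > scores[ids[b]]
		}
		return ids[a] < ids[b]
	})
	return ids
}

func main() {
	fullText := []string{"doc1", "doc2", "doc3"} // e.g. proximity-ranked hits
	semantic := []string{"doc3", "doc1", "doc4"} // e.g. HNSW nearest neighbours
	fmt.Println(ReciprocalRankFusion(fullText, semantic))
}
```

Here doc1 (ranks 1 and 2) narrowly beats doc3 (ranks 3 and 1), followed by doc2 and doc4, which each appear in only one list. A larger k flattens the gap between top and lower ranks.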

What it's not:

  • production-ready (code is pretty sus right now)
  • perfect (the enemy of good)

Architecture

Getting started

Install the dependencies for the basic text embedding service in the third_party folder using pip:

pip install -r requirements.txt

Then start the service like so:

uvicorn main:app

Set the environment variable EmbeddingHost to the address of the embedding service:

export EmbeddingHost="http://127.0.0.1:8000/embeddings"

Then start one or more instances of the vector DB.

Flags:
  • httpAddr: address of HTTP API service
  • joinAddr: HTTP API service address of primary node to join
  • nodeId: unique identifier for node
  • raftAddr: raft address for node
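The four flags map naturally onto Go's standard flag package. The sketch below is hypothetical: the Config struct, parseFlags, and the required-flag check are illustrative and not taken from cmd/server/main.go. It shows how, in the commands that follow, leaving out -joinAddr is what distinguishes the primary from the replicas.

```go
package main

import (
	"flag"
	"fmt"
	"os"
)

// Config holds the documented flags. Hypothetical: the real server may
// organise its configuration differently.
type Config struct {
	HTTPAddr string // address of the HTTP API service
	JoinAddr string // HTTP API address of the primary to join; empty on the primary
	NodeID   string // unique identifier for this node
	RaftAddr string // Raft address for this node
}

// parseFlags parses args (without the program name) into a Config.
func parseFlags(args []string) (Config, error) {
	var c Config
	fs := flag.NewFlagSet("server", flag.ContinueOnError)
	fs.StringVar(&c.HTTPAddr, "httpAddr", "", "address of HTTP API service")
	fs.StringVar(&c.JoinAddr, "joinAddr", "", "HTTP API service address of primary node to join")
	fs.StringVar(&c.NodeID, "nodeId", "", "unique identifier for node")
	fs.StringVar(&c.RaftAddr, "raftAddr", "", "raft address for node")
	if err := fs.Parse(args); err != nil {
		return Config{}, err
	}
	// Assumption for this sketch: everything except joinAddr is mandatory.
	if c.HTTPAddr == "" || c.NodeID == "" || c.RaftAddr == "" {
		return Config{}, fmt.Errorf("-httpAddr, -nodeId and -raftAddr are required")
	}
	return c, nil
}

func main() {
	cfg, err := parseFlags(os.Args[1:])
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(2)
	}
	if cfg.JoinAddr == "" {
		fmt.Printf("node %s starting as primary (raft %s)\n", cfg.NodeID, cfg.RaftAddr)
	} else {
		fmt.Printf("node %s joining primary at %s (raft %s)\n", cfg.NodeID, cfg.JoinAddr, cfg.RaftAddr)
	}
}
```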
Run a single node
go run cmd/server/main.go -httpAddr 127.0.0.1:8111 -nodeId 0 -raftAddr 127.0.0.1:9000

API

POST /index

Index a document:

curl --location '127.0.0.1:8111/index' --header 'Content-Type: application/json' --data '{"text": "some text"}'
GET /search

Run a search:

curl --location --request GET '127.0.0.1:8111/search' \
--header 'Content-Type: application/json' \
--data '{"query": "some text"}'
Run a 3-node cluster

Run the commands below on different machines (or, to simulate a cluster, as separate instances of the project on one machine):

go run cmd/server/main.go -httpAddr 127.0.0.1:8111 -nodeId 0 -raftAddr 127.0.0.1:9000

Replicas join the primary (Raft address 127.0.0.1:9000) by contacting its HTTP API at 127.0.0.1:8111:

go run cmd/server/main.go -httpAddr 127.0.0.1:8112 -nodeId 1 -raftAddr 127.0.0.1:9001 -joinAddr 127.0.0.1:8111
go run cmd/server/main.go -httpAddr 127.0.0.1:8113 -nodeId 2 -raftAddr 127.0.0.1:9002 -joinAddr 127.0.0.1:8111
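Raft commits an entry only once a majority of voting nodes has accepted it, so a 3-node cluster needs 2 nodes up and tolerates one failure. The small Go helpers below (hypothetical names, not part of fast-search) make that arithmetic explicit.

```go
package main

import "fmt"

// quorum returns the number of votes a Raft cluster of n voting members needs
// to elect a leader or commit a log entry: a strict majority.
func quorum(n int) int { return n/2 + 1 }

// faultTolerance returns how many voting members can fail while the rest
// still form a majority.
func faultTolerance(n int) int { return (n - 1) / 2 }

func main() {
	for _, n := range []int{1, 3, 4, 5} {
		fmt.Printf("nodes=%d quorum=%d tolerates=%d failure(s)\n", n, quorum(n), faultTolerance(n))
	}
}
```

A fourth node raises the quorum to 3 without tolerating any extra failures, which is why odd cluster sizes are the usual choice.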

TODO

  • Indexing
    • Concurrent indexing using goroutines to process terms
  • Retrieval
    • Boolean queries
    • Concurrent memtable search
  • Ranking
  • API
    • Bulk index
    • Document deletion
  • Storage
    • Segment compaction
  • Replication
    • Snapshot working?
  • Deployment
    • Containerisation
  • Code quality
    • Penance for all the atrocities I committed.

*Would not have been possible without these resources:
