Omochi 😊

A full-text search engine written from scratch in Go ʕ◔ϖ◔ʔ (just a toy).

✨ Features

  • Omochi is a search engine written in Go and built on an inverted index.
  • Any document can be searched, provided it has been indexed correctly.
  • Documents are searched through a RESTful API.
  • Supported languages: English and Japanese.
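
To illustrate what "inverted index" means here, the following is a minimal sketch in Go, not Omochi's actual implementation. The `Index` type, the `tokenize` helper, and the sample titles are assumptions made for illustration; the real project stores its index in MariaDB and uses its own tokenizers. The sketch maps each token to the set of document IDs containing it and answers "And" and "Or" queries from those sets.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
	"unicode"
)

// Index maps each token to the set of document IDs that contain it.
// (Hypothetical in-memory type; Omochi itself persists its index in MariaDB.)
type Index map[string]map[int]struct{}

// tokenize lowercases the text and splits it on every rune that is not a
// letter or a digit. This is a simplification, not Omochi's tokenizer.
func tokenize(text string) []string {
	return strings.FieldsFunc(strings.ToLower(text), func(r rune) bool {
		return !unicode.IsLetter(r) && !unicode.IsDigit(r)
	})
}

// Add records every token of the document under the given ID.
func (idx Index) Add(id int, text string) {
	for _, tok := range tokenize(text) {
		if idx[tok] == nil {
			idx[tok] = make(map[int]struct{})
		}
		idx[tok][id] = struct{}{}
	}
}

// Search returns sorted document IDs. With matchAll set, a document must
// contain every keyword ("And"); otherwise one keyword is enough ("Or").
func (idx Index) Search(keywords []string, matchAll bool) []int {
	counts := make(map[int]int) // document ID -> number of keywords matched
	for _, kw := range keywords {
		for id := range idx[strings.ToLower(kw)] {
			counts[id]++
		}
	}
	ids := []int{}
	for id, n := range counts {
		if !matchAll || n == len(keywords) {
			ids = append(ids, id)
		}
	}
	sort.Ints(ids)
	return ids
}

func main() {
	idx := Index{}
	idx.Add(1, "Toy Story")
	idx.Add(2, "Toy Soldiers")
	idx.Add(3, "The NeverEnding Story")

	fmt.Println(idx.Search([]string{"toy", "story"}, true))  // prints [1]
	fmt.Println(idx.Search([]string{"toy", "story"}, false)) // prints [1 2 3]
}
```

Looking up a keyword is a single map access, which is why an inverted index suits keyword search better than scanning every document.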

📍 Requirements

📦 Setup

Create network

Create the Docker network (omochi_network) with:

$ docker network create omochi_network

Database migration

Omochi uses MariaDB to store the inverted indexes and documents, and Ent as its ORM.

To migrate the database, open a shell in the Docker container with:

$ docker-compose run api bash

Then run the database migration with:

$ go run ./cmd/migrate/migrate.go 

Seed data

To let you try the search engine, this project provides two sample datasets in TSV format.

The English dataset is a movie title dataset, and the Japanese dataset is a Doraemon comic title dataset.

First, open a shell in the Docker container with:

$ docker-compose run api bash

Then seed the data with:

$ go run {path to seed.go}

To initialize with the Japanese dataset, set {path to seed.go} to ./cmd/seeds/ja/seed.go . For the English dataset, use ./cmd/seeds/eng/seed.go .
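
For readers curious how a TSV dataset might be read in Go, here is a minimal sketch using the standard `encoding/csv` package with a tab delimiter. It is not the project's seed script: the `parseTSV` helper and the two-column layout (ID, title) are assumptions for illustration, and the actual column layout of the sample datasets may differ.

```go
package main

import (
	"encoding/csv"
	"fmt"
	"strings"
)

// parseTSV splits tab-separated text into rows of fields.
// (Hypothetical helper; the column layout of Omochi's datasets is assumed.)
func parseTSV(data string) ([][]string, error) {
	r := csv.NewReader(strings.NewReader(data))
	r.Comma = '\t'         // fields are separated by tabs instead of commas
	r.LazyQuotes = true    // tolerate stray double quotes inside titles
	r.FieldsPerRecord = -1 // allow rows with differing numbers of fields
	return r.ReadAll()
}

func main() {
	// Assumed sample rows: an ID column followed by a title column.
	rows, err := parseTSV("1\tToy Story\n213\tToy Story 2\n")
	if err != nil {
		panic(err)
	}
	for _, row := range rows {
		fmt.Println(row[0], row[1])
	}
}
```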

🏇 Start Application

After completing the setup, start the application by running:

$ docker-compose up

The app starts a RESTful API server that listens for connections on port 8081.

🌎 How to use & Demo

After seeding the data, you can search documents by sending a GET request to /v1/document/search .

The query parameters are as follows:

  • "keywords": The keywords to search for. Separate multiple search terms with commas, like "hoge,fuga,piyo".
  • "mode": The search mode. The available modes are "And" and "Or".
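
As a rough sketch of how a Go client might compose such a request URL, the example below joins the keywords with commas and encodes both parameters with the standard `net/url` package. The `buildSearchURL` helper is a hypothetical name, not part of Omochi, and the base URL assumes the API is running locally on port 8081.

```go
package main

import (
	"fmt"
	"net/url"
	"strings"
)

// buildSearchURL composes the search endpoint URL from a base URL, a list
// of keywords, and a search mode. (Hypothetical helper, not part of Omochi.)
func buildSearchURL(base string, keywords []string, mode string) string {
	q := url.Values{}
	q.Set("keywords", strings.Join(keywords, ","))
	if mode != "" {
		q.Set("mode", mode) // "And" or "Or"; left out when empty
	}
	return base + "/v1/document/search?" + q.Encode()
}

func main() {
	u := buildSearchURL("http://localhost:8081", []string{"toy", "story"}, "And")
	fmt.Println(u)
	// prints http://localhost:8081/v1/document/search?keywords=toy%2Cstory&mode=And
}
```

`url.Values.Encode` percent-encodes the comma as `%2C`, which a standard query-string parser decodes back to a comma on the server side, and it also handles non-ASCII keywords such as Japanese terms.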

Demo

  • Doraemon comic title dataset

After seeding the Doraemon comic title dataset, you can search for documents that contain "ドラえもん" with:

$ curl "http://localhost:8081/v1/document/search?keywords=ドラえもん" | jq . 
{
  "documents": [
    {
      "id": 12054,
      "content": "ドラえもんの歌",
      "tokenized_content": [
        "ドラえもん",
        "歌"
      ],
      "created_at": "2022-07-08T12:59:49+09:00",
      "updated_at": "2022-07-08T12:59:49+09:00"
    },
    {
      "id": 11992,
      "content": "恋するドラえもん",
      "tokenized_content": [
        "恋する",
        "ドラえもん"
      ],
      "created_at": "2022-07-08T12:59:48+09:00",
      "updated_at": "2022-07-08T12:59:48+09:00"
    },
    {
      "id": 11230,
      "content": "ドラえもん登場!",
      "tokenized_content": [
        "ドラえもん",
        "登場"
      ],
      "created_at": "2022-07-08T12:59:44+09:00",
      "updated_at": "2022-07-08T12:59:44+09:00"
    },
    ...

  • Movie title dataset

After seeding the Movie title dataset, you can search for documents that contain both "toy" and "story" with:

$ curl "http://localhost:8081/v1/document/search?keywords=toy,story&mode=And" | jq .
{
  "documents": [
    {
      "id": 1,
      "content": "Toy Story",
      "tokenized_content": [
        "toy",
        "story"
      ],
      "created_at": "2022-07-08T13:49:24+09:00",
      "updated_at": "2022-07-08T13:49:24+09:00"
    },
    {
      "id": 39,
      "content": "Toy Story of Terror!",
      "tokenized_content": [
        "toy",
        "story",
        "terror"
      ],
      "created_at": "2022-07-08T13:49:34+09:00",
      "updated_at": "2022-07-08T13:49:34+09:00"
    },
    {
      "id": 83,
      "content": "Toy Story That Time Forgot",
      "tokenized_content": [
        "toy",
        "story",
        "time",
        "forgot"
      ],
      "created_at": "2022-07-08T13:49:53+09:00",
      "updated_at": "2022-07-08T13:49:53+09:00"
    },
    {
      "id": 213,
      "content": "Toy Story 2",
      "tokenized_content": [
        "toy",
        "story"
      ],
      "created_at": "2022-07-08T13:50:35+09:00",
      "updated_at": "2022-07-08T13:50:35+09:00"
    },
    {
      "id": 352,
      "content": "Toy Story 3",
      "tokenized_content": [
        "toy",
        "story"
      ],
      "created_at": "2022-07-08T13:51:23+09:00",
      "updated_at": "2022-07-08T13:51:23+09:00"
    }
  ]
}
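
A Go client could decode a response of this shape with `encoding/json`. The sketch below is only an illustration: the `Document` and `SearchResponse` types and the `parseResponse` helper are hypothetical names whose fields mirror the JSON keys shown above, and they are not taken from Omochi's source code.

```go
package main

import (
	"encoding/json"
	"fmt"
	"time"
)

// Document mirrors one entry of the "documents" array in the response.
// (Hypothetical client-side type; field names follow the JSON keys above.)
type Document struct {
	ID               int       `json:"id"`
	Content          string    `json:"content"`
	TokenizedContent []string  `json:"tokenized_content"`
	CreatedAt        time.Time `json:"created_at"` // RFC 3339 timestamp
	UpdatedAt        time.Time `json:"updated_at"`
}

// SearchResponse mirrors the top-level response object.
type SearchResponse struct {
	Documents []Document `json:"documents"`
}

// parseResponse decodes a raw response body into a SearchResponse.
func parseResponse(body []byte) (SearchResponse, error) {
	var res SearchResponse
	err := json.Unmarshal(body, &res)
	return res, err
}

func main() {
	body := []byte(`{"documents":[{"id":1,"content":"Toy Story",
		"tokenized_content":["toy","story"],
		"created_at":"2022-07-08T13:49:24+09:00",
		"updated_at":"2022-07-08T13:49:24+09:00"}]}`)

	res, err := parseResponse(body)
	if err != nil {
		panic(err)
	}
	for _, d := range res.Documents {
		fmt.Println(d.ID, d.Content, d.TokenizedContent) // prints 1 Toy Story [toy story]
	}
}
```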

🧑‍💻 License

MIT

Contributors

yadayuki
