theablemo / information-retrieval-final-project-qoogle

This code is for the final project of the Information Retrieval course taught by Dr. Ehsaneddin Asgari in the Spring semester of 2022.

Introduction

Project Quran MIR was carried out as the advanced information retrieval project in the spring of 2022. The aim of this project was to create a search engine for Quranic verses. In addition, statistical and algorithmic methods were used to develop other tools, such as identifying central verses and clustering verses conceptually into two categories (which were classified as Meccan/Medinan with 90% accuracy).

Developing and evaluating the models and storing their results requires a completely different environment from developing the website and displaying outputs, so the project is split across two repositories.

  • The first repository at https://github.com/Jarrahi-MM/quran_mir contains the scientific section of the project. All code, results, and evaluations of the different models are included in this repository.
  • The second repository at https://github.com/IR1401-Spring-Final-Projects/Quran1401-1_20 contains the website section of the project. Some code has been copied directly from the first repository; for other code, only the model outputs are provided. For some code, the results of the verse and chapter analyses are stored in this repository as Excel files and are read from there.

The structure of each repository is explained in its README.md file.

Collaborators

Project Structure Description

The user interface of this project has been built using the Django framework. With this framework, we were able to create a search interface similar to Google, named Qoogle or Quran Google, which allows you to search for your desired phrases throughout the Quran.

This project uses an SQLite database and is containerized with Docker Compose.

Usage

Simply type the desired phrase in the search bar, then select your preferred search engine from the following options available in the dropdown next to the search bar:

  • Boolean
  • TF-IDF
  • Fasttext
  • Transformer
  • Elastic Search

Then click the search button to see the results. The results are verses, displayed in order using the PageRank algorithm. Above each verse is its address (the chapter name and verse number) and whether it is Meccan or Medinan; you can click the address to open the verse or chapter. At the bottom of each result, you can view the system's predictions (the guessed chapter, whether the verse is Meccan or Medinan, and the cluster it falls into in the 4-category clustering we created) and compare them with the actual values.
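The README does not say how the graph used for PageRank is built, so the following is only a minimal sketch of the general technique, not the project's implementation. It assumes a hypothetical graph that maps each verse ID to the verse IDs it links to (for example, by similarity); the names `pagerank` and `order_results` are illustrative.

```python
def pagerank(graph, damping=0.85, iterations=50):
    """Power-iteration PageRank.

    `graph` maps each node to the list of nodes it links to.
    The graph structure is an assumption made for this sketch.
    """
    nodes = set(graph) | {t for targets in graph.values() for t in targets}
    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}
    for _ in range(iterations):
        # Rank held by dangling nodes (no outgoing links) is spread evenly.
        dangling = sum(rank[node] for node in nodes if not graph.get(node))
        new_rank = {
            node: (1.0 - damping) / n + damping * dangling / n for node in nodes
        }
        for node, targets in graph.items():
            if targets:
                share = damping * rank[node] / len(targets)
                for target in targets:
                    new_rank[target] += share
        rank = new_rank
    return rank


def order_results(hits, rank):
    """Sort retrieved verse IDs by descending PageRank score."""
    return sorted(hits, key=lambda h: rank.get(h, 0.0), reverse=True)


# Hypothetical verse IDs, used only for illustration.
example_graph = {"a": ["b"], "b": ["c"], "c": ["a"], "d": ["c"]}
example_rank = pagerank(example_graph)
print(order_results(["a", "d", "c"], example_rank))
```

Computing the scores once offline and only sorting at query time keeps the per-query cost low, which is one plausible reason to pair a retrieval model with a precomputed ranking.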

Finally, to view the central verse of each chapter, you can use the "I'm Feeling Lucky" button on the main page.
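To give a feel for what one of the listed engines does, here is a minimal TF-IDF retrieval sketch with cosine similarity. It is an illustration under stated assumptions, not the project's code: the tokenizer is plain whitespace splitting (a real system for Quranic text would normalize Arabic), the smoothed IDF formula is one common variant, and all function names are hypothetical.

```python
import math
from collections import Counter


def tokenize(text):
    # Whitespace tokenization only; this is an assumption of the sketch.
    return text.lower().split()


def build_index(docs):
    """Return per-document TF-IDF vectors and the IDF table."""
    tokenized = [tokenize(d) for d in docs]
    n = len(docs)
    # Document frequency: number of documents containing each term.
    df = Counter(term for toks in tokenized for term in set(toks))
    # Smoothed IDF, so terms that occur in every document keep a positive weight.
    idf = {t: math.log((1 + n) / (1 + c)) + 1.0 for t, c in df.items()}
    vectors = [
        {t: tf * idf[t] for t, tf in Counter(toks).items()} for toks in tokenized
    ]
    return vectors, idf


def cosine(a, b):
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def search(query, docs, top_k=3):
    """Return up to top_k (doc_index, score) pairs with a non-zero score."""
    vectors, idf = build_index(docs)
    # Query terms that never occur in the collection are ignored.
    q = {t: tf * idf[t] for t, tf in Counter(tokenize(query)).items() if t in idf}
    scored = [(cosine(q, v), i) for i, v in enumerate(vectors)]
    scored.sort(key=lambda s: (-s[0], s[1]))
    return [(i, round(score, 3)) for score, i in scored[:top_k] if score > 0]


# Placeholder English documents, used only for illustration.
sample_docs = ["mercy and guidance", "guidance for the believers", "the day of judgment"]
print(search("guidance", sample_docs))
```

The index is rebuilt on every call to keep the sketch short; a real service would build it once and reuse it across queries.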

Setup

Start docker container

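# Create a screen session named mir_site (or reattach to it with: screen -r mir_site)
screen -S mir_site
sudo docker-compose up --build
# Press Ctrl-A, then D, to detach from the screen session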

Go to docker container

sudo docker exec -it tmwm /bin/bash

Manual Start

python3 -m venv ./venv
source ./venv/bin/activate
python -m pip install --upgrade pip
sudo apt-get install python3-dev
pip install -r requirements.txt

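# Create a screen session named mir_site (or reattach to it with: screen -r mir_site)
screen -S mir_site
source ./venv/bin/activate
sudo env "PATH=$PATH" python manage.py runserver 0:81
# Press Ctrl-A, then D, to detach from the screen session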

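# Create a screen session named mir_commands (or reattach to it with: screen -r mir_commands)
screen -S mir_commands
python manage.py shell

# Inside the Django shell:
from information_retrieval.lib.quran_mir.quran_ir import ArabertQuranIR
ArabertQuranIR()
# Press Ctrl-A, then D, to detach from the screen session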

Download Fasttext lib

cd /information_retrieval/lib/quran_mir
git clone https://github.com/facebookresearch/fastText.git
cd fastText
make
pip install .
cd ..
mkdir fasttext_model

Train Fasttext model

python manage.py train_fasttext

Elasticsearch

Install Elasticsearch

curl -fsSL https://artifacts.elastic.co/GPG-KEY-elasticsearch | sudo apt-key add -
echo "deb https://artifacts.elastic.co/packages/7.x/apt stable main" | sudo tee -a /etc/apt/sources.list.d/elastic-7.x.list
sudo apt update
sudo apt install elasticsearch

Configuring security options

sudo nano /etc/elasticsearch/elasticsearch.yml

Uncomment network.host and http.port in the elasticsearch.yml file and set them as shown below. (The port setting in Elasticsearch is http.port; network.port is not a recognized setting.)

network.host: localhost
http.port: 9200

Save and exit. Then run the following in the terminal.

sudo systemctl start elasticsearch
sudo systemctl enable elasticsearch

Test elasticsearch

curl -X GET 'http://localhost:9200'

You should get something like the following.

Output
{
  "name" : "elasticsearch-ubuntu20-04",
  "cluster_name" : "elasticsearch",
  "cluster_uuid" : "qqhFHPigQ9e2lk-a7AvLNQ",
  "version" : {
    "number" : "7.6.2",
    "build_flavor" : "default",
    "build_type" : "deb",
    "build_hash" : "ef48eb35cf30adf4db14086e8aabd07ef6fb113f",
    "build_date" : "2020-03-26T06:34:37.794943Z",
    "build_snapshot" : false,
    "lucene_version" : "8.4.0",
    "minimum_wire_compatibility_version" : "6.8.0",
    "minimum_index_compatibility_version" : "6.0.0-beta1"
  },
  "tagline" : "You Know, for Search"
}
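The same check can be scripted from Python. The sketch below assumes Elasticsearch is listening on http://localhost:9200 without authentication, as configured above; the helper names `parse_es_info` and `fetch_es_info` are hypothetical and use only the standard library.

```python
import json
import urllib.request


def parse_es_info(body):
    """Extract a few fields from the JSON returned by the root endpoint."""
    info = json.loads(body)
    return {
        "cluster_name": info["cluster_name"],
        "version": info["version"]["number"],
        "lucene_version": info["version"]["lucene_version"],
    }


def fetch_es_info(url="http://localhost:9200", timeout=5):
    """Query a running Elasticsearch node; raises URLError if it is unreachable."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return parse_es_info(response.read().decode("utf-8"))
```

Calling `fetch_es_info()` on the machine where the service runs should return the cluster name and version fields shown in the sample output; a connection error usually means the service has not started or is bound to a different host or port.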

Contributors

  • aryanahadinia
  • jarrahi-mm
  • maghasemzadeh
  • theablemo
