
search-engine's Introduction

search-engine

Installation

Assumptions prior to install

  • Python 3 is installed and accessible from the path

Setting up a venv and activating it

Linux instructions

Run the following commands

  • Note: if you are using the fish shell, use "activate.fish" instead of "activate"
python3 -m venv venv
source venv/bin/activate
pip install -r requirements.txt

Windows instructions

Run the following commands

py -3 -m venv venv
venv\Scripts\activate
pip3 install -r requirements.txt

If you encounter the following error or something similar:

venv\Scripts\activate : File C:\Users\Raymo\Desktop\blah\search-engine\venv\Scripts\Activate.ps1 cannot be
loaded because running scripts is disabled on this system. For more information, see about_Execution_Policies
at https:/go.microsoft.com/fwlink/?LinkID=135170.
At line:1 char:1
+ venv\Scripts\activate
+ ~~~~~~~~~~~~~~~~~~~~~
    + CategoryInfo          : SecurityError: (:) [], PSSecurityException
    + FullyQualifiedErrorId : UnauthorizedAccess

To fix it:

  1. Open PowerShell in Administrator mode
  2. Enter the following at the prompt:
set-executionpolicy remotesigned
  3. Answer yes when prompted

Reference: https://superuser.com/questions/106360/how-to-enable-execution-of-powershell-scripts
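To double-check that the venv is actually active, you can ask the interpreter itself. This is a small sketch that is not part of the repository. It relies on the standard-library behaviour that inside a venv `sys.prefix` points at the venv directory while `sys.base_prefix` still points at the base Python install.

```python
import sys


def in_virtualenv() -> bool:
    """Return True when the running interpreter belongs to a venv."""
    # A venv interpreter sets sys.prefix to the venv directory;
    # sys.base_prefix keeps pointing at the base Python install.
    return sys.prefix != sys.base_prefix


if __name__ == "__main__":
    print("venv active" if in_virtualenv() else "venv NOT active")
    print("interpreter:", sys.executable)
```

Save it under any name (for example check_venv.py, a hypothetical file) and run it with `python`. It should only report `venv active` after the activate step above has succeeded.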

Deactivating a venv

Run the following command

deactivate

Viewing installed packages

  • To list the packages installed in the active venv, run:
pip freeze
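If you need the same list from inside Python (for example in a test), the standard library can produce something close to `pip freeze`. This is a sketch, not project code. It needs Python 3.8+ for `importlib.metadata`, and its output is only roughly equivalent: pip freeze also hides some tooling packages and formats editable installs differently.

```python
from importlib.metadata import distributions  # standard library, Python 3.8+
from typing import List


def installed_packages() -> List[str]:
    """Return sorted "name==version" strings, similar in spirit to `pip freeze`."""
    pins = set()
    for dist in distributions():
        name = dist.metadata["Name"]
        if name:  # skip broken installs that lack a Name field
            pins.add(f"{name}=={dist.version}")
    return sorted(pins, key=str.lower)


if __name__ == "__main__":
    for line in installed_packages():
        print(line)
```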

Running the application

Reference - https://flask.palletsprojects.com/en/1.1.x/quickstart/

  • Be in the root directory of the repository when you run these commands
  • Setting FLASK_ENV to development enables debug mode
    • In development mode the indexes and corpus are NOT rebuilt

Linux instructions

Run the following commands

export FLASK_APP=searchapp/searchapp.py
export FLASK_ENV=development
flask run

Windows instructions

PowerShell instructions

Run the following commands

$env:FLASK_APP = "searchapp/searchapp.py"
$env:FLASK_ENV = "development"
flask run

CMD instructions

Run the following commands

set FLASK_APP=searchapp/searchapp.py
set FLASK_ENV=development
flask run
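The development-mode rule above (no rebuild of the corpus and indexes when FLASK_ENV is development) could be expressed as a single check at startup. This is a sketch under assumptions. The helper name `should_rebuild_indexes` is hypothetical, and the actual logic in searchapp/searchapp.py may be structured differently.

```python
import os
from typing import Mapping, Optional


def should_rebuild_indexes(environ: Optional[Mapping[str, str]] = None) -> bool:
    """Hypothetical helper: rebuild corpus and indexes unless FLASK_ENV=development."""
    env = os.environ if environ is None else environ
    # Treat a missing FLASK_ENV like production, which always rebuilds
    return env.get("FLASK_ENV", "production") != "development"


if __name__ == "__main__":
    print("rebuild indexes:", should_rebuild_indexes())
```

Passing the environment in explicitly keeps the check easy to test without touching the real process environment.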

Project Structure

Based on:

search-engine's People

Contributors

  • rkchang
  • tpaul016

Forkers

rkchang

search-engine's Issues

DB Access

  • module to access NoSQL DB
  • user sessions

Add logging

We currently have no logging at all and instead depend on console output. It would be nice to have logs that record important events and errors. This will matter once we host the app.
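A possible starting point using Python's standard `logging` module. This is a sketch: the logger name `searchapp`, the format string and the optional log file are assumptions, not existing project settings. `force=True` requires Python 3.8+ (the traceback below shows the venv on 3.7, so that would need an upgrade or manual handler removal).

```python
import logging
import sys
from typing import Optional

LOG_FORMAT = "%(asctime)s %(levelname)s %(name)s: %(message)s"


def configure_logging(level: int = logging.INFO,
                      logfile: Optional[str] = None) -> logging.Logger:
    """Send log records to stderr and, optionally, to a file."""
    handlers = [logging.StreamHandler(sys.stderr)]
    if logfile:
        handlers.append(logging.FileHandler(logfile, encoding="utf-8"))
    # force=True (Python 3.8+) replaces any handlers configured earlier
    logging.basicConfig(level=level, format=LOG_FORMAT,
                        handlers=handlers, force=True)
    return logging.getLogger("searchapp")


if __name__ == "__main__":
    log = configure_logging()
    log.info("Building Inverted Index ...")
    index = {}
    try:
        index["man*"]
    except KeyError:
        # logger.exception records the message plus the full traceback
        log.exception("Query failed for token %r", "man*")
```

Replacing bare `print` calls with `log.info(...)` and error paths with `log.exception(...)` would keep the current console output while making it easy to redirect to a file on a hosted server.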

Wildcard expansion not working for query: (*ge AND_NOT (man* OR health*))

Got this error on the version-3-release branch

Steps to reproduce:

  1. Use: (*ge AND_NOT (man* OR health*)) as query
  2. Set to boolean
  3. Set to courses
  4. Disable global query expansion

This is the v1.1.1 release response:

Creating app ...
Building Inverted Index ...
Finished building Inverted Index
Building Bigram Index ...
Finished building Bigram Index
Done creating app
wildcard expanded to:
['(((((age', 'OR', 'refuge)', 'OR', 'wage)', 'OR', 'gene)', 'OR', 'page)', 'OR', 'stage)']
wildcard expanded to:
['(((((((((((manag', 'OR', 'managementcours)', 'OR', 'mandat)', 'OR', 'manual)', 'OR', 'managerscours)', 'OR', 'mammalian)', 'OR', 'manipul)', 'OR', 'manifold)', 'OR', 'manifest)', 'OR', 'manageri)', 'OR', 'manufactur)', 'OR', 'maintenancecours)']
wildcard expanded to:
['(((healthcours', 'OR', 'healthsafeti)', 'OR', 'health)', 'OR', 'healthcar)']
--------------------------------
Boolean
--------------------------------

Current response

wildcard expanded to:
['(((((wage', 'OR', 'age)', 'OR', 'refuge)', 'OR', 'stage)', 'OR', 'page)', 'OR', 'gene)']
wildcard expanded to:
['(((((((((((mandat', 'OR', 'manag)', 'OR', 'managerscours)', 'OR', 'mammalian)', 'OR', 'manifest)', 'OR', 'manipul)', 'OR', 'manifold)', 'OR', 'manual)', 'OR', 'manageri)', 'OR', 'managementcours)', 'OR', 'manufactur)', 'OR', 'maintenancecours)']
wildcard expanded to:
['(((healthcours', 'OR', 'healthsafeti)', 'OR', 'health)', 'OR', 'healthcar)']
127.0.0.1 - - [02/May/2020 23:04:57] "POST /docs HTTP/1.1" 500 -
Traceback (most recent call last):
  File "/home/rchang/dev/CSI4107/search-engine/venv/lib/python3.7/site-packages/flask/app.py", line 2463, in __call__
    return self.wsgi_app(environ, start_response)
  File "/home/rchang/dev/CSI4107/search-engine/venv/lib/python3.7/site-packages/flask/app.py", line 2449, in wsgi_app
    response = self.handle_exception(e)
  File "/home/rchang/dev/CSI4107/search-engine/venv/lib/python3.7/site-packages/flask_cors/extension.py", line 161, in wrapped_function
    return cors_after_request(app.make_response(f(*args, **kwargs)))
  File "/home/rchang/dev/CSI4107/search-engine/venv/lib/python3.7/site-packages/flask/app.py", line 1866, in handle_exception
    reraise(exc_type, exc_value, tb)
  File "/home/rchang/dev/CSI4107/search-engine/venv/lib/python3.7/site-packages/flask/_compat.py", line 39, in reraise
    raise value
  File "/home/rchang/dev/CSI4107/search-engine/venv/lib/python3.7/site-packages/flask/app.py", line 2446, in wsgi_app
    response = self.full_dispatch_request()
  File "/home/rchang/dev/CSI4107/search-engine/venv/lib/python3.7/site-packages/flask/app.py", line 1951, in full_dispatch_request
    rv = self.handle_user_exception(e)
  File "/home/rchang/dev/CSI4107/search-engine/venv/lib/python3.7/site-packages/flask_cors/extension.py", line 161, in wrapped_function
    return cors_after_request(app.make_response(f(*args, **kwargs)))
  File "/home/rchang/dev/CSI4107/search-engine/venv/lib/python3.7/site-packages/flask/app.py", line 1820, in handle_user_exception
    reraise(exc_type, exc_value, tb)
  File "/home/rchang/dev/CSI4107/search-engine/venv/lib/python3.7/site-packages/flask/_compat.py", line 39, in reraise
    raise value
  File "/home/rchang/dev/CSI4107/search-engine/venv/lib/python3.7/site-packages/flask/app.py", line 1949, in full_dispatch_request
    rv = self.dispatch_request()
  File "/home/rchang/dev/CSI4107/search-engine/venv/lib/python3.7/site-packages/flask/app.py", line 1935, in dispatch_request
    return self.view_functions[rule.endpoint](**req.view_args)
  File "/home/rchang/dev/CSI4107/search-engine/searchapp/searchapp.py", line 112, in handleQuery
    formatted_query = query_pre_processing.get_query_documents(query, corpus)
  File "/home/rchang/dev/CSI4107/search-engine/searchapp/boolean_retrieval_model/query_pre_processing.py", line 107, in get_query_documents
    for doc in index[token]['docs']:
KeyError: 'man*'
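The traceback suggests that the literal token 'man*' reached `index[token]['docs']` in get_query_documents, i.e. the nested wildcard was expanded for display but not substituted before the index lookup. Below is a sketch of wildcard expansion with a bigram index plus a lookup that does not crash on unknown tokens. All function names are hypothetical and the index layout (`index[token]['docs']`) is only inferred from the traceback. Note that both expansions above list `gene` for `*ge` and `maintenancecours` for `man*`. These look like bigram-index false positives (they share every required bigram but do not match the pattern), which the post-filter step below removes.

```python
import fnmatch
from typing import Dict, Iterable, List, Set


def _pairs(s: str) -> Set[str]:
    return {s[i:i + 2] for i in range(len(s) - 1)}


def bigrams(term: str) -> Set[str]:
    """Bigrams of a term padded with '$' boundary markers."""
    return _pairs(f"${term}$")


def build_bigram_index(vocabulary: Iterable[str]) -> Dict[str, Set[str]]:
    index: Dict[str, Set[str]] = {}
    for term in vocabulary:
        for bg in bigrams(term):
            index.setdefault(bg, set()).add(term)
    return index


def expand_wildcard(pattern: str, bigram_index: Dict[str, Set[str]]) -> List[str]:
    """Expand a pattern containing '*' into matching vocabulary terms."""
    # Bigrams that do not span a '*' must occur in every matching term
    required: Set[str] = set()
    for piece in f"${pattern}$".split("*"):
        required |= _pairs(piece)
    if required:
        candidates = set.intersection(
            *(bigram_index.get(bg, set()) for bg in required))
    else:  # pattern is just '*'
        candidates = set().union(*bigram_index.values())
    # Post-filter: bigram overlap alone admits false positives
    return sorted(t for t in candidates if fnmatch.fnmatchcase(t, pattern))


def lookup_docs(token: str, index: Dict[str, dict]) -> set:
    """Return the posting set for token, or an empty set instead of a KeyError."""
    entry = index.get(token)
    return set(entry["docs"]) if entry else set()
```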

Build Web crawler

  • This issue will probably be broken up into several
  • Should probably email the professor for advice

Rework relevance feedback module

  • Our current implementation assumes one session is one instance of the searchapp running. It doesn't keep track of different users either.

depends on #74
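One way to separate feedback per user is to key it by a session ID instead of by the running process. This is an in-memory sketch only; the class and method names are hypothetical, and once the DB access from #74 exists the dictionary would be replaced by that store. The session ID could be kept client-side, for example in Flask's `session` object.

```python
import uuid
from collections import defaultdict
from typing import Dict, Set, Tuple


class FeedbackStore:
    """Hypothetical per-session store of relevant / non-relevant docs per query."""

    def __init__(self) -> None:
        # session_id -> query -> {"relevant": {...}, "nonrelevant": {...}}
        self._data: Dict[str, Dict[str, Dict[str, Set[str]]]] = defaultdict(dict)

    def new_session(self) -> str:
        return uuid.uuid4().hex

    def mark(self, session_id: str, query: str, doc_id: str, relevant: bool) -> None:
        entry = self._data[session_id].setdefault(
            query, {"relevant": set(), "nonrelevant": set()})
        add_to, remove_from = (("relevant", "nonrelevant") if relevant
                               else ("nonrelevant", "relevant"))
        entry[add_to].add(doc_id)
        entry[remove_from].discard(doc_id)  # a doc cannot be in both sets

    def feedback(self, session_id: str, query: str) -> Tuple[Set[str], Set[str]]:
        entry = self._data.get(session_id, {}).get(query)
        if entry is None:
            return set(), set()
        return set(entry["relevant"]), set(entry["nonrelevant"])
```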

Clean up README

  • We no longer need separate instructions for the grader; merge the developer and grader instructions

global query expansion does not handle AND_NOT for boolean retrieval model

query (reuters corpus):
(((shareholder AND security) AND_NOT payment) AND_NOT year)

global expansion console message:
Global Expansion: Input: (((shareholder AND security) AND_NOT payment) AND_NOT year)
Global Expansion: word: shareholder syns: ['stockholder', 'shareowner']
Global Expansion: word: security syns: ['protection', 'certificate', 'surety']
Global Expansion: word: _NOT payment is not in all_lemmas, dropping!
Global Expansion: word: _NOT year is not in all_lemmas, dropping!
Global Expansion: Expanded Query (((((shareholder OR stockholder) OR shareowner) AND (((security OR protection) OR certificate) OR surety)) AND_NOT payment) AND_NOT year)
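The "_NOT payment" words suggest the expander splits on AND and leaves the remainder of AND_NOT attached to the next term. A tokenizer that tries the longer operator first avoids that. This is a sketch; the project's real tokenizer is not shown here, and the names `tokenize` and `query_terms` are hypothetical.

```python
import re
from typing import List

# Try AND_NOT before AND so "AND_NOT" is never split into "AND" + "_NOT".
# \b stops words such as "ANDROID" or "ORANGE" from matching an operator.
TOKEN_RE = re.compile(r"\(|\)|AND_NOT\b|AND\b|OR\b|[^\s()]+")
OPERATORS = {"AND", "OR", "AND_NOT"}
PARENS = {"(", ")"}


def tokenize(query: str) -> List[str]:
    return TOKEN_RE.findall(query)


def query_terms(query: str) -> List[str]:
    """Only the words that global expansion should look up synonyms for."""
    return [t for t in tokenize(query) if t not in OPERATORS and t not in PARENS]
```

With this split, the expander would look up "payment" and "year" themselves instead of "_NOT payment" and "_NOT year".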

Flake8 scan detected undefined name in pre_processing.py

/home/rchang/dev/CSI4107/search-engine | addtest
> flake8 . --count --select=E9,F63,F7,F82 --show-source --statistics --exclude venv/
./searchapp/cor_pre_proc/pre_processing.py:59:30: F821 undefined name 'title'
                title_text = title.text
                             ^
./searchapp/cor_pre_proc/pre_processing.py:63:29: F821 undefined name 'body'
                body_text = body.text
                            ^
2     F821 undefined name 'title'
2
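F821 means `title` and `body` are read before being bound. The fix is to look the elements up first and guard against missing tags. This sketch uses `xml.etree.ElementTree` so it is self-contained; the project's parser (and the TITLE/BODY tag names, taken here from the Reuters layout) are assumptions. BeautifulSoup's `find` likewise returns None for a missing tag, so the same guard applies there.

```python
import xml.etree.ElementTree as ET
from typing import Tuple


def extract_title_and_body(doc: ET.Element) -> Tuple[str, str]:
    """Bind `title` and `body` before reading `.text`; fall back to ''."""
    title = doc.find(".//TITLE")  # './/' searches all descendants
    body = doc.find(".//BODY")
    # Compare with None explicitly: an Element with no children is falsy
    title_text = title.text if title is not None and title.text else ""
    body_text = body.text if body is not None and body.text else ""
    return title_text.strip(), body_text.strip()
```

Re-running the flake8 command from this issue should then report no F821 entries for pre_processing.py.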

Testing for Saturday

  • Verify that the grader instructions in the README work on a Windows 10 computer
  • Verify that the corpus, index.json and biIndex.json are generated
  • Verify that the corpus, index.json and biIndex.json are rebuilt when parameters are changed
  • Verify that wildcard Boolean search works on the given test queries, with all settings (e.g. stopword removal) enabled or toggled
  • Verify that VSM search works on the given test queries, with all settings (e.g. stopword removal) enabled or toggled
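The file checks in this list could be partly automated. This sketch covers only index.json and biIndex.json; the directory they are written to is an assumption passed in by the caller, and the corpus check is left out because its format is not described here. For the rebuild check, record `time.time()` before changing parameters and restarting the app.

```python
import json
from pathlib import Path
from typing import Iterable, List

DEFAULT_FILES = ("index.json", "biIndex.json")


def missing_or_invalid(directory, filenames: Iterable[str] = DEFAULT_FILES) -> List[str]:
    """Return names of expected JSON files that are absent, empty, or unparsable."""
    problems = []
    for name in filenames:
        path = Path(directory) / name
        try:
            data = json.loads(path.read_text(encoding="utf-8"))
        except (OSError, ValueError):  # missing file, bad encoding, bad JSON
            problems.append(name)
            continue
        if not data:
            problems.append(name)
    return problems


def rebuilt_since(directory, timestamp: float,
                  filenames: Iterable[str] = DEFAULT_FILES) -> bool:
    """True if every expected file exists and was modified after `timestamp`."""
    paths = [Path(directory) / n for n in filenames]
    return all(p.exists() and p.stat().st_mtime > timestamp for p in paths)
```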

Add model enum

Currently we're using strings to determine which model we are interested in. Should we create a new enum and use that instead?
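A minimal sketch of what such an enum might look like. The member names and the string values "boolean" and "vsm" are assumptions based on the models mentioned in these issues, not the strings the app currently passes around.

```python
from enum import Enum


class Model(Enum):
    """Hypothetical enum for the retrieval models the app supports."""
    BOOLEAN = "boolean"
    VSM = "vsm"

    @classmethod
    def from_string(cls, value: str) -> "Model":
        """Parse an incoming string (e.g. a form field) into a Model."""
        try:
            return cls(value.strip().lower())
        except ValueError:
            valid = [m.value for m in cls]
            raise ValueError(f"unknown model {value!r}; expected one of {valid}") from None
```

Converting once at the request boundary means the rest of the code compares `Model.BOOLEAN` instead of raw strings, so a typo fails loudly instead of silently falling through.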

Fix styling issues.

Variable and function names follow inconsistent naming schemes: some are camelCase, others are snake_case.
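PEP 8 recommends snake_case for function and variable names. A small helper can suggest the new name when renaming by hand; this is a sketch, not a refactoring tool, and references still have to be updated separately (the pep8-naming flake8 plugin can flag the remaining cases).

```python
import re

# Split between a lowercase/digit and an uppercase letter ("handleQuery"),
# and inside acronyms before a capitalised word ("HTTPResponse").
_BOUNDARY = re.compile(r"(?<=[a-z0-9])(?=[A-Z])|(?<=[A-Z])(?=[A-Z][a-z])")


def camel_to_snake(name: str) -> str:
    return _BOUNDARY.sub("_", name).lower()
```

For example, `handleQuery` from the traceback above becomes `handle_query`, while names that are already snake_case are left unchanged.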
