A search engine over a list of movies that have a dedicated page on Wikipedia.
- `README.md`: the Markdown file that explains the content of your repository.
- `collector.py`: a Python file that contains the code needed to collect data from the `html` page and Wikipedia.
- `collector_utils.py`: a Python file that stores the functions used in `collector.py`.
- `parser.py`: a Python file that contains the code needed to parse the entire collection of `html` pages and save them in `tsv` files.
- `parser_utils.py`: a Python file that gathers the functions used in `parser.py`.
- `index.py`: a Python file that, once executed, generates the indexes of the search engines.
- `index_utils.py`: a Python file that contains the functions used for creating indexes.
- `utils.py`: a Python file that gathers functions you need in more than one of the previous files (`collector`, `parser`, etc.).
- `main.py`: a Python file that, once executed, builds the search engine.
- `exercise_4.py`: a Python file that contains the implementation of the algorithm that solves problem 4.
- `main.ipynb`: a notebook file where the search functions should be tested.
- `document_norm.json`, `inverted_index_2.json`, `inverted_index.json`, `name_index.json`, `name_norm.json`, `vocabulary.json`
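
To give a rough idea of what an index-building step such as the one in `index.py` might produce, here is a minimal sketch of a vocabulary and an inverted index with a conjunctive lookup. It is not the repository's actual code: the function names `build_inverted_index` and `search`, the whitespace tokenization, the toy `movies` collection, and the JSON layout are all assumptions made for illustration.

```python
import json
from collections import defaultdict


def build_inverted_index(documents):
    """Map each term to a term id, and each term id to the documents containing it."""
    vocabulary = {}                      # term -> integer term id
    inverted_index = defaultdict(list)   # term id -> ascending list of document ids
    for doc_id in sorted(documents):
        # Naive tokenization (assumption): lowercase and split on whitespace.
        for term in sorted(set(documents[doc_id].lower().split())):
            term_id = vocabulary.setdefault(term, len(vocabulary))
            inverted_index[term_id].append(doc_id)
    return vocabulary, dict(inverted_index)


def search(query, vocabulary, inverted_index):
    """Return the ids of documents that contain every term of the query."""
    postings = []
    for term in query.lower().split():
        term_id = vocabulary.get(term)
        if term_id is None:
            return []                    # unknown term: no document can match
        postings.append(set(inverted_index[term_id]))
    return sorted(set.intersection(*postings)) if postings else []


# Toy collection standing in for the parsed movie pages (hypothetical data).
movies = {
    1: "a space opera about a farm boy",
    2: "a boy wizard goes to school",
    3: "a space station drifts off course",
}
vocabulary, inverted_index = build_inverted_index(movies)
result = search("space boy", vocabulary, inverted_index)
print(result)

# Both structures serialize to JSON; note that integer keys become strings.
serialized = json.dumps({"vocabulary": vocabulary, "inverted_index": inverted_index})
```

A real index would likely add text normalization (stopword removal, stemming) and a scored variant, which the sketch leaves out.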