Implementation of the HAL (Hyperspace Analogue to Language) algorithm with linear weighting and an IDF multiplier, written in Python 3.6. Czech news feeds are used as input data.
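The following minimal sketch shows how a HAL co-occurrence matrix with linear weighting can be built. It assumes tokenized documents (lists of words) and treats the IDF multiplier as scaling each context word by log(N / df), where N is the number of documents and df is the number of documents containing that word. The IDF formula, the default window size of 5, and the function names hal_matrix, idf_weights, apply_idf and hal_vector are illustrative assumptions and are not taken from script.py.

```python
import math
from collections import defaultdict


def hal_matrix(docs, window=5):
    """Hypothetical sketch: HAL co-occurrence counts with linear weighting.

    For each token, every token up to `window` positions before it adds
    weight (window - distance + 1) to matrix[token][preceding_token].
    """
    matrix = defaultdict(lambda: defaultdict(float))
    for tokens in docs:
        for i, target in enumerate(tokens):
            for d in range(1, window + 1):
                j = i - d
                if j < 0:
                    break
                matrix[target][tokens[j]] += window - d + 1
    return matrix


def idf_weights(docs):
    """Assumed IDF: log(N / df), with each document being one token list."""
    n = len(docs)
    df = defaultdict(int)
    for tokens in docs:
        for t in set(tokens):
            df[t] += 1
    return {t: math.log(n / c) for t, c in df.items()}


def apply_idf(matrix, idf):
    # Scale every context weight by the IDF of the context word (assumption).
    return {target: {ctx: w * idf.get(ctx, 0.0) for ctx, w in row.items()}
            for target, row in matrix.items()}


def hal_vector(matrix, word, vocab):
    # Row = weighted preceding contexts, column = weighted following contexts;
    # HAL represents a word by concatenating the two.
    row = [matrix.get(word, {}).get(v, 0.0) for v in vocab]
    col = [matrix.get(v, {}).get(word, 0.0) for v in vocab]
    return row + col
```

For example, with `docs = [["a", "b", "c"], ["a", "d"]]` and `window=2`, `hal_matrix` gives `c` a weight of 2 for the adjacent `b` and 1 for `a` two positions back. Because `a` occurs in every document, its IDF is 0, so `apply_idf` zeroes its contributions.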
nlp-hal
You need to install unidecode to run the scripts. You can install it with the command 'pip install unidecode'.
Run the script from the repository root: python src/script.py
Contents of the repository:
|----- data - data for building the HAL model in Czech
|      |----- train.txt - news feeds in Czech
|      |----- stopwords.txt - the stopwords used by the script (compiled from several sources)
|----- src - Python scripts
|      |----- script.py
|      |----- czech_stemmer.py - Czech stemmer developed by Luís Gomes, used by the script
|----- README
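Before co-occurrences can be counted, the news feeds have to be tokenized and filtered against the stopword list. A minimal sketch follows. It assumes that stopwords.txt holds one word per line in UTF-8 and that tokens are lowercased runs of word characters. The helper names load_stopwords and tokenize are hypothetical, and script.py may preprocess the text differently (for example with unidecode or the Czech stemmer).

```python
import re


def load_stopwords(path):
    """Hypothetical helper: read stopwords, assuming one word per line in UTF-8."""
    with open(path, encoding="utf-8") as f:
        return {line.strip().lower() for line in f if line.strip()}


def tokenize(text, stopwords):
    # In Python 3, \w matches Unicode letters, so Czech characters such as
    # "č" or "ř" stay inside tokens instead of splitting them.
    tokens = re.findall(r"\w+", text.lower())
    return [t for t in tokens if t not in stopwords]
```

For example, `tokenize("Vláda schválila nový zákon a rozpočet", {"a"})` returns `['vláda', 'schválila', 'nový', 'zákon', 'rozpočet']`. These token lists are what a co-occurrence counter would consume.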