A small Python script that reads through a folder of captions and determines the most common words and the most common phrases (combinations of words).
Simply edit start.py
and adjust any blocked words you want to exclude. By default we are excluding: on, in, a, with, of, her, him, them, one, his, she, while, and, no, for.
You can add more or remove any.
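For anyone curious how the blocked words affect the counts, here is a minimal sketch of the idea. It is not the code in start.py: the names BLOCKED_WORDS and count_words_and_phrases are hypothetical, and it assumes phrases are built from the words that remain after blocked words are dropped.

```python
import re
from collections import Counter

# Hypothetical name; start.py may store its blocked words differently.
BLOCKED_WORDS = {"on", "in", "a", "with", "of", "her", "him", "them",
                 "one", "his", "she", "while", "and", "no", "for"}

def count_words_and_phrases(captions, blocked=BLOCKED_WORDS, phrase_len=2):
    """Count single words and phrase_len-word phrases, skipping blocked words."""
    word_counts = Counter()
    phrase_counts = Counter()
    for caption in captions:
        # Lowercase and split into simple word tokens.
        tokens = re.findall(r"[a-z0-9']+", caption.lower())
        # Assumption: blocked words are removed before phrases are formed.
        kept = [t for t in tokens if t not in blocked]
        word_counts.update(kept)
        phrase_counts.update(
            " ".join(kept[i:i + phrase_len])
            for i in range(len(kept) - phrase_len + 1)
        )
    return word_counts, phrase_counts
```

Calling word_counts.most_common(10) on the result would then list the ten most frequent words.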
Edit line 41 to point to your path. In its current example, it expects your text files to be in the "files" folder beside start.py.
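If you would rather not hard-code a path, something like the following could work. This is only a sketch: captions_dir and read_captions are hypothetical helpers, and it assumes the captions are .txt files encoded as UTF-8.

```python
from pathlib import Path

def captions_dir(script_path):
    """Return the "files" folder that sits beside the given script."""
    return Path(script_path).resolve().parent / "files"

def read_captions(folder):
    """Return the text of every .txt file directly inside folder, sorted by name."""
    return [p.read_text(encoding="utf-8")
            for p in sorted(Path(folder).glob("*.txt"))]
```

Inside a script you would call read_captions(captions_dir(__file__)).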
Every time you run it, a new log file is created with the results.
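One common way to get a fresh log per run is to put a timestamp in the file name. The sketch below illustrates that approach. The write_log helper and the log_YYYYMMDD_HHMMSS.txt naming are assumptions, not necessarily what start.py does.

```python
from datetime import datetime
from pathlib import Path

def write_log(lines, log_dir=".", now=None):
    """Write lines to a new timestamped log file and return its path."""
    now = now or datetime.now()
    # Assumed naming scheme, e.g. log_20240102_030405.txt
    path = Path(log_dir) / f"log_{now:%Y%m%d_%H%M%S}.txt"
    path.write_text("\n".join(lines) + "\n", encoding="utf-8")
    return path
```

Because the name includes the time down to the second, earlier logs are not overwritten unless two runs start within the same second.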
pip install -r requirements.txt
to install the required libraries.
Launch it by running python start.py