This is a collaborative project on NLP. Our goal is to construct metric to measure the distances among movies based on web-scraped movie reviews and descriptions, and recommend movie topics that attract people most to movie producers.
We have finisehd so far:
- Got a list of 3000 movies on Netflix.
- Scraped reviews of these movies on IMDB and descriptions on Wikipedia pages.
- Cleaned data, including removing stop words and finishing stemming.
- Caculated significant n-grams.
Our plan for future:
- Create distance metric of movies.
- topic modeling.
- ...