Kinopoisk Review Analysis

KPRA is an experimental service that condenses lengthy film reviews so that a reader can grasp a film's essence more quickly and answer two questions:

  • Why should I watch the film?
  • Why should I not watch the film?

Development Process Outline

  • Find the location (in HTML) of the numbers that indicate how many positive/negative/neutral reviews there are, and extract them.
  • Depending on the above numbers, calculate the maximum number of reviews to display per page (in order to minimize the number of pages to download and parse).
  • Write a script to download all the reviews (and, perhaps, store them locally on a temporary basis).
  • Reduce all words to their base form (lemmatize them) in order to compute tf–idf, for example via Yandex's mystem.
  • Compute the tf–idf statistic on the obtained data. Presumably the best way to treat the primary data is as follows: each review is a document, and the set of reviews sharing a mood (positive, negative, or neutral) is a collection, or corpus. Note, however, that since it is important to know a word's weight for a particular mood, it probably also makes sense to treat the whole set of reviews (regardless of mood) as a single corpus.
  • Identify collocations using the t-test, chi-square, MLE, MI/PMI, etc. As above, it may be necessary to work with the base word forms. After obtaining the various metrics, choose the most appropriate collocations (based on some factors?).
  • Develop a simple GUI for a web service.
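
The tf–idf step above can be sketched roughly as follows. This is a minimal illustration, not the project's implementation: it assumes the reviews have already been lemmatized into token lists, it uses the plain tf × log(N/df) weighting (one of several common variants), and the function name `tf_idf` and the sample tokens are hypothetical.

```python
import math
from collections import Counter


def tf_idf(documents):
    """Return one {term: weight} dict per document.

    `documents` is a list of token lists (assumed to be lemmatized already).
    Weight = (term count / document length) * log(N / document frequency).
    """
    n_docs = len(documents)
    # Document frequency: the number of documents each term occurs in.
    df = Counter()
    for tokens in documents:
        df.update(set(tokens))

    weights = []
    for tokens in documents:
        counts = Counter(tokens)
        total = len(tokens)
        weights.append({
            term: (count / total) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return weights


# Hypothetical lemmatized reviews, grouped by mood.
positive = [["great", "plot", "actor"], ["great", "music"]]
negative = [["boring", "plot"]]

# Variant 1: each mood is its own corpus.
per_mood = tf_idf(positive)
# Variant 2: all reviews, regardless of mood, form a single corpus.
overall = tf_idf(positive + negative)
```

With the per-mood corpus, a word that occurs in every positive review ("great" here) gets weight zero, while in the combined corpus it keeps a positive weight because the negative review lacks it. That difference is the reason for computing both variants.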

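For the collocation step, two of the listed metrics (PMI and the t-test) can be computed from plain bigram counts. The sketch below is only an assumption about how this might look: the function name `bigram_scores`, the `min_count` threshold, and the sample tokens are hypothetical, and the t-score uses the usual approximation in which the sample variance is taken to equal the observed bigram probability.

```python
import math
from collections import Counter


def bigram_scores(tokens, min_count=2):
    """Return {(w1, w2): (pmi, t_score)} for adjacent word pairs.

    `tokens` is one flat list of (assumed lemmatized) words. Bigrams seen
    fewer than `min_count` times are skipped, since PMI is unreliable for
    rare pairs.
    """
    n = len(tokens)
    n_bigrams = max(n - 1, 1)
    unigrams = Counter(tokens)
    bigrams = Counter(zip(tokens, tokens[1:]))

    scores = {}
    for (w1, w2), count in bigrams.items():
        if count < min_count:
            continue
        p12 = count / n_bigrams          # observed bigram probability
        expected = (unigrams[w1] / n) * (unigrams[w2] / n)  # under independence
        pmi = math.log2(p12 / expected)
        # t = (observed - expected) / sqrt(variance / N), variance ~ observed
        t_score = (p12 - expected) / math.sqrt(p12 / n_bigrams)
        scores[(w1, w2)] = (pmi, t_score)
    return scores


# Hypothetical lemmatized text.
tokens = ["special", "effect", "be", "good", "but",
          "special", "effect", "not", "save", "plot"]
scores = bigram_scores(tokens)
pmi, t_score = scores[("special", "effect")]
```

Feeding all reviews in as one flat list lets bigrams span review boundaries; a fuller version would probably count bigrams per review and sum the counts.
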
Plans For the Future

  • KPRA should function as a standalone web service that lets users quickly look up information about a film.
  • KPRA should retain the information it has already collected so that later requests can be processed faster.

Contributors

  • denpatin
