This project analyses the characters of a book, predicting their attributes and the relationships among them. Adapting a novel or book into any other form takes a great deal of human effort, which is cumbersome. Human readers also tend to overlook minor details about the events and characters in the book, which can lead to inaccuracies in the plot of the adaptation. The project therefore aims to support an easier and more accurate adaptation of a book. The model scans the large volume of text in the book and then presents insights derived from it by applying analytical techniques that combine natural language processing, sentiment/emotion analysis, and social network analysis.
The repository is organized into three folders:
- Human Names Generator - This folder contains the notebooks that generate a master CSV file of all possible human names.
  - Three CSV files are used as input: Indian-Female-Names.csv, Indian-Male-Names.csv and Foreign-Names.csv.
  - The iPython Notebook Names List Generator.ipynb generates the list of human names, which is stored in a pickle file.
  - The pickle file humanNames.txt is the output of this step and is used in the rest of the project.
- Any Book - This folder performs analysis on an English novel, Sense and Sensibility.
  - The file textFile.txt is the UTF-8 encoded text of the novel.
  - The iPython Notebook Word CSV Generator.ipynb generates a CSV file, words.csv, which is used in the later analysis.
  - The iPython Notebook Character List Generator.ipynb generates the list of main characters in the book and saves it in characterList.txt.
  - The iPython Notebook Word Cloud Generator.ipynb generates a word cloud for any specified character. The folder Word Clouds contains the results of running Word Cloud Generator.ipynb on various characters.
  - The iPython Notebook Sentiment Analysis.ipynb performs sentiment analysis using NRC_emotion_lexicon_list.txt and generates visualisations that depict sentiment throughout the book.
- Mahabharata - This folder performs analysis on the epic Mahabharata.
  - The folder data contains the raw text of all 18 books, which have been combined into mahafull.txt.
  - The iPython Notebook Word CSV Generator.ipynb generates a CSV file, words.csv, which is used in the later analysis.
  - The iPython Notebook WordCloud For Any Character.ipynb generates a word cloud for any specified character. The folder Word Clouds contains the results of running Word Cloud Generator.ipynb on various characters.
  - The iPython Notebook Relation Generator For Any Character.ipynb generates the network for a specified character, i.e. it depicts the characters in the book with whom that character had encounters.
  - The iPython Notebook Mahabharat Sentiment Analysis.ipynb performs sentiment analysis using NRC_emotion_lexicon_list.txt. It generates visualisations that depict sentiment throughout the book, as well as chap_with_emo_scores_normalized.csv, which shows the degree of every emotion in every book.
  - The iPython Notebook LDA_Mahabharata.ipynb performs LDA (Latent Dirichlet Allocation) topic modelling on the Mahabharata and generates an interactive output.
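
The character network described for Relation Generator For Any Character.ipynb can be approximated by counting how often two names appear in the same sentence. The following is a minimal sketch under that assumption, not the notebook's actual code; the function names `cooccurrence_counts` and `encounters_of`, the naive sentence splitter, and the sample text are all hypothetical.

```python
import re
from collections import Counter
from itertools import combinations


def cooccurrence_counts(text, characters):
    """Count how often each pair of characters shares a sentence.

    Assumption: an "encounter" is approximated by two names appearing
    in the same sentence; the notebook may define it differently.
    """
    # Naive sentence split on ., ! or ? followed by whitespace.
    sentences = re.split(r"(?<=[.!?])\s+", text)
    pairs = Counter()
    for sentence in sentences:
        words = set(re.findall(r"[A-Za-z]+", sentence))
        # Sort so that each pair is always stored in the same order.
        present = sorted(name for name in characters if name in words)
        for a, b in combinations(present, 2):
            pairs[(a, b)] += 1
    return pairs


def encounters_of(character, pairs):
    """Return {other_character: count} for one character."""
    result = {}
    for (a, b), count in pairs.items():
        if a == character:
            result[b] = count
        elif b == character:
            result[a] = count
    return result


# Hypothetical sample text, used only to illustrate the counting.
sample = (
    "Arjuna spoke to Krishna. Krishna answered Arjuna at length. "
    "Bhima fought Duryodhana! Arjuna watched Bhima."
)
pairs = cooccurrence_counts(sample, ["Arjuna", "Krishna", "Bhima", "Duryodhana"])
print(encounters_of("Arjuna", pairs))
```

The pair counts can serve as edge weights when the network for a character is drawn. Matching on single-word names is a simplification; characters with multi-word names or aliases would need extra handling.
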
The project uses the following tools:
- Python - NLTK, spaCy, Seaborn, Matplotlib
- Algorithms - VADER, LDA
- Development Platform - Jupyter
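
The lexicon-based emotion scoring performed by the sentiment analysis notebooks can be sketched with the standard library alone. This is a minimal sketch, not the notebooks' actual code and not VADER: it assumes the lexicon file is tab-separated with one word, emotion and 0-or-1 flag per line, and it normalizes each emotion count by the number of words in the chapter. The function names and the tiny inline lexicon are hypothetical.

```python
import re
from collections import Counter, defaultdict


def parse_lexicon(lines):
    """Build {word: set of emotions} from lexicon lines.

    Assumption: each line is "word<TAB>emotion<TAB>flag", where flag
    is "1" if the word is associated with the emotion.
    """
    lexicon = defaultdict(set)
    for line in lines:
        parts = line.strip().split("\t")
        if len(parts) == 3 and parts[2] == "1":
            lexicon[parts[0]].add(parts[1])
    return lexicon


def emotion_scores(text, lexicon):
    """Return {emotion: matches / total words} for one chapter."""
    words = re.findall(r"[a-z']+", text.lower())
    counts = Counter()
    for word in words:
        for emotion in lexicon.get(word, ()):
            counts[emotion] += 1
    total = len(words) or 1  # avoid dividing by zero on empty text
    return {emotion: n / total for emotion, n in counts.items()}


# Tiny stand-in for the real lexicon file (hypothetical entries).
lexicon = parse_lexicon([
    "battle\tfear\t1",
    "battle\tanger\t1",
    "battle\tjoy\t0",
    "victory\tjoy\t1",
])
scores = emotion_scores("The battle ended in victory", lexicon)
print(scores)
```

If the format assumption holds for NRC_emotion_lexicon_list.txt, the open file object can be passed directly to `parse_lexicon`, and calling `emotion_scores` once per book would give one row of normalized scores per book.
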