harry_potter_nlp's Introduction

NLP on the Books of Harry Potter

This repo demonstrates a collection of NLP tasks, all using the Harry Potter books as source documents. Read about the individual tasks here:

  1. Topic modeling with Latent Dirichlet Allocation
  2. Regular Expression case study
  3. Extractive text summarization
  4. Sentiment analysis

Emotional Sentiment of the Harry Potter series

Instructions for BasicNLP class (basic_nlp.py)

The class provides topic modeling with LDA, document summarization, and sentiment analysis.

First, initialize the class with a list of documents and an optional list of document titles, for example:

    # assumes basic_nlp.py is on the import path
    from basic_nlp import BasicNLP

    texts = ['this is the first document', 'this is the second document', 'this is the third document']
    titles = ['doc1', 'doc2', 'doc3']

    nlp = BasicNLP(texts, titles)

  1. LDA:

    1. Create an elbow plot and print the coherence scores for a range of topic counts (set by start, stop, and step) with:
      nlp.compute_coherence(start=5, stop=20, step=3)
      
    2. Set the number of topics to use in the model with:
      nlp.set_number_of_topics(10)
      
    3. View the clusters (only available in a Jupyter notebook):
      import pyLDAvis
      pyLDAvis.enable_notebook()
      vis = nlp.view_clusters()
      pyLDAvis.display(vis)
      
    4. Get the vocabulary for each topic in the LDA model with (topics can be 'all', a list of integers, or a single integer):
      nlp.get_topic_vocabulary(topics='all', num_words=10)
      
    5. Get the documents most highly associated with the given topics with:
      nlp.get_representative_documents(topics='all', num_docs=1)
      
    6. Get the sentence summaries of the documents most highly associated with the given topics with:
      nlp.get_representative_sentences(topics='all', num_sentences=3)
      
    7. Provide a name for an LDA topic (if preferred over the numbering system) with:
      nlp.name_topic(topic_number=1, topic_name='My topic')
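
Several of the LDA methods above accept topics as 'all', a list of integers, or a single integer. The following is a minimal sketch of how such an argument could be normalized into a list of topic indices. The helper name resolve_topics and the zero-based indexing are assumptions made for illustration; they are not taken from basic_nlp.py.

```python
def resolve_topics(topics, num_topics):
    """Turn 'all', a single int, or a list of ints into a list of topic indices.

    Hypothetical helper: assumes topics are numbered 0 .. num_topics - 1.
    """
    if topics == 'all':
        selected = list(range(num_topics))
    elif isinstance(topics, int):
        selected = [topics]
    else:
        selected = list(topics)

    # Reject indices that fall outside the model's topic range
    for t in selected:
        if not 0 <= t < num_topics:
            raise ValueError(f'topic {t} is outside 0..{num_topics - 1}')
    return selected


print(resolve_topics('all', 3))
print(resolve_topics(2, 3))
print(resolve_topics([0, 2], 3))
```

Normalizing the argument once keeps the rest of a method free of type checks, since it can always loop over a plain list.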
      
  2. Document summarization:

    Get the sentence summaries of the requested documents with:

    nlp.get_document_summaries(documents='all', num_sent=5)
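
To give a feel for what extractive summarization does, here is a minimal frequency-based sketch: each sentence is scored by the average document-wide frequency of its words, and the top-scoring sentences are returned in their original order. This is an illustrative stand-in, not the method used by get_document_summaries; the function name summarize and the sentence-splitting heuristic are assumptions.

```python
import re
from collections import Counter


def summarize(text, num_sent=5):
    """Return up to num_sent sentences from text, chosen by word-frequency score."""
    # Rough sentence split: break after '.', '!' or '?' followed by whitespace
    sentences = [s.strip() for s in re.split(r'(?<=[.!?])\s+', text) if s.strip()]

    # Word frequencies over the whole document (no stop-word removal in this sketch)
    freq = Counter(re.findall(r"[a-z']+", text.lower()))

    def score(sentence):
        tokens = re.findall(r"[a-z']+", sentence.lower())
        return sum(freq[t] for t in tokens) / len(tokens) if tokens else 0.0

    # Rank sentence indices by score, keep the best, then restore document order
    ranked = sorted(range(len(sentences)), key=lambda i: score(sentences[i]), reverse=True)
    keep = sorted(ranked[:num_sent])
    return [sentences[i] for i in keep]


text = "Cats chase mice. Dogs bark. Cats chase cats."
print(summarize(text, num_sent=2))
```

Because the summary reuses sentences verbatim, it stays faithful to the source wording; a practical version would at least filter stop words so that common function words do not dominate the scores.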
    
  3. Sentiment analysis:

    Get the sentiment scores (compound, positive, neutral, negative) for the requested documents with:

    nlp.get_sentiment(documents='all')
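
The four scores (compound, positive, neutral, negative) resemble the output of lexicon-based scorers such as VADER, although whether the class uses VADER is an assumption here. The sketch below shows the general idea with a toy lexicon whose valence values are made up for illustration: positive, neutral, and negative are proportions that sum to 1, and compound squashes the summed valence into the range -1 to 1. It omits negation, intensifiers, and punctuation handling, which real scorers include.

```python
import math

# Toy lexicon with made-up valence values (hypothetical, for illustration only)
TOY_LEXICON = {
    'good': 2.0, 'great': 3.0, 'happy': 2.5,
    'bad': -2.0, 'terrible': -3.0, 'sad': -2.0,
}


def toy_sentiment(text, alpha=15.0):
    """Return compound, positive, neutral, and negative scores for text."""
    tokens = [t.strip('.,!?;:"\'').lower() for t in text.split()]
    valences = [TOY_LEXICON.get(t, 0.0) for t in tokens if t]

    pos_sum = sum(v for v in valences if v > 0)
    neg_sum = sum(-v for v in valences if v < 0)
    neu_count = sum(1 for v in valences if v == 0)  # each neutral word counts as 1
    total = pos_sum + neg_sum + neu_count

    # Squash the raw valence sum into (-1, 1)
    raw = sum(valences)
    compound = raw / math.sqrt(raw * raw + alpha)

    if total == 0:
        return {'compound': 0.0, 'positive': 0.0, 'neutral': 0.0, 'negative': 0.0}
    return {
        'compound': compound,
        'positive': pos_sum / total,
        'neutral': neu_count / total,
        'negative': neg_sum / total,
    }


print(toy_sentiment('A great and happy day'))
print(toy_sentiment('A terrible day'))
```

The compound score is usually the most convenient single number for comparing documents, while the three proportions show how much of the text carries each polarity.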
    

harry_potter_nlp's People

Contributors

raffg

harry_potter_nlp's Issues

Data not found!

Hi!

We are attempting to further an independent study by exploring your TDS article studying Harry Potter using NLP! Could you point us towards where you downloaded your data? The version we're using isn't properly formatted for exploring your code.

If you'd prefer, you can send it to: [email protected]

Thank you in advance!

Regarding data used

Sir, I was searching for your email ID everywhere but couldn't find it. I wanted the book data you used for deep learning on text summarization and semantic analysis. If possible, could you mail me the .txt files of the books you used so I can go through them? My email ID is [email protected]
