
jakob-l-m / word_frequency_visualization


Visualization of word usage changes over time. Example usage on German COVID tweets.

Home Page: https://www.jakob-l-m.de/word-frequency-vis.html

JavaScript 32.72% CSS 9.47% Jupyter Notebook 23.60% Python 27.46% HTML 6.74%
d3 javascript nlp python temporal tfidf visualization live-demo

word_frequency_visualization's Introduction

Word Frequency Visualization

This is the project for my bachelor thesis.

The finished project can be viewed here. The website was built and tested using Google Chrome. For my example visualization I am using Twitter data from German news agencies. All tweets are filtered by coronavirus keywords.

In general, the whole project can be used with your own dataset. Alternatively, Twitter data can be used with other keywords and stopwords. Even though my example uses German tweets, the code supports 20+ languages.

Screenshots of the finished website

The main graph:

Date Range slider:

Detail view of a specific word and date:

Gif showing hover animations:

Quick explanation of the process

In general, all text data is tagged with a timestamp. I am using TF-IDF to calculate word relevance by day. The relevance is then smoothed by week. The final visualization shows radial stacked timelines for the different words. The user can select a date range and interact with the graph by clicking on words or dates.
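The two analysis steps described above (TF-IDF per day, then smoothing over a week) can be sketched roughly as follows. This is a minimal illustration and not the project's actual code: the function names `tfidf_by_day` and `smooth_weekly`, the input shape (a dict mapping each day to a list of tokens), the plain `tf * log(N / df)` weighting, and the trailing 7-entry moving average are all assumptions made for the example.

```python
import math
from collections import Counter
from datetime import date


def tfidf_by_day(docs_by_day):
    """Treat all tokens of one day as a single document and score them with TF-IDF.

    docs_by_day: dict mapping a date to a list of tokens (hypothetical input shape).
    """
    n_days = len(docs_by_day)
    # Document frequency: on how many days does each word occur at least once?
    df = Counter()
    for tokens in docs_by_day.values():
        df.update(set(tokens))

    scores = {}
    for day, tokens in docs_by_day.items():
        counts = Counter(tokens)
        total = len(tokens)
        scores[day] = {
            word: (count / total) * math.log(n_days / df[word])
            for word, count in counts.items()
        }
    return scores


def smooth_weekly(scores, word, window=7):
    """Trailing moving average of one word's daily score over `window` entries.

    Assumes `scores` holds one entry per calendar day, so 7 entries span a week.
    """
    days = sorted(scores)
    series = [scores[d].get(word, 0.0) for d in days]
    smoothed = {}
    for i, day in enumerate(days):
        chunk = series[max(0, i - window + 1): i + 1]
        smoothed[day] = sum(chunk) / len(chunk)
    return smoothed
```

A word that appears on every day gets a score of zero under this weighting, which is the intended effect: only words that are unusually frequent on particular days stand out in the timeline.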

word_frequency_visualization's People

Contributors

jakob-l-m


word_frequency_visualization's Issues

Animations

Animations for lines, dates, circles, etc.

  • Lines
  • Dates
  • Circles
  • Frequency Graph
  • word clouds

Plots/Graphs

  • Tweet frequency with labels for public-service (ÖR) and private broadcasters
  • Tweet frequency for each profile -> data validation
  • Function flowchart
  • script interactions
  • Tweets sorted by weekdays
  • each profile grouped by month. subplots (4,3)

Search Option

  • Textarea to type into
  • Find an efficient algorithm
  • Suggest words that are close
  • Make it pretty
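One possible approach to the "suggest words that are close" item is fuzzy matching against the known vocabulary. The sketch below uses Python's standard `difflib.get_close_matches`. It is only an illustration of the idea: the function name `suggest_words`, the lowercase vocabulary list, and the cutoff value are assumptions, and a search box on the website would need an equivalent in the front-end code.

```python
import difflib


def suggest_words(query, vocabulary, limit=3, cutoff=0.6):
    """Return up to `limit` vocabulary words that are similar to `query`.

    `vocabulary` is assumed to be a list of lowercase words from the dataset.
    """
    return difflib.get_close_matches(query.lower(), vocabulary, n=limit, cutoff=cutoff)
```

For example, `suggest_words("Lokdown", ["lockdown", "impfung", "maske"])` would suggest `lockdown`, because the misspelled query still shares most of its characters with that word.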

Clean Up

Write comments

  • front end
  • back end

Remove unused code

  • front end
  • back end

Other tasks:

  • transfer notebooks into .py files
  • consistent fonts for website
  • add about section

View for a word

  • Label x-axis correctly
  • Calculate Word size by word length
  • Create Graph
  • unpack weights

Fix bugs

  • Error: attribute d: Expected arc flag ('0' or '1') - No apparent problem but needs to be fixed
  • coloring after date range change
  • dotted lines should not be over the arcs
  • prevent empty date range selection
  • make stopwords not all lowercase
  • Files with Ä, Ö, Ü, ß cannot be opened. Replace with AE, OE, UE, SS
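The last item in the list above (replacing umlauts and ß in file names) could be handled with a small translation table. This is a hedged sketch, not the project's implementation: the function name `ascii_filename` is hypothetical, and the lowercase mappings (ä to ae, and so on) are an assumption that goes beyond the uppercase replacements named in the issue.

```python
import unicodedata

# Replacement table; the lowercase entries are an assumption for this example.
UMLAUT_MAP = {
    "Ä": "AE", "Ö": "OE", "Ü": "UE",
    "ä": "ae", "ö": "oe", "ü": "ue",
    "ß": "ss",
}
_TABLE = str.maketrans(UMLAUT_MAP)


def ascii_filename(name):
    """Replace German umlauts and ß so the file name is plain ASCII-friendly."""
    # Normalize first so decomposed forms ("a" + combining diaeresis) are matched too.
    return unicodedata.normalize("NFC", name).translate(_TABLE)
```

With this table, a name such as `Größe_Übung.json` becomes `Groesse_UEbung.json`.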

Preprocessing

  • implement tf-idf
  • use a keyword file
  • update tweet data
  • use a timestamp file for the database
  • Filter out Tweet replies
  • create an exporter to output json files
  • extract uni-, bi- and trigrams
  • Use TreeTagger
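As an illustration of the n-gram item in the list above, extracting unigrams, bigrams and trigrams from an already tokenized tweet can look like this. The function name `extract_ngrams` and the choice to join tokens with a space are assumptions made for the example, not taken from the repository.

```python
def extract_ngrams(tokens, max_n=3):
    """Return all n-grams of length 1..max_n as space-joined strings."""
    ngrams = []
    for n in range(1, max_n + 1):
        # Slide a window of size n over the token list.
        for i in range(len(tokens) - n + 1):
            ngrams.append(" ".join(tokens[i:i + n]))
    return ngrams
```

The resulting strings can then be counted per day in the same way as single words, so multi-word terms take part in the relevance calculation as well.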

GitHub Docs

Create docs for GitHub

  • Main Readme with pics and explanation
  • Readme in folders, explaining file structure

View for a date

  • Create basic view with a list of the words associated with that day
  • Generate Word Clouds
  • use actual weights
  • Make it look nice
  • Animations between Clouds

Thesis Write-up

Introduction (2 pages, 2 cumulative)

  • Motivation
  • Goals
  • Structure

Fundamentals (9 pages, 11 cumulative)

  • Scraping
  • Data pipeline
  • Input from other datasets
  • Analysis
  • Exporting the data for visualization

Visualization (7 pages, 18 cumulative)

  • Creating the graphics
  • Animations
  • Click and hover events
  • Color scheme

Results and Evaluation (6 pages, 24 cumulative)

  • Publication
  • Problems with automated selection
  • Problems with tweets/online language

Summary and Outlook (4 pages, 28 cumulative)

  • Possibilities for extension
  • Closing remarks
  • Real time

Appendix

  • Libraries used, with appendix
