Giter Site home page Giter Site logo

wordcounterapp's Introduction

Word Cloud Project

Assignment 3-Frequency File generation

Assignment 3 Objective- Word Frequencies x

Analyze a song for the frequency of word used by song utlizing different data structures.

1. Prepare a text file that contains text to analyze. In this case, I have chosen to prepare the songs from the 2019 album thank u, next by Ariana Grande.

Each song is located in the thank u, next folder within the src folder. Each individual song has a text file with the lyrics gathered from Genius. The distinction of the identifiers of Verse 1, Outro, Pre-Chorus, etc removed. This could also be done with adding them to the stopwords, adding the specific words. This would be a good addition for next implementaion.

2. With your code, you’ll read from the text file and capture the data into a data structure.

The constructor of the Song object accepts a String inputFile and String songTitle. The inputFile string is the location of the song File with the lyrics, which is then instaites a File object, then instantiating Scanner object. The Scanner object reads in each individual word. Every non-alphanumeric symbol was taken out and changed to lowercase

while(input.hasNext()){
  String next = input.next().toLowerCase();
  next = next.replaceAll("\\p{Punct}", "");

3. Using a data structure, write the code to count the appearance of each unique word in the lyrics.

The Song words were inputted into a HashMap<String, Integer>. The key is a String or the word, and the value was the number of times it occured. This was used for easy access to each word to increment or add to. With a O(1) there does not need to be any searching or loops, making it the best data structure.

It was also checked for stopwords. The stopwords were put into a HashSet, read into through the stopwords.txt file, which contained the Default English Stopwords List from [ranks.nl/stopwords]. The stopwords used a HashSet only needing keys for the stopwords and allows for easy access to compare to the inputting song word. For every Song a new stopword HashSet was created, which may be unnecessary and instead could pass the stopword HashSet into the contructor of the Song Object.

if (stopwords.contains(next)) {
} else if (!songLyrics.containsKey(next)) {
    songLyrics.put(next, 1);
    totalWords++;
} else {
    songLyrics.put(next, songLyrics.get(next) + 1);
    totalWords++;
}     

4. Print out a word frequency list.

Printing the Frequency List is done through the method in the Song Class. Accepting the HashMap<String, Integer> of the song. It wrote to the frequencyFile.txt. Using a for loop it looped through every key in the set, printing the key with its corresponding value pair.

public void printFrequencyFile(HashMap<String, Integer> song){
 try (PrintWriter output = new PrintWriter("frequencyFile.txt")) {
   for (String word : song.keySet()) {
     int count = song.get(word);
     output.println(count + ": " + word);
   }
 } catch (IOException e){
           e.printStackTrace();
   }
 }    
}   

i.e Output File, thank u, next

42: next
38: you
37: thank
15: yeah
8: my
7: say
6: taught

Song Class

The Song Class holds data for a song, including fields HashMap<String, Integer> songLyrics, String songTitle and HashSet stopwords. The methods include the constructor, Song(String inputFile, String T) to input the song, void printLyrics(HashMap<String, Integer> song) to print the lyrics to the console, and void printLyricsFile(HashMap<String, Integer> song to print the lyrics to the output file.

Album Class

The Album class holds the data for the entire album, including fields int NumOfSongs, NumOfWords, andh HashMap<String, Integer> albumLyrics. The methods include the constructor, Album(Song inputSong, int songIndex) to instantiate the album, addSong(Song inputSong, int songIndex) to input the song objects lyrics into the albumLyrics HashMap<String, Integer>, void printLyricsFile(HashMap<String, Integer> album) to print the lyrics to the output file.

The Song and Album class fields are denoted as public, this could lead to data corruption and should be assigned private and accessed with getter and setter methods OR given package access, allowing the Album class to utilize the songs fields. They are denoted public so the album class can access the Song classes fields.

i.e Album class inputting a Song classes songLyrics HashMap<String, Integer>

inputSong.songLyrics.keySet().forEach((String word) -> {
            if(albumLyrics.containsKey(word)){
                albumLyrics.put(word,
                        inputSong.songLyrics.get(word)+ albumLyrics.get(word));
            }else{
                albumLyrics.put(word, inputSong.songLyrics.get(word));
            }
        });

wordcounterapp's People

Contributors

alexfuoc avatar

Stargazers

 avatar

Watchers

James Cloos avatar  avatar

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.