This is a repo for the Kaggle competition Bag of Words Meets Bags of Popcorn (sentiment classification of IMDB movie reviews).
- Data preprocessing
- Generate word2vec data
- Train a CNN model for text classification
- Predict labels for the test data
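The README does not describe the CNN architecture or the framework used. A common design for text classification, assumed here, is one convolution layer over the word vectors, ReLU, max-over-time pooling, and a sigmoid output. The pure-Python sketch below shows only the forward pass with tiny hand-set weights; `conv_max_over_time` and `predict_positive` are hypothetical names, and training (backpropagation) is left to whatever framework the repo actually uses.

```python
import math
from typing import List

Vector = List[float]
Matrix = List[Vector]  # one row per word (embeddings) or per filter position


def conv_max_over_time(embeddings: Matrix, filters: List[Matrix], biases: List[float]) -> Vector:
    """For each filter (k rows x embedding dim), slide it over every window of
    k consecutive word vectors, apply ReLU, and keep the largest activation."""
    features: Vector = []
    for filt, bias in zip(filters, biases):
        k = len(filt)
        # ReLU outputs are >= 0, so 0.0 is the floor; it is also the result
        # when the text is shorter than the filter width.
        best = 0.0
        for start in range(len(embeddings) - k + 1):
            activation = bias
            for offset, filt_row in enumerate(filt):
                word_vec = embeddings[start + offset]
                activation += sum(w * x for w, x in zip(filt_row, word_vec))
            best = max(best, activation)  # ReLU + max pooling in one step
        features.append(best)
    return features


def predict_positive(embeddings: Matrix, filters: List[Matrix], biases: List[float],
                     out_weights: Vector, out_bias: float) -> float:
    """Probability that the review is positive: sigmoid of a linear layer
    over the pooled convolution features."""
    feats = conv_max_over_time(embeddings, filters, biases)
    z = out_bias + sum(w * f for w, f in zip(out_weights, feats))
    return 1.0 / (1.0 + math.exp(-z))


# Tiny hand-set example: 3 words with 2-d embeddings, one filter of width 2.
emb = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
filters = [[[1.0, 0.0], [0.0, 1.0]]]
print(conv_max_over_time(emb, filters, [0.0]))              # [2.0]
print(predict_positive(emb, filters, [0.0], [1.0], -2.0))   # 0.5
```

Max-over-time pooling turns a review of any length into a fixed-size feature vector (one value per filter), which is why this design suits variable-length reviews.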
- Inspect the data format
- Compute statistics on the data
- Data cleaning
- Data cleaning for word2vec
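Inspecting the format and computing statistics can be sketched as follows. This assumes the competition's `labeledTrainData.tsv` layout (tab-separated `id`, `sentiment`, `review` columns with a header row); the sample rows are made up, and `read_labeled_tsv` and `describe` are hypothetical helpers, not code from this repo.

```python
import io
import statistics
from collections import Counter
from typing import Dict, Iterable, List, Tuple

Row = Tuple[str, int, str]  # (id, sentiment, raw review)


def read_labeled_tsv(lines: Iterable[str]) -> List[Row]:
    """Parse a tab-separated file whose header is: id, sentiment, review."""
    it = iter(lines)
    next(it)  # skip the header row
    rows: List[Row] = []
    for line in it:
        line = line.rstrip("\n")
        if not line:
            continue
        # maxsplit=2 keeps any tabs inside the review text intact
        review_id, sentiment, review = line.split("\t", 2)
        rows.append((review_id, int(sentiment), review))
    return rows


def describe(rows: List[Row]) -> Dict[str, object]:
    """Size, label balance, and review length in whitespace-separated words."""
    lengths = [len(review.split()) for _, _, review in rows]
    return {
        "n_reviews": len(rows),
        "label_counts": dict(Counter(label for _, label, _ in rows)),
        "mean_words": statistics.mean(lengths),
        "max_words": max(lengths),
    }


# Made-up sample in the assumed layout; real use would pass
# open("labeledTrainData.tsv", encoding="utf-8") instead.
sample = io.StringIO(
    "id\tsentiment\treview\n"
    '"1_9"\t1\t"Great film, loved it"\n'
    '"2_1"\t0\t"Boring"\n'
)
stats = describe(read_labeled_tsv(sample))
print(stats)
```

The label counts show whether the classes are balanced, and the length statistics help pick a maximum sequence length for the CNN input.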
- Remove the `"` character at the start and end of each review
- Replace `\"` with `"`
- Replace `<br />` tags (including consecutive ones) with a space character
- Tokenize each review as a whole, without splitting it into sentences
- Lemmatize all words
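The whole-review cleaning steps above can be sketched with the standard library only. The regex tokenizer, the lowercasing, and the toy lemma table `TOY_LEMMAS` are assumptions standing in for real tools (for example NLTK's `word_tokenize` and `WordNetLemmatizer`); the function names are hypothetical.

```python
import re
from typing import Dict, List

BR_RUN = re.compile(r"(?:<br\s*/?>\s*)+", re.IGNORECASE)  # one or more <br /> tags
TOKEN = re.compile(r"\w+|[^\w\s]")  # words, or single punctuation marks

# Hypothetical toy lemma table standing in for a real lemmatizer.
TOY_LEMMAS: Dict[str, str] = {"movies": "movie", "was": "be", "loved": "love"}


def strip_outer_quotes(review: str) -> str:
    """Drop exactly one wrapping double quote at each end, if both are present."""
    if len(review) >= 2 and review[0] == '"' and review[-1] == '"':
        return review[1:-1]
    return review


def clean_review(review: str) -> List[str]:
    """Clean one raw review and return its lemmatized tokens as one sequence."""
    text = strip_outer_quotes(review)
    text = text.replace('\\"', '"')       # \" -> "
    text = BR_RUN.sub(" ", text)          # <br /><br /> -> one space
    tokens = TOKEN.findall(text.lower())  # lowercasing is an assumption
    return [TOY_LEMMAS.get(tok, tok) for tok in tokens]


raw = '"I loved these movies.<br /><br />It was \\"fun\\"."'
print(clean_review(raw))
```

Only one quote is removed from each end, rather than using `str.strip('"')`, so a review that ends with an escaped quote (`\""`) keeps its inner quote intact.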
- Remove the `"` character at the start and end of each review
- Replace `\"` with `"`
- Replace `<br />` tags (including consecutive ones) with a space character
- Split each review into sentences
- Tokenize each sentence
- Lemmatize all words
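For word2vec the cleaned text is split into sentences first, producing a list of token lists (one per sentence), which is the shape gensim's `Word2Vec` accepts for its `sentences` input. The sketch below assumes a simple regex sentence splitter (a stand-in for something like NLTK's `sent_tokenize`) and a tiny hypothetical lemma table in place of a real lemmatizer; `review_to_sentences` is a hypothetical name.

```python
import re
from typing import Dict, List

BR_RUN = re.compile(r"(?:<br\s*/?>\s*)+", re.IGNORECASE)  # one or more <br /> tags
SENT_END = re.compile(r"(?<=[.!?])\s+")   # whitespace after . ! or ?
TOKEN = re.compile(r"\w+|[^\w\s]")        # words, or single punctuation marks

# Hypothetical toy lemma table standing in for a real lemmatizer.
TOY_LEMMAS: Dict[str, str] = {"movies": "movie", "was": "be", "loved": "love"}


def review_to_sentences(review: str) -> List[List[str]]:
    """Clean one raw review and return a list of lemmatized token lists."""
    if len(review) >= 2 and review[0] == '"' and review[-1] == '"':
        review = review[1:-1]                  # drop the wrapping quotes
    review = review.replace('\\"', '"')        # \" -> "
    review = BR_RUN.sub(" ", review)           # <br /> runs -> one space
    sentences = []
    for sent in SENT_END.split(review.strip()):
        tokens = [TOY_LEMMAS.get(t, t) for t in TOKEN.findall(sent.lower())]
        if tokens:                             # skip empty fragments
            sentences.append(tokens)
    return sentences


raw = '"I loved these movies.<br /><br />It was \\"fun\\"!"'
print(review_to_sentences(raw))
```

A word2vec corpus would then be the concatenation over all reviews, e.g. `[s for r in reviews for s in review_to_sentences(r)]`.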
- Preprocess the CNN training data
- Train the model
- Evaluate the model on the validation set
- Select a model that performs well on the validation set
- Generate predictions with the selected model
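The README does not say how a "good" model is chosen. Since the competition scored submissions by area under the ROC curve, one reasonable assumption is to keep the saved model with the highest validation AUC and write its scores as a CSV with `id` and `sentiment` columns (assumed to match the competition's sample submission). All names, ids, and scores below are hypothetical.

```python
import csv
import io
from typing import Dict, List, Sequence, TextIO


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Pairwise ROC AUC: share of (positive, negative) pairs ranked correctly, ties count half."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("validation set needs both classes")
    wins = 0.0
    for p in pos:
        for n in neg:
            wins += 1.0 if p > n else 0.5 if p == n else 0.0
    return wins / (len(pos) * len(neg))


def select_best(val_labels: Sequence[int], candidates: Dict[str, List[float]]) -> str:
    """Return the name of the candidate model with the highest validation AUC."""
    return max(candidates, key=lambda name: roc_auc(val_labels, candidates[name]))


def write_submission(ids: Sequence[str], scores: Sequence[float], out: TextIO) -> None:
    """Write one row per test review: id and predicted positive score."""
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["id", "sentiment"])
    for review_id, score in zip(ids, scores):
        writer.writerow([review_id, score])


# Hypothetical validation outputs of two saved models (e.g. two training epochs).
val_labels = [1, 0, 1, 0]
candidates = {
    "epoch_1": [0.6, 0.7, 0.4, 0.3],
    "epoch_2": [0.9, 0.2, 0.8, 0.4],
}
best = select_best(val_labels, candidates)
out = io.StringIO()
write_submission(["t1", "t2"], [0.9, 0.1], out)  # made-up test ids and scores
print(best)
print(out.getvalue())
```

The pairwise loop is quadratic and meant only to make the metric explicit; for 25,000 reviews a rank-based AUC or a library implementation would be faster.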
- Split the labeled data into a training set and a validation set
- Assign an index to every word in the word2vec vocabulary
- Encode every review in the training and validation sets as a sequence of word indices
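The split, indexing, and encoding steps can be sketched as below. Assumptions not stated in the README: indices 0 (padding) and 1 (unknown word) are reserved, the split is a seeded random hold-out, and reviews are truncated or padded to a fixed `max_len`. The vocabulary list would come from the trained word2vec model (with gensim 4, `model.wv.index_to_key`); the helper names are hypothetical.

```python
import random
from typing import Dict, List, Sequence, Tuple

PAD, UNK = 0, 1  # reserved indices (an assumption, not from the README)


def train_val_split(items: Sequence, val_fraction: float = 0.2, seed: int = 0) -> Tuple[list, list]:
    """Shuffle reproducibly, then hold out val_fraction of the items for validation."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    n_val = int(len(shuffled) * val_fraction)
    return shuffled[n_val:], shuffled[:n_val]


def build_word_index(vocab: Sequence[str]) -> Dict[str, int]:
    """Give every word2vec vocabulary word an index, starting after the reserved ones."""
    return {word: i for i, word in enumerate(vocab, start=2)}


def encode(tokens: List[str], index: Dict[str, int], max_len: int) -> List[int]:
    """Map tokens to indices (unknown words -> UNK), truncate, then pad with PAD."""
    ids = [index.get(tok, UNK) for tok in tokens[:max_len]]
    return ids + [PAD] * (max_len - len(ids))


index = build_word_index(["movie", "good", "bad"])  # toy vocabulary
print(encode(["good", "movie", "unseen"], index, max_len=5))  # [3, 2, 1, 0, 0]
```

With this layout, row `i` of the embedding matrix fed to the CNN holds the word2vec vector of the word with index `i`, while rows 0 and 1 can be zeros or trainable vectors for padding and unknown words.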