We present a novel approach for efficient loss weighting in a tree-structured neural network. Current methods either consider only the top-node prediction for the loss calculation or weight all nodes equally, yielding a strongly imbalanced class loss. Our method progresses through the tree, starting at the word level, to focus the loss on misclassified nodes. We propose three different heuristics for determining such misclassifications and investigate their effect and performance on the Stanford Sentiment Treebank using a binary Tree-LSTM model. The results show a significant improvement over previous models in terms of accuracy and overfitting. The figure below visualizes the concept.
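To make the idea concrete, here is a minimal sketch (not the authors' implementation) of one possible heuristic for focusing a tree's loss on misclassified nodes: nodes whose prediction disagrees with the gold label receive full weight, while correctly classified nodes keep only a small residual weight. The function name, the `residual` parameter, and the flat node lists are illustrative assumptions.

```python
def focused_tree_loss(node_losses, predictions, labels, residual=0.1):
    """Weight per-node losses so misclassified nodes dominate.

    node_losses -- per-node loss values (e.g. cross-entropy)
    predictions -- predicted class per node
    labels      -- gold class per node
    residual    -- small weight kept for correctly classified nodes
    """
    # Full weight for wrong predictions, small residual weight otherwise.
    weights = [1.0 if p != y else residual
               for p, y in zip(predictions, labels)]
    # Normalize so the loss scale is independent of tree size.
    total_weight = sum(weights)
    return sum(w * l for w, l in zip(weights, node_losses)) / total_weight
```

In practice the per-node losses would come from the Tree-LSTM's node classifiers, and the weighting could be applied per batch during training.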
This paper was written in the context of the second practical assignment (Sentiment Classification with Deep Learning) for the course "Natural Language Processing 1" at the University of Amsterdam. The full paper can be found here.
The code is structured into two Jupyter notebooks.
This notebook summarizes all experiments and models from the mandatory assignment part (BOW, CBOW, Deep CBOW, LSTM, Tree-LSTM). In addition, it creates all plots shown in the paper. Results for the remaining models are taken from the text files provided alongside the notebooks.
This notebook contains all experiments with our proposed models. Please note that executing this notebook takes a long time, as training is set to 50,000 iterations. We therefore recommend using the pretrained models (see below) for testing.
Pretrained models can be found here. Please download the checkpoint folder and save it in the notebook folder. For plotting, the required test predictions/accuracies are provided in text files, which are also saved in the notebook folder. If the checkpoint folder is not downloaded, the benchmark models must first be trained and saved, because the plot evaluation relies on loading these models and running them on the test set.
For questions regarding code, models or paper, please contact [email protected].