This repository implements gradient descent training of a unigram language model.
HW 2. Gradient Descent
Information
Subject: Introduction to Natural Language Processing
Professor: Patrick Wang
Assignment: HW #2 Gradient Descent
Name: Suim Park (sp699)
Notes
OS Module
Whenever I ran the code, I consistently received a warning about multiple copies of the OpenMP runtime being loaded. To avoid this and let the code execute, I used the 'os' module within the code.
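The snippet below is a minimal sketch of that workaround, assuming the usual environment-variable route; the variable name is the standard fix for this OpenMP error, not something taken from the assignment code.

import os

# Workaround for "OMP: Error #15: ... multiple copies of the OpenMP runtime".
# Must be set before importing torch/numpy. Note that this silences the
# error rather than resolving the underlying library conflict.
os.environ["KMP_DUPLICATE_LIB_OK"] = "TRUE"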
matplotlib
To make the line plots and bar plots, I imported the 'matplotlib' library, specifically 'matplotlib.pyplot':
import matplotlib.pyplot as plt
Description
Code Description
loss as a function of time/iteration
I added an empty list, 'Loss_value', to save the loss values. Each time the training step ran, I appended the current loss, and then plotted the stored values. This allowed me to observe how the loss approaches the minimum possible loss as the number of iterations and the learning rate vary.
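As a rough illustration, the loop below records the loss at every iteration and plots it afterwards. It is a self-contained toy sketch: the score vector, counts, and optimizer settings are stand-ins of my own, not the assignment's actual Unigram class.

import torch
import matplotlib.pyplot as plt

torch.manual_seed(0)
counts = torch.tensor([40.0, 10.0, 25.0, 25.0])   # toy character counts
s = torch.zeros(len(counts), requires_grad=True)  # unnormalized scores

Loss_value = []  # loss recorded at every iteration
optimizer = torch.optim.SGD([s], lr=0.1)
for _ in range(1000):
    optimizer.zero_grad()
    log_p = torch.log_softmax(s, dim=0)  # normalize to log-probabilities
    loss = -(counts * log_p).sum()       # negative log-likelihood
    loss.backward()
    optimizer.step()
    Loss_value.append(loss.item())

plt.plot(Loss_value)
plt.xlabel("iteration")
plt.ylabel("loss")
plt.show()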
the (known) optimal probabilities
I determined the optimal probabilities using the following method: from the encodings, I counted the frequency of each character and divided it by the total occurrence count. During this process, I kept the occurrence count for each character in a list initialized to 1, so that no character receives zero probability. I also converted the list to a tensor to make it compatible with the rest of the code.
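A sketch of that counting step with a toy four-letter alphabet (the names and data here are mine; the real code presumably iterates over the actual encodings):

import torch

vocab = list("abcd")   # toy alphabet standing in for the real one
text = "abacabadbc"    # toy corpus

# Count occurrences of each character, initializing every count to 1 so
# that no character gets zero probability (which would make log(0) = -inf).
counts = [1] * len(vocab)
for ch in text:
    counts[vocab.index(ch)] += 1

counts = torch.tensor(counts, dtype=torch.float32)
optimal_probs = counts / counts.sum()  # known optimal unigram probabilities
print(optimal_probs)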
the final token probabilities
I calculated the final token probabilities from the unigram model. I passed the encoded vocabulary 'x' into the Unigram model, inspected the probability distribution held within the Unigram class, and normalized it to obtain the probabilities. Since only the final token probabilities are needed, I specifically selected the last value (index -1). Because the subsequent operations use NumPy, I added 'clone().detach().numpy()' to convert the tensor safely.
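The extraction step looks roughly like the sketch below; I am assuming softmax normalization over a trained score vector, since the internals of the Unigram class are not shown here.

import torch

# Hypothetical trained (unnormalized) score vector inside the model.
s = torch.tensor([1.2, -0.3, 0.5, 0.5], requires_grad=True)

# Normalize to probabilities, then clone/detach so the tensor can be
# converted to NumPy even though it requires gradients.
final_probs = torch.softmax(s, dim=0).clone().detach().numpy()
print(final_probs)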
the (known) minimum possible loss
I found the minimum possible loss by multiplying the log of the optimal probabilities by the occurrence count of each character and summing. Since log-probabilities are always negative, the 'loss_fn' function negates the result to produce a positive value. The minimum possible loss lets us verify that the training loss gradually decreases toward it given enough iterations and a suitable learning rate.
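As a worked sketch with the toy counts from above (again my own stand-in numbers): the negative log-likelihood is minimized at the empirical probabilities, so evaluating it there gives the floor that the training curve should approach.

import torch

counts = torch.tensor([41.0, 11.0, 26.0, 26.0])  # toy counts (with the +1 init)
optimal_probs = counts / counts.sum()

# Log-probabilities are negative, so negating the weighted sum yields a
# positive minimum possible loss, directly comparable to the training loss.
min_loss = -(counts * torch.log(optimal_probs)).sum()
print(min_loss.item())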
get reasonably good results quickly (seconds)
Through a series of experiments varying the number of iterations and the learning rate, I observed that with 1000 iterations and a learning rate of 0.1, the loss decreased rapidly and converged to the minimum possible loss within seconds. With this configuration, the final token probabilities also closely matched the optimal probabilities.
Test Cases
Case 1 (Recommended)
Test
# set number of iterations and learning rate
num_iterations = 1000  # SET THIS
learning_rate = 0.1  # SET THIS
Result Plot
Case 2
Test
# set number of iterations and learning rate
num_iterations = 1000  # SET THIS
learning_rate = 0.5  # SET THIS
Result Plot
Case 3
Test
# set number of iterations and learning rate
num_iterations = 1000  # SET THIS
learning_rate = 0.01  # SET THIS
Result Plot
Case 4
Test
# set number of iterations and learning rate
num_iterations = 500  # SET THIS
learning_rate = 0.5  # SET THIS
Result Plot
Case 5
Test
# set number of iterations and learning rate
num_iterations = 500  # SET THIS
learning_rate = 0.1  # SET THIS
Result Plot
Case 6
Test
# set number of iterations and learning rate
num_iterations = 500  # SET THIS
learning_rate = 0.01  # SET THIS
Result Plot
Case 7
Test
# set number of iterations and learning rate
num_iterations = 100  # SET THIS
learning_rate = 0.5  # SET THIS
Result Plot
Case 8
Test
# set number of iterations and learning rate
num_iterations = 100  # SET THIS
learning_rate = 0.1  # SET THIS
Result Plot
Case 9 (Recommended)
Test
# set number of iterations and learning rate
num_iterations = 100  # SET THIS
learning_rate = 0.01  # SET THIS