This repository implements gradient descent training of a unigram language model.
HW 2. Gradient Descent
Information
Subject: Introduction to Natural Language Processing
Professor: Patrick Wang
Assignment: HW #2 Gradient Descent
Name: Suim Park (sp699)
Notes
OS Module
Whenever I ran the code, I consistently received a warning about multiple copies of the OpenMP runtime being loaded. To avoid this and let the code execute, I used the 'os' module within the code.
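The snippet below is a minimal sketch of that workaround, assuming the usual environment-variable route; the variable name is the standard fix for this OpenMP error, not something taken from the assignment code.

import os

# Workaround for "OMP: Error #15: ... multiple copies of the OpenMP runtime".
# Must be set before importing torch/numpy. Note that this silences the
# error rather than resolving the underlying library conflict.
os.environ["KMP_DUPLICATE_LIB_OK"] = "TRUE"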
matplotlib
To make the line plots and bar plots, I imported the 'matplotlib' library, specifically 'matplotlib.pyplot':
import matplotlib.pyplot as plt
Description
Code Description
loss as a function of time/iteration
I added an empty list, 'Loss_value', to save the loss values. Each time the training step ran, I appended the current loss, and then plotted the stored values. This allowed me to observe how the loss approaches the minimum possible loss as the number of iterations and the learning rate vary.
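As a rough illustration, the loop below records the loss at every iteration and plots it afterwards. It is a self-contained toy sketch: the score vector, counts, and optimizer settings are stand-ins of my own, not the assignment's actual Unigram class.

import torch
import matplotlib.pyplot as plt

torch.manual_seed(0)
counts = torch.tensor([40.0, 10.0, 25.0, 25.0])   # toy character counts
s = torch.zeros(len(counts), requires_grad=True)  # unnormalized scores

Loss_value = []  # loss recorded at every iteration
optimizer = torch.optim.SGD([s], lr=0.1)
for _ in range(1000):
    optimizer.zero_grad()
    log_p = torch.log_softmax(s, dim=0)  # normalize to log-probabilities
    loss = -(counts * log_p).sum()       # negative log-likelihood
    loss.backward()
    optimizer.step()
    Loss_value.append(loss.item())

plt.plot(Loss_value)
plt.xlabel("iteration")
plt.ylabel("loss")
plt.show()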
the (known) optimal probabilities
I determined the optimal probabilities using the following method: from the encodings, I counted the frequency of each character and divided it by the total occurrence count. During this process, I kept the occurrence count for each character in a list initialized to 1, so that no character receives zero probability. I also converted the list to a tensor to make it compatible with the rest of the code.
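A sketch of that counting step with a toy four-letter alphabet (the names and data here are mine; the real code presumably iterates over the actual encodings):

import torch

vocab = list("abcd")   # toy alphabet standing in for the real one
text = "abacabadbc"    # toy corpus

# Count occurrences of each character, initializing every count to 1 so
# that no character gets zero probability (which would make log(0) = -inf).
counts = [1] * len(vocab)
for ch in text:
    counts[vocab.index(ch)] += 1

counts = torch.tensor(counts, dtype=torch.float32)
optimal_probs = counts / counts.sum()  # known optimal unigram probabilities
print(optimal_probs)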
the final token probabilities
I calculated the final token probabilities from the unigram model. I passed the encoded vocabulary 'x' into the Unigram model, inspected the probability distribution held within the Unigram class, and normalized it to obtain the probabilities. Since only the final token probabilities are needed, I specifically selected the last value (index -1). Because the subsequent operations use NumPy, I added 'clone().detach().numpy()' to convert the tensor safely.
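The extraction step looks roughly like the sketch below; I am assuming softmax normalization over a trained score vector, since the internals of the Unigram class are not shown here.

import torch

# Hypothetical trained (unnormalized) score vector inside the model.
s = torch.tensor([1.2, -0.3, 0.5, 0.5], requires_grad=True)

# Normalize to probabilities, then clone/detach so the tensor can be
# converted to NumPy even though it requires gradients.
final_probs = torch.softmax(s, dim=0).clone().detach().numpy()
print(final_probs)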
the (known) minimum possible loss
I found the minimum possible loss by multiplying the log of the optimal probabilities by the occurrence count of each character and summing. Since log-probabilities are always negative, the 'loss_fn' function negates the result to produce a positive value. The minimum possible loss lets us verify that the training loss gradually decreases toward it given enough iterations and a suitable learning rate.
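As a worked sketch with the toy counts from above (again my own stand-in numbers): the negative log-likelihood is minimized at the empirical probabilities, so evaluating it there gives the floor that the training curve should approach.

import torch

counts = torch.tensor([41.0, 11.0, 26.0, 26.0])  # toy counts (with the +1 init)
optimal_probs = counts / counts.sum()

# Log-probabilities are negative, so negating the weighted sum yields a
# positive minimum possible loss, directly comparable to the training loss.
min_loss = -(counts * torch.log(optimal_probs)).sum()
print(min_loss.item())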
get reasonably good results quickly (seconds)
Through a series of experiments varying the number of iterations and the learning rate, I observed that with 1000 iterations and a learning rate of 0.1, the loss decreased rapidly and converged to the minimum possible loss within seconds. With this configuration, the final token probabilities also closely matched the optimal probabilities.
Test Cases
Case 1 (Recommended)
Test
# set number of iterations and learning rate
num_iterations = 1000  # SET THIS
learning_rate = 0.1  # SET THIS
Result Plot
Case 2
Test
# set number of iterations and learning rate
num_iterations = 1000  # SET THIS
learning_rate = 0.5  # SET THIS
Result Plot
Case 3
Test
# set number of iterations and learning rate
num_iterations = 1000  # SET THIS
learning_rate = 0.01  # SET THIS
Result Plot
Case 4
Test
# set number of iterations and learning rate
num_iterations = 500  # SET THIS
learning_rate = 0.5  # SET THIS
Result Plot
Case 5
Test
# set number of iterations and learning rate
num_iterations = 500  # SET THIS
learning_rate = 0.1  # SET THIS
Result Plot
Case 6
Test
# set number of iterations and learning rate
num_iterations = 500  # SET THIS
learning_rate = 0.01  # SET THIS
Result Plot
Case 7
Test
# set number of iterations and learning rate
num_iterations = 100  # SET THIS
learning_rate = 0.5  # SET THIS
Result Plot
Case 8
Test
# set number of iterations and learning rate
num_iterations = 100  # SET THIS
learning_rate = 0.1  # SET THIS
Result Plot
Case 9 (Recommended)
Test
# set number of iterations and learning rate
num_iterations = 100  # SET THIS
learning_rate = 0.01  # SET THIS