
HW 2. Gradient Descent 📝

Information 💻

  • Subject: Introduction to Natural Language Processing
  • Professor: Patrick Wang
  • Assignment: HW #2 Gradient Descent
  • Name: Suim Park (sp699)

Notification ✔️

  • os module
    Whenever I ran the code, I received a warning about multiple copies of the OpenMP runtime being loaded. To suppress the warning and let the code run, I set the following environment variable using the 'os' module.

    import os
    os.environ["KMP_DUPLICATE_LIB_OK"] = "True"
  • matplotlib
    To create the line and bar plots, I imported the 'matplotlib' library, specifically 'matplotlib.pyplot'.

    import matplotlib.pyplot as plt

Description

  • Code Description

    1. Loss as a function of time/iteration
    • I added an empty list ('Loss_value') to store loss values. Each time the training step ran, I appended the current loss, then plotted the stored values. This let me observe how the loss approaches the minimum possible loss as the number of iterations and the learning rate change.
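The loss-tracking loop described above can be sketched as follows; a toy quadratic objective stands in for the homework's model, and 'Loss_value' follows the list name used in the description:

```python
# Gradient descent on a stand-in objective f(w) = (w - 3)^2,
# recording the loss at every iteration so it can be plotted afterwards.
Loss_value = []  # one loss value per iteration

w = 0.0              # initial parameter
learning_rate = 0.1

for _ in range(100):
    loss = (w - 3.0) ** 2      # current loss
    Loss_value.append(loss)    # store it for the loss-vs-iteration plot
    grad = 2.0 * (w - 3.0)     # d(loss)/dw
    w -= learning_rate * grad  # gradient descent step

# The plot itself would then be, e.g.:
#   import matplotlib.pyplot as plt
#   plt.plot(Loss_value); plt.xlabel("iteration"); plt.ylabel("loss"); plt.show()
```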
    2. The (known) optimal probabilities
    • I determined the optimal probabilities by using the encoded values to count the frequency of each letter and dividing by the total occurrence count. I created a list holding the occurrence count of each letter, initializing every entry to 1 so that no letter receives zero probability, and converted it to a tensor for compatibility with the rest of the code.
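A minimal sketch of this frequency-count approach; the 27-symbol vocabulary, the sample text, and the dictionary layout are illustrative assumptions, not the homework's actual data:

```python
import string

# Assumed vocabulary: lowercase letters plus space.
vocabulary = list(string.ascii_lowercase) + [" "]
text = "the quick brown fox jumps over the lazy dog"  # stand-in corpus

# Initialize every count to 1 so no symbol ends up with zero probability.
counts = {ch: 1 for ch in vocabulary}
for ch in text:
    counts[ch] += 1

# Optimal probability of each symbol = its count / total count.
total = sum(counts.values())
optimal_probs = {ch: counts[ch] / total for ch in vocabulary}

# In the homework the counts are then converted to a tensor, e.g.:
#   torch.tensor(list(counts.values()), dtype=torch.float)
```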
    1. the final token probabilities
    • I calculated the final token probabilities using the Unigram model. I applied the vocabulary as input to the Unigram model using the variable 'x', and after observing the probability distribution within the Unigram class, I normalized it to find the probabilities. Since I need to see the final token probabilities, I specifically selected the last value with 'value(-1)'. Since I used Numpy for this operation, I added 'clone().detach().numpy()' to ensure compatibility with tensors.
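The normalize-then-select-last step can be sketched with NumPy; the logits array and its shape are illustrative assumptions, whereas in the homework the real values come from the Unigram class:

```python
import numpy as np

# Stand-in model output: scores for 5 positions over a 4-symbol vocabulary.
logits = np.array([[0.2, 1.0, -0.5, 0.3]] * 5)

# Softmax-normalize each row so it is a probability distribution.
shifted = logits - logits.max(axis=-1, keepdims=True)  # for numerical stability
exp = np.exp(shifted)
probs = exp / exp.sum(axis=-1, keepdims=True)

# Only the final token's probabilities are needed, hence index -1.
final_token_probs = probs[-1]
```

With a PyTorch tensor, the analogous conversion before plotting would be `probs[-1].clone().detach().numpy()`.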
    4. The (known) minimum possible loss
    • I found the minimum possible loss by multiplying the log of the optimal probabilities by the occurrence count of each letter and summing. Since log probabilities are always negative, the 'loss_fn' function negates the result to produce a positive loss. This minimum serves as a baseline for verifying that the training loss gradually decreases toward it for a given number of iterations and learning rate.
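The minimum possible loss is the negative log-likelihood of the data under the optimal (frequency-based) probabilities; a sketch with made-up counts:

```python
import math

# Illustrative occurrence counts for a three-symbol alphabet.
counts = {"a": 5, "b": 3, "c": 2}
total = sum(counts.values())
optimal_probs = {ch: n / total for ch, n in counts.items()}

# min_loss = -sum over symbols of count(s) * log p_opt(s);
# the minus sign turns the negative log probabilities into a positive loss.
min_loss = -sum(n * math.log(optimal_probs[ch]) for ch, n in counts.items())
```

Any other distribution (e.g. uniform) gives a strictly higher loss, which is why this value works as a convergence baseline.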
    5. Getting reasonably good results quickly (seconds)
    • Through experiments varying the number of iterations and the learning rate, I observed that with 1000 iterations and a learning rate of 0.1, the loss decreased rapidly and converged to the minimum possible loss. With this configuration, the final token probabilities also closely matched the optimal probabilities.
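The effect of the learning rate on convergence speed can be illustrated on a toy objective (f(w) = w², my stand-in, not the homework's model):

```python
# Final loss after running gradient descent on f(w) = w^2 from w = 1.0.
def final_loss(learning_rate, num_iterations, w=1.0):
    for _ in range(num_iterations):
        w -= learning_rate * 2.0 * w  # gradient of w^2 is 2w
    return w ** 2

fast = final_loss(0.1, 100)   # each step multiplies w by 0.8
slow = final_loss(0.01, 100)  # each step multiplies w by 0.98
# A larger (but still stable) learning rate reaches the minimum in far
# fewer iterations, consistent with the 0.1 learning rate chosen above.
```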

Test Cases 📌

  • Case 1 (Recommended)
    • Test
      # set number of iterations and learning rate
      num_iterations = 1000  # SET THIS
      learning_rate = 0.1  # SET THIS
    • Result Plot: Plot_1000_0.1

  • Case 2
    • Test
      # set number of iterations and learning rate
      num_iterations = 1000  # SET THIS
      learning_rate = 0.5  # SET THIS
    • Result Plot: Plot_1000_0.5

  • Case 3
    • Test
      # set number of iterations and learning rate
      num_iterations = 1000  # SET THIS
      learning_rate = 0.01  # SET THIS
    • Result Plot: Plot_1000_0.01

  • Case 4
    • Test
      # set number of iterations and learning rate
      num_iterations = 500  # SET THIS
      learning_rate = 0.5  # SET THIS
    • Result Plot: Plot_500_0.5

  • Case 5
    • Test
      # set number of iterations and learning rate
      num_iterations = 500  # SET THIS
      learning_rate = 0.1  # SET THIS
    • Result Plot: Plot_500_0.1

  • Case 6
    • Test
      # set number of iterations and learning rate
      num_iterations = 500  # SET THIS
      learning_rate = 0.01  # SET THIS
    • Result Plot: Plot_500_0.01

  • Case 7
    • Test
      # set number of iterations and learning rate
      num_iterations = 100  # SET THIS
      learning_rate = 0.5  # SET THIS
    • Result Plot: Plot_100_0.5

  • Case 8
    • Test
      # set number of iterations and learning rate
      num_iterations = 100  # SET THIS
      learning_rate = 0.1  # SET THIS
    • Result Plot: Plot_100_0.1

  • Case 9 (Recommended)
    • Test
      # set number of iterations and learning rate
      num_iterations = 100  # SET THIS
      learning_rate = 0.01  # SET THIS
    • Result Plot: Plot_100_0.01

