
hands-on-gradients-derivation-for-ml-dl-loss-func's Introduction

Step-by-step, detailed gradient derivations for common supervised machine learning and deep learning loss functions, suitable for people who have just started learning machine learning and deep learning. Notation in the field can vary widely from person to person or change over time, but the notation here is consistent and easy to follow.

Currently, I have derived the gradients of the loss functions for linear regression and logistic regression using the set of notations below. To see how gradients can be derived in a deep learning neural network setting, you can check my Notes-for-Stanford-CS224N-NLP-with-Deep-Learning, although the notation there differs from the notation used here.

In the future, I will use the same notation to derive the gradients for neural networks too, so that there is a single, unified hands-on tutorial for common supervised machine learning and deep learning loss functions. Keep learning!

Notations

  • x: the input variables (features).
  • y: the true output variables that we want to predict (observations).
  • ŷ: the predicted values.
  • w: the weights for the input variables.
  • b: the bias term for the input variables. In some machine learning courses and tutorials, people may use θ (theta) to represent both the weights and the bias term, which seems to be the more popular notation for deriving basic machine learning loss function gradients, such as those of linear regression and logistic regression. But in deep learning, the w and b separation seems to be more common.
  • Bold font is for vectors: **x** is a vector of x_1, x_2, …, x_n, and **y** is a vector of y_1, y_2, …, y_m. And so on. But please note that **x** holds all the input variables of any given single training example, whereas **y** holds one value per training example. This is because the observation to predict is always a single fixed value, regardless of whether it is a discrete class (for classification) or a continuous value (for regression).
  • Uppercase plus bold font denotes a matrix, such as **X**.
  • m always denotes the number of training examples, so i = 1, 2, …, m. The subscript i is always related to m.
  • n always denotes the number of input variables for any given training example, so j = 1, 2, …, n. The subscript j is always related to n. In classification problems, the subscript j instead stands for the index of the true class.
  • When two subscripts are used together, i always comes before j, so x_ij means the j-th input variable in the i-th training example. Thus, i = 1, 2, …, m and j = 1, 2, …, n.
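As a concrete sketch of where these derivations lead, here is a NumPy implementation of the resulting gradients for linear regression (mean squared error) and logistic regression (binary cross-entropy), written in the notation above. Note that the function names, the 1/m scaling of the losses, and the vectorized form are my own assumptions for illustration, not taken verbatim from the derivations in this repo.

```python
import numpy as np

# X is the (m, n) matrix of training examples, y the m observations,
# w the n weights, b the scalar bias -- following the notation above.
# These are hypothetical helper functions, not part of the repo itself.

def linear_regression_grads(X, y, w, b):
    """Gradients of the MSE loss L = (1/m) * sum_i (yhat_i - y_i)^2."""
    m = X.shape[0]
    yhat = X @ w + b                 # predictions
    err = yhat - y                   # residuals
    dw = (2.0 / m) * (X.T @ err)     # dL/dw, one entry per input variable
    db = (2.0 / m) * err.sum()       # dL/db
    return dw, db

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def logistic_regression_grads(X, y, w, b):
    """Gradients of the binary cross-entropy loss
    L = -(1/m) * sum_i [y_i log(yhat_i) + (1 - y_i) log(1 - yhat_i)]."""
    m = X.shape[0]
    yhat = sigmoid(X @ w + b)
    err = yhat - y                   # sigmoid + cross-entropy cancels cleanly
    dw = (1.0 / m) * (X.T @ err)
    db = (1.0 / m) * err.sum()
    return dw, db
```

A quick way to sanity-check a derivation like this is to compare the analytic gradient against a central finite-difference estimate of the loss; the two should agree to several decimal places.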

hands-on-gradients-derivation-for-ml-dl-loss-func's People

Contributors

jaaack-wang

