
Notes on Deep Generative Models

These notes form a concise introductory course on deep generative models. They are based on Stanford CS236, taught by Aditya Grover and Stefano Ermon, and have been written by Aditya Grover, with the help of many students and course staff.

The compiled version is available at https://deepgenerativemodels.github.io/notes/.

Contributing

This material is under construction! Although we have written up most of it, you will probably find several typos. If you do, please let us know, or submit a pull request with your fixes via GitHub.

The notes are written in Markdown and are compiled into HTML using Jekyll. Please add your changes directly to the Markdown source. To install Jekyll, follow the instructions posted on its website (https://jekyllrb.com/docs/installation/).
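
For example, on a machine with a working Ruby toolchain, the installation typically amounts to the following (a sketch; see the linked instructions for platform-specific details):

  gem install bundler jekyll   # standard install per the Jekyll docs; requires Ruby and RubyGems
  jekyll --version             # verify that jekyll is on your PATH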

Note that Jekyll is only supported on GNU/Linux, Unix, or macOS. Thus, if you run Windows 10 on your local machine, you will have to install Bash on Ubuntu on Windows (the Windows Subsystem for Linux). Microsoft provides instructions on how to do that, and Jekyll's website offers helpful instructions on how to proceed through the rest of the process.

To compile the Markdown to HTML (i.e., after you have made changes to the Markdown source and want them to be accessible to students viewing the docs), run the following commands from the root of your cloned copy of the https://github.com/deepgenerativemodels/notes repo:

  1. rm -r docs/
  2. jekyll serve # This should create a folder called _site. Note: This creates a running server; press Ctrl-C to stop the server before proceeding
  3. mv _site docs # Change the name of the _site folder to "docs". This won't work if the server is still running.
  4. git add file_names
  5. git commit -am "your commit message describing what you did"
  6. git push origin master

Note that if you cloned the deepgenerativemodels/notes repo directly onto your local machine (instead of forking it), then you may see an error like "remote: Permission to deepgenerativemodels/notes.git denied to userjanedoe". If that is the case, you need to fork the repo first. Then, if your GitHub profile were userjanedoe, you would first need to push your local updates to your forked repo like so:

git push https://github.com/userjanedoe/notes.git master

Then you can submit the pull request through the GitHub website.

Contributors

ademiadeniji, aditya-grover, andreapi, andrewk1, circlemarker, colasgael, ermonste, jiamings, josegironn, kitliu5, kristychoi, loodvn, msalvato, nhonka, ruishu


Issues

The number of parameters needed to specify a table of a Bayesian network

This is from the Autoregressive Models chapter:

To see why, let us consider the conditional for the last dimension, given by $p(x_n \mid x_{<n})$. In order to fully specify this conditional, we need to specify a probability for $2^{n-1}$ configurations of the variables $x_1, x_2, \ldots, x_{n-1}$. Since the probabilities should sum to $1$, the total number of parameters for specifying this conditional is given by $2^{n-1}-1$. Hence, a tabular representation for the conditionals is impractical for learning the joint distribution factorized via chain rule.

Shouldn't it be $2^{n-1}$ instead of $2^{n-1}-1$ here? Why minus $1$? In my understanding, the $n$-th random variable depends on $n-1$ random variables, so in the binary case the table has $2^{n-1}$ rows. In each row, the entries must add up to $1$, so only one of the two entries needs specifying. With one parameter per row, that gives $2^{n-1}$ parameters.
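
A small worked case, added here for illustration (not part of the original issue): take $n = 3$ binary variables. The table for $p(x_3 \mid x_1, x_2)$ has one row per configuration of $(x_1, x_2)$, i.e. $2^2 = 4$ rows. Each row is a Bernoulli distribution with the single free parameter $p(x_3 = 1 \mid x_1, x_2)$, since $p(x_3 = 0 \mid x_1, x_2) = 1 - p(x_3 = 1 \mid x_1, x_2)$. That gives $4 = 2^{n-1}$ free parameters for this conditional, matching the count suggested above.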

Confused by $\lambda^{(i)}$

In the "Black-Box Variational Inference" section of the VAE notes:

We first do per-sample optimization of $q$ by iteratively applying the update
$\lambda^{(i)} \leftarrow \lambda^{(i)} + \tilde{\nabla}_\lambda \, \mathrm{ELBO}(x^{(i)}; \theta, \lambda^{(i)})$
We then perform a single update step based on the mini-batch
$\theta \leftarrow \theta + \tilde{\nabla}_\theta \sum_i \mathrm{ELBO}(x^{(i)}; \theta, \lambda^{(i)})$

If I understood correctly, $x^{(i)}$ is the $i$-th sample from a batch $B$ of the dataset $D$, and $\lambda$ is a vector of parameters of the distribution $q_\lambda(z)$. What is $\lambda^{(i)}$?

Is it the $i$-th component of $\lambda$? That would imply that the batch size $|B|$ equals the dimension of $\lambda$; if so, it's unclear to me why they would be equal.

Another possibility is that $\lambda^{(i)}$ is the $i$-th update to $\lambda$. If so, perhaps it would be better written like this:

$\lambda^{(i+1)} \leftarrow \lambda^{(i)} + \tilde{\nabla}_\lambda \, \mathrm{ELBO}(x^{(i)}; \theta, \lambda^{(i)})$

But if that's the case, then it's unclear to me why $\lambda^{(i)}$ appears in the $\theta$ update:

$\theta \leftarrow \theta + \tilde{\nabla}_\theta \sum_i \mathrm{ELBO}(x^{(i)}; \theta, \lambda^{(i)})$

Apologies if I've missed something obvious here. Also, thanks for the notes; they've been very helpful!
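
(Editorial aside, hedged, since the thread leaves this open: one reading consistent with the phrase "per-sample optimization" is that each data point $x^{(i)}$ carries its own set of variational parameters $\lambda^{(i)}$, as in stochastic variational inference. The inner loop then optimizes each $\lambda^{(i)}$ separately,

$\lambda^{(i)} \leftarrow \lambda^{(i)} + \tilde{\nabla}_{\lambda^{(i)}} \, \mathrm{ELBO}(x^{(i)}; \theta, \lambda^{(i)}) \quad \text{for each } i \text{ in the batch},$

and the resulting per-sample parameters are held fixed during the $\theta$ update. Under this reading, the batch size and the dimension of any single $\lambda^{(i)}$ are unrelated.)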

Confused by $q_\lambda(z)$

Reading the following paragraphs in the VAE notes:

Next, we introduce a variational family $Q$ of distributions that approximate the true, but intractable, posterior $p(z \mid x)$. Henceforth, we will assume a parametric setting where any distribution in the model family $P_{x,z}$ is specified via a set of parameters $\theta \in \Theta$ and distributions in the variational family $Q$ are specified via a set of parameters $\lambda \in \Lambda$.
Given $P_{x,z}$ and $Q$, we note that the following relationships hold true for any $x$ and all variational distributions $q_\lambda(z) \in Q$

If "qλ(z)" is intended to approximate the distribution "p(z∣x)", then I'm confused as to why "qλ(z)" doesn't include "x". Should it be "qλ(z∣x)", or it is actually approximating the distribution "p(z)"?

Apologies if this sounds like an ignorant question; my understanding of probability notation isn't too sharp.
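
(Editorial aside, hedged: a standard resolution is that, in the non-amortized setting, inference is performed separately per data point, so for a fixed $x^{(i)}$ one fits parameters $\lambda^{(i)}$ such that $q_{\lambda^{(i)}}(z) \approx p(z \mid x^{(i)})$; the dependence on $x$ is absorbed into the choice of $\lambda$. Amortized inference instead learns a single mapping from $x$ to variational parameters, and its approximation is commonly written $q_\phi(z \mid x)$, where $\phi$ is notation introduced here for illustration.)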

Sharing homework

I have been watching this course on YouTube, and it's really great. However, I can't find any homework on the site. Is it possible for you to share the homework assignments?

Thanks

Possible error in estimating the gradient based on REINFORCE

In the variational auto-encoder chapter, the gradient of the encoder is computed as:
[equation image from the notes]

However, the term [image] also depends on the parameter $\lambda$. According to formula (4) in "Gradient Estimation Using Stochastic Computation Graphs" (NIPS 2015):
[equation image from the paper]

When estimating the encoder's gradient, the first gradient term is missing, according to the equation above.
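
(Editorial aside, for readers without the embedded images: the general identity at issue, when the integrand itself depends on $\lambda$, is the standard score-function result

$\nabla_\lambda \, \mathbb{E}_{z \sim q_\lambda(z)}\left[f_\lambda(z)\right] = \mathbb{E}_{z \sim q_\lambda(z)}\left[f_\lambda(z) \, \nabla_\lambda \log q_\lambda(z) + \nabla_\lambda f_\lambda(z)\right],$

which contains both a score-function term and a direct gradient term. Whether the notes' equation actually drops one of these terms should be checked against the source.)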

Missing reference in https://deepgenerativemodels.github.io/notes/vae/

In https://deepgenerativemodels.github.io/notes/vae/, the paragraph

Learning Directed Latent Variable Models

states that

As we have seen previously, optimizing an empirical estimate of the KL divergence is equivalent to maximizing the marginal log-likelihood $\log p(x)$ over $D$

This equivalence isn't shown anywhere in the rest of the course notes. It would be useful for the learner to add the proof of this equivalence, or at least a reference to it.
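
A one-line sketch of the standard argument, added here for the learner's convenience: let $\hat{p}_D$ denote the empirical distribution of the dataset $D$. Then

$\mathrm{KL}(\hat{p}_D \,\|\, p_\theta) = \mathbb{E}_{x \sim \hat{p}_D}[\log \hat{p}_D(x)] - \mathbb{E}_{x \sim \hat{p}_D}[\log p_\theta(x)],$

where the first term does not depend on $\theta$. Hence

$\arg\min_\theta \, \mathrm{KL}(\hat{p}_D \,\|\, p_\theta) = \arg\max_\theta \, \mathbb{E}_{x \sim \hat{p}_D}[\log p_\theta(x)] = \arg\max_\theta \, \frac{1}{|D|} \sum_{x \in D} \log p_\theta(x),$

i.e., minimizing the empirical KL estimate is equivalent to maximizing the average log-likelihood over $D$.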
