
Notes on Deep Generative Models

These notes form a concise introductory course on deep generative models. They are based on Stanford CS236, taught by Aditya Grover and Stefano Ermon, and have been written by Aditya Grover, with the help of many students and course staff.

The compiled version is available at https://deepgenerativemodels.github.io/notes/.

Contributing

This material is under construction! Although we have written up most of it, you will probably find several typos. If you do, please let us know, or submit a pull request with your fixes via GitHub.

The notes are written in Markdown and are compiled into HTML using Jekyll. Please add your changes directly to the Markdown source. To install Jekyll, follow the instructions posted on its website (https://jekyllrb.com/docs/installation/).
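
For example, on a machine with a working Ruby toolchain, the installation typically amounts to the following (a sketch; see the linked instructions for platform-specific details):

  gem install bundler jekyll   # standard install per the Jekyll docs; requires Ruby and RubyGems
  jekyll --version             # verify that jekyll is on your PATH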

Note that Jekyll is only supported on GNU/Linux, Unix, or macOS. Thus, if you run Windows 10 on your local machine, you will have to install Bash on Ubuntu on Windows (the Windows Subsystem for Linux). Microsoft provides instructions on how to do that, and Jekyll's website offers helpful instructions on how to proceed through the rest of the process.

To compile the Markdown to HTML (i.e., after you have made changes to the Markdown source and want them to be accessible to students viewing the docs), run the following commands from the root of your cloned copy of the https://github.com/deepgenerativemodels/notes repo:

  1. rm -r docs/
  2. jekyll serve # This should create a folder called _site. Note: This creates a running server; press Ctrl-C to stop the server before proceeding
  3. mv _site docs # Change the name of the _site folder to "docs". This won't work if the server is still running.
  4. git add file_names
  5. git commit -am "your commit message describing what you did"
  6. git push origin master

Note that if you cloned the deepgenerativemodels/notes repo directly onto your local machine (instead of forking it), then you may see an error like "remote: Permission to deepgenerativemodels/notes.git denied to userjanedoe". If that is the case, you need to fork the repo first. Then, if your GitHub profile were userjanedoe, you would first need to push your local updates to your forked repo like so:

git push https://github.com/userjanedoe/notes.git master

Then you can submit the pull request through the GitHub website.

Contributors

ademiadeniji, aditya-grover, andreapi, andrewk1, circlemarker, colasgael, ermonste, jiamings, josegironn, kitliu5, kristychoi, loodvn, msalvato, nhonka, ruishu


Issues

The number of parameters needed to specify a table of a Bayesian network

This is from the Autoregressive Models chapter:

To see why, let us consider the conditional for the last dimension, given by $p(x_n \mid x_{<n})$. In order to fully specify this conditional, we need to specify a probability for $2^{n-1}$ configurations of the variables $x_1, x_2, \ldots, x_{n-1}$. Since the probabilities should sum to $1$, the total number of parameters for specifying this conditional is given by $2^{n-1}-1$. Hence, a tabular representation for the conditionals is impractical for learning the joint distribution factorized via chain rule.

Shouldn't it be $2^{n-1}$ instead of $2^{n-1}-1$ here? Why minus $1$? In my understanding, the $n$-th random variable depends on $n-1$ random variables, so in the binary case the table has $2^{n-1}$ rows. In each row, the entries must add up to $1$, so only one of the two entries needs specifying. With one parameter per row, that gives $2^{n-1}$ parameters.
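
A small worked case, added here for illustration (not part of the original issue): take $n = 3$ binary variables. The table for $p(x_3 \mid x_1, x_2)$ has one row per configuration of $(x_1, x_2)$, i.e. $2^2 = 4$ rows. Each row is a Bernoulli distribution with the single free parameter $p(x_3 = 1 \mid x_1, x_2)$, since $p(x_3 = 0 \mid x_1, x_2) = 1 - p(x_3 = 1 \mid x_1, x_2)$. That gives $4 = 2^{n-1}$ free parameters for this conditional, matching the count suggested above.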

Confused by $\lambda^{(i)}$

In the "Black-Box Variational Inference" section of the VAE notes:

We first do per-sample optimization of $q$ by iteratively applying the update
$\lambda^{(i)} \leftarrow \lambda^{(i)} + \tilde{\nabla}_\lambda \, \mathrm{ELBO}(x^{(i)}; \theta, \lambda^{(i)})$
We then perform a single update step based on the mini-batch
$\theta \leftarrow \theta + \tilde{\nabla}_\theta \sum_i \mathrm{ELBO}(x^{(i)}; \theta, \lambda^{(i)})$

If I understood correctly, $x^{(i)}$ is the $i$-th sample from a batch $B$ of the dataset $D$, and $\lambda$ is a vector of parameters of the distribution $q_\lambda(z)$. What is $\lambda^{(i)}$?

Is it the $i$-th component of $\lambda$? That would imply that the batch size $|B|$ equals the dimension of $\lambda$; if so, it's unclear to me why they would be equal.

Another possibility is that $\lambda^{(i)}$ is the $i$-th update to $\lambda$. If so, perhaps it would be better written like this:

$\lambda^{(i+1)} \leftarrow \lambda^{(i)} + \tilde{\nabla}_\lambda \, \mathrm{ELBO}(x^{(i)}; \theta, \lambda^{(i)})$

But if that's the case, then it's unclear to me why $\lambda^{(i)}$ appears in the $\theta$ update:

$\theta \leftarrow \theta + \tilde{\nabla}_\theta \sum_i \mathrm{ELBO}(x^{(i)}; \theta, \lambda^{(i)})$

Apologies if I've missed something obvious here. Also, thanks for the notes; they've been very helpful!
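
(Editorial aside, hedged, since the thread leaves this open: one reading consistent with the phrase "per-sample optimization" is that each data point $x^{(i)}$ carries its own set of variational parameters $\lambda^{(i)}$, as in stochastic variational inference. The inner loop then optimizes each $\lambda^{(i)}$ separately,

$\lambda^{(i)} \leftarrow \lambda^{(i)} + \tilde{\nabla}_{\lambda^{(i)}} \, \mathrm{ELBO}(x^{(i)}; \theta, \lambda^{(i)}) \quad \text{for each } i \text{ in the batch},$

and the resulting per-sample parameters are held fixed during the $\theta$ update. Under this reading, the batch size and the dimension of any single $\lambda^{(i)}$ are unrelated.)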

Confused by $q_\lambda(z)$

Reading the following paragraphs in the VAE notes:

Next, we introduce a variational family $Q$ of distributions that approximate the true, but intractable, posterior $p(z \mid x)$. Henceforth, we will assume a parametric setting where any distribution in the model family $P_{x,z}$ is specified via a set of parameters $\theta \in \Theta$ and distributions in the variational family $Q$ are specified via a set of parameters $\lambda \in \Lambda$.
Given $P_{x,z}$ and $Q$, we note that the following relationships hold true for any $x$ and all variational distributions $q_\lambda(z) \in Q$

If "qλ(z)" is intended to approximate the distribution "p(z∣x)", then I'm confused as to why "qλ(z)" doesn't include "x". Should it be "qλ(z∣x)", or it is actually approximating the distribution "p(z)"?

Apologies if this sounds like an ignorant question; my understanding of probability notation isn't too sharp.
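
(Editorial aside, hedged: a standard resolution is that, in the non-amortized setting, inference is performed separately per data point, so for a fixed $x^{(i)}$ one fits parameters $\lambda^{(i)}$ such that $q_{\lambda^{(i)}}(z) \approx p(z \mid x^{(i)})$; the dependence on $x$ is absorbed into the choice of $\lambda$. Amortized inference instead learns a single mapping from $x$ to variational parameters, and its approximation is commonly written $q_\phi(z \mid x)$, where $\phi$ is notation introduced here for illustration.)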

Sharing homework

I have been watching this course on YouTube, and it's really great. However, I can't find any homework on the site. Is it possible for you to share the homework assignments?

Thanks

Possible error in estimating the gradient based on REINFORCE

In the variational auto-encoder chapter, the gradient of the encoder is computed as:
[equation image from the notes]

However, the term [image] also depends on the parameter $\lambda$. According to formula (4) in "Gradient Estimation Using Stochastic Computation Graphs" (NIPS 2015):
[equation image from the paper]

When estimating the encoder's gradient, the first gradient term is missing, according to the equation above.
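
(Editorial aside, for readers without the embedded images: the general identity at issue, when the integrand itself depends on $\lambda$, is the standard score-function result

$\nabla_\lambda \, \mathbb{E}_{z \sim q_\lambda(z)}\left[f_\lambda(z)\right] = \mathbb{E}_{z \sim q_\lambda(z)}\left[f_\lambda(z) \, \nabla_\lambda \log q_\lambda(z) + \nabla_\lambda f_\lambda(z)\right],$

which contains both a score-function term and a direct gradient term. Whether the notes' equation actually drops one of these terms should be checked against the source.)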

Missing reference in https://deepgenerativemodels.github.io/notes/vae/

In https://deepgenerativemodels.github.io/notes/vae/, the paragraph

Learning Directed Latent Variable Models

states that

As we have seen previously, optimizing an empirical estimate of the KL divergence is equivalent to maximizing the marginal log-likelihood $\log p(x)$ over $D$

This equivalence isn't shown anywhere in the rest of the course notes. It would be useful for the learner to add the proof of this equivalence, or at least a reference to it.
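
A one-line sketch of the standard argument, added here for the learner's convenience: let $\hat{p}_D$ denote the empirical distribution of the dataset $D$. Then

$\mathrm{KL}(\hat{p}_D \,\|\, p_\theta) = \mathbb{E}_{x \sim \hat{p}_D}[\log \hat{p}_D(x)] - \mathbb{E}_{x \sim \hat{p}_D}[\log p_\theta(x)],$

where the first term does not depend on $\theta$. Hence

$\arg\min_\theta \, \mathrm{KL}(\hat{p}_D \,\|\, p_\theta) = \arg\max_\theta \, \mathbb{E}_{x \sim \hat{p}_D}[\log p_\theta(x)] = \arg\max_\theta \, \frac{1}{|D|} \sum_{x \in D} \log p_\theta(x),$

i.e., minimizing the empirical KL estimate is equivalent to maximizing the average log-likelihood over $D$.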
