diff --git a/README.md b/README.md
index 5e7cac89..918449ff 100644
--- a/README.md
+++ b/README.md
@@ -4,6 +4,8 @@
 [![Build Status](https://github.com/TuringLang/NormalizingFlows.jl/actions/workflows/CI.yml/badge.svg?branch=main)](https://github.com/TuringLang/NormalizingFlows.jl/actions/workflows/CI.yml?query=branch%3Amain)
 
+**Last updated: 2023-Aug-23**
+
 A normalizing flow library for Julia.
 
 The purpose of this package is to provide a simple and flexible interface for variational inference (VI) and normalizing flows (NF) for Bayesian computation or generative modeling.
 
@@ -37,11 +39,11 @@ Z_N = T_{N, \theta_N} \circ \cdots \circ T_{1, \theta_1} (Z_0) , \quad Z_0 \sim
 ```
 where $\theta = (\theta_1, \dots, \theta_N)$ is the parameter to be learned, and $q_{\theta}$ is the variational distribution (flow distribution). This describes **sampling procedure** of normalizing flows, which requires sending draws through a forward pass of these flow layers.
 
-Since all the transformations are invertible (techinically [diffeomorphic](https://en.wikipedia.org/wiki/Diffeomorphism)), we can evaluate the density of a normalizing flow distribution $q_{\theta}$ by the change of variable formula:
+Since all the transformations are invertible (technically [diffeomorphic](https://en.wikipedia.org/wiki/Diffeomorphism)), we can evaluate the density of a normalizing flow distribution $q_{\theta}$ by the change of variable formula:
 ```math
 q_\theta(x)=\frac{q_0\left(T_1^{-1} \circ \cdots \circ T_N^{-1}(x)\right)}{\prod_{n=1}^N J_n\left(T_n^{-1} \circ \cdots \circ
-T_N^{-1}(x)\right)} \quad J_n(x)=\left|\operatorname{det} \nabla_x
+T_N^{-1}(x)\right)} \quad J_n(x)=\left|\text{det} \nabla_x
 T_n(x)\right|.
 ```
 Here we drop the subscript $\theta_n, n = 1, \dots, N$ for simplicity.
 
@@ -52,17 +54,17 @@ Given the feasibility of i.i.d. sampling and density evaluation, normalizing flo
 ```math
 \begin{aligned}
 \text{Reverse KL:}\quad
-&\argmin _{\theta} \mathbb{E}_{q_{\theta}}\left[\log q_{\theta}(Z)-\log p(Z)\right] \\
-&= \argmin _{\theta} \mathbb{E}_{q_0}\left[\log \frac{q_\theta(T_N\circ \cdots \circ T_1(Z_0))}{p(T_N\circ \cdots \circ T_1(Z_0))}\right] \\
-&= \argmax _{\theta} \mathbb{E}_{q_0}\left[ \log p\left(T_N \circ \cdots \circ T_1(Z_0)\right)-\log q_0(X)+\sum_{n=1}^N \log J_n\left(F_n \circ \cdots \circ F_1(X)\right)\right]
+&\arg\min _{\theta} \mathbb{E}_{q_{\theta}}\left[\log q_{\theta}(Z)-\log p(Z)\right] \\
+&= \arg\min _{\theta} \mathbb{E}_{q_0}\left[\log \frac{q_\theta(T_N\circ \cdots \circ T_1(Z_0))}{p(T_N\circ \cdots \circ T_1(Z_0))}\right] \\
+&= \arg\max _{\theta} \mathbb{E}_{q_0}\left[ \log p\left(T_N \circ \cdots \circ T_1(Z_0)\right)-\log q_0(X)+\sum_{n=1}^N \log J_n\left(F_n \circ \cdots \circ F_1(X)\right)\right]
 \end{aligned}
 ```
 and
 ```math
 \begin{aligned}
 \text{Forward KL:}\quad
-&\argmin _{\theta} \mathbb{E}_{p}\left[\log q_{\theta}(Z)-\log p(Z)\right] \\
-&= \argmin _{\theta} \mathbb{E}_{p}\left[\log q_\theta(Z)\right]
+&\arg\min _{\theta} \mathbb{E}_{p}\left[\log q_{\theta}(Z)-\log p(Z)\right] \\
+&= \arg\min _{\theta} \mathbb{E}_{p}\left[\log q_\theta(Z)\right]
 \end{aligned}
 ```
 Both problems can be solved via standard stochastic optimization algorithms,
@@ -71,14 +73,17 @@ such as stochastic gradient descent (SGD) and its variants.
 
 Reverse KL minimization is typically used for **Bayesian computation**,
 where one wants to approximate a posterior distribution $p$ that is only known up to a normalizing constant.
-In contrast, forward KL minimization is typically used for **generative modeling**, where one wants to approximate a complex distribution $p$ that is known up to a normalizing constant.
+In contrast, forward KL minimization is typically used for **generative modeling**,
+where one wants to learn the underlying distribution of some data.
 
 ## Current status and TODOs
 
 - [x] general interface development
 - [x] documentation
-- [ ] including more flow examples
+- [ ] including more NF examples/Tutorials
+  - WIP: [PR#11](https://github.com/TuringLang/NormalizingFlows.jl/pull/11)
 - [ ] GPU compatibility
+  - WIP: [PR#25](https://github.com/TuringLang/NormalizingFlows.jl/pull/25)
 - [ ] benchmarking
 
 ## Related packages
@@ -86,5 +91,3 @@ In contrast, forward KL minimization is typically used for **generative modeling
 - [Flux.jl](https://fluxml.ai/Flux.jl/stable/)
 - [Optimisers.jl](https://github.com/FluxML/Optimisers.jl)
 - [AdvancedVI.jl](https://github.com/TuringLang/AdvancedVI.jl)
-
-
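To make the two operations described in the README text above concrete, here is a minimal plain-Julia sketch of a single affine flow layer: sampling by a forward pass, log-density by the change-of-variables formula, and a Monte Carlo estimate of the reverse-KL objective. This deliberately does not use the NormalizingFlows.jl API; all names below (`T`, `Tinv`, `logq0`, `logp`, `a`, `b`) are illustrative assumptions, not package functions.

```julia
# Toy flow with one affine layer T(z) = a*z + b, so |det ∇T| = |a|.
# Base distribution q0 is a standard normal.

# log-density of the standard-normal base distribution q0
logq0(z) = -0.5 * (z^2 + log(2π))

# one affine flow layer and its inverse (illustrative parameters θ = (a, b))
a, b = 2.0, 3.0
T(z)    = a * z + b                  # forward pass (sampling direction)
Tinv(x) = (x - b) / a                # inverse pass (density direction)
logabsdetJ = log(abs(a))             # log|det ∇T|

# change of variables: log q_θ(x) = log q0(T⁻¹(x)) - log|det ∇T|
logq(x) = logq0(Tinv(x)) - logabsdetJ

# an unnormalized target log-density, e.g. the kernel of a Normal(3, 2) posterior
logp(x) = -0.5 * ((x - 3.0) / 2.0)^2

# Monte Carlo estimate of the reverse-KL objective
#   E_{q0}[ log q_θ(T(z)) - log p(T(z)) ]
# using draws from the base distribution only.
zs = randn(10_000)
xs = T.(zs)
reverse_kl_estimate = sum(logq.(xs) .- logp.(xs)) / length(xs)
println("estimated reverse KL (up to the unknown log-normalizer of p): ", reverse_kl_estimate)
```

In practice one would differentiate such a Monte Carlo objective with respect to the flow parameters using an AD backend and run SGD or a variant, which is the workflow the package's interface is built around.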