-
Hi @itsdfish! I will have another look this week; for now, I must say that your problem involves several issues we are currently dealing with. We plan to add hierarchical priors for the neural networks, such as flows, etc. If you propose a model and you run into issues, we will gladly help. We don't have many resources to work more closely on this one for now. I will keep an issue open so someone can pick it up if time permits.
-
No problem. I completely understand. I appreciate any time you are willing to devote. In the meantime, if a non-hierarchical version is feasible, that would be helpful too. Thanks again!
-
Hi @itsdfish! The thing is that the model could be sketched along these lines:

```julia
x ~ MvNormal(priors)
A ~ MatrixNormal(M, U, V)
y = A*x
z = Flow(y)
```

In this sketch, you can think of ...
-
Hey @itsdfish! Just a quick update: there are some developments on the `ContinuousTransition` node. You can then combine it with Flow and other linearities through Delta factors.

```julia
xₜ₋₁ ~ MvNormalMeanCovariance(μxₜ₋₁, Σxₜ₋₁)
Λ ~ Wishart(nΛ, ΣΛ)
h ~ MvNormalMeanCovariance(μh, Σh)
xₜ ~ ContinuousTransition(xₜ₋₁, h, Λ) where {meta = CTMeta(in_dim, out_dim)}
yₜ ~ MvNormalMeanCovariance(B*xₜ, Q)
```

Note that we pass ... As was said previously, you can use this node with ... The main issue now is computational (if your problem works with 10-dimensional vectors, then the covariance for ...).
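To make that computational concern concrete, here is a rough back-of-the-envelope sketch. It assumes the covariance in question is that of the flattened transition vector `h`; the dimensions are purely illustrative.

```julia
# Back-of-the-envelope size of the objects involved (illustrative dimensions):
in_dim, out_dim = 10, 10
length_h  = in_dim * out_dim        # h flattens a 10×10 transition matrix → 100 entries
cov_h_dim = (length_h, length_h)    # its covariance is 100 × 100, i.e. 10_000 numbers
```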
-
@albertpod, thank you for the update! This looks really interesting. I upgraded my working example to the ..., but a few things are still unclear to me. For example, I'm not quite sure how to generate the training data in ... In the function ...
-
@itsdfish, so according to your generative model, you construct a latent space that consists of ...

Let's start simple (flow aside). Let's discuss the generative model (the `generate_data` function). What are your observations? Then we will construct a probabilistic model.
-
Thanks for your help with this. I agree that starting with the generative model is a good strategy. You are correct: the data generating model produces a choice and a reaction time from a single simulation. To make things simple, I propose setting a prior on two of the parameters: alpha, which is the decision threshold, and tau, which is an additive constant for visual encoding and motor execution time.
I'll assume the other parameters are deterministic. This leaves us with the following distribution object for the LCA: ...

The function ...

In summary, the data-generating model outputs a choice and a reaction time. In a simple form of the model, we could have a prior on alpha and tau.
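To make the data-generating side concrete, here is a rough sketch of how a single LCA trial could be simulated with Euler-Maruyama integration. All parameter names and values (drift rates ν, leak λ, lateral inhibition β, noise σ, threshold α, non-decision time τ) are placeholders rather than the exact code in my repo.

```julia
# Sketch of a single LCA trial (placeholder parameters, not the code from my repo).
function simulate_lca(; ν = [2.5, 2.0], λ = 0.3, β = 0.2, σ = 1.0,
                        α = 1.5, τ = 0.3, Δt = 0.001, max_time = 10.0)
    n = length(ν)
    x = zeros(n)                                   # accumulator activations
    t = 0.0
    while t < max_time
        inhibition = β .* (sum(x) .- x)            # input from the competing accumulators
        x .+= (ν .- λ .* x .- inhibition) .* Δt .+ σ .* sqrt(Δt) .* randn(n)
        x .= max.(x, 0.0)                          # activations cannot go negative
        t += Δt
        if any(x .>= α)                            # first accumulator to cross the threshold wins
            return (choice = argmax(x), rt = t + τ)
        end
    end
    return (choice = 0, rt = max_time + τ)         # no decision within the time limit
end
```

In the full example, alpha and tau would be drawn from their priors before each simulated trial.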
-
Thanks @itsdfish! I have a better feeling for what you are trying to achieve. As far as I understand, you want to approximate the likelihood induced by the LCA. I am no specialist on intractable likelihoods and their approximations, so my proposed solution might be different from what you need. First of all, I would say that you need to take `Reactive.forward(...)` out of your data generation function. It appears that the inference problem here is the posterior over α and τ.

```julia
ατconcat(α, τ) = vcat(α, τ) # we need this function to combine α and τ parameters and push them through the Flow

@model function invertible_neural_network(nr_samples::Int64, model)
# initialize variables
x = randomvar(nr_samples)
y_lat = randomvar(nr_samples)
y = datavar(Vector{Float64}, nr_samples)
# specify prior
α ~ Normal(μ=3, σ²=10.0)
τ ~ Normal(μ=10, σ²=10.0)
z_μ ~ ατconcat(α, τ) where {meta=Linearization()}
z_Λ ~ Wishart(1e2, 1e4*diageye(2))
# specify observations
for i in 1:nr_samples
# specify latent state
x[i] ~ MvNormal(μ=z_μ, Λ=z_Λ)
# specify transformed latent value
y_lat[i] ~ Flow(x[i]) where {meta=FlowMeta(model)}
# specify observations
y[i] ~ MvNormal(μ=y_lat[i], Σ=tiny*diageye(2))
end
end;
y = [[choice[i], rt[i]] for i in 1:length(choice)]
data = (y = y, )
constraints = @constraints begin
q(z_μ, x, z_Λ) = q(z_μ)q(z_Λ)q(x)
end
fmodel = invertible_neural_network(length(y), compiled_model)
initmarginals = (z_μ = MvNormalMeanCovariance(zeros(2), huge*diageye(2)), z_Λ = Wishart(2.0, diageye(2)))
# Inference routine
```

This is something to start with, and we can build on top of it. Remember to experiment with the flow model itself. BTW, I don't think you need ...
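In case it helps, here is a minimal sketch of what the inference call might look like, reusing `fmodel`, `data`, `constraints`, and `initmarginals` defined above. The iteration count and options are my own guesses, not a verified routine.

```julia
# Hypothetical inference call (options are guesses; adjust as needed):
result = inference(
    model         = fmodel,
    data          = data,
    constraints   = constraints,
    initmarginals = initmarginals,
    iterations    = 10,
    free_energy   = true
)
```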
-
You are correct. The inference problem is ... Thanks for the guidance with ...

I was hoping to ask a few follow-up questions too. First, I was wondering what the training data should be. Currently, ...

Also, I was wondering whether you can recommend a resource for these types of models aimed at non-experts. I have been reading various sources, but it can be challenging to get a good understanding from the journal articles because they typically assume the reader is already an expert.
-
Hi @itsdfish! I'm sorry for not getting back to you sooner.
Neither Normal nor Beta are good choices in this case. You'd better use `GammaShapeScale`. Here's an example.

```julia
@model function invertible_neural_network(nr_samples::Int64, model)
# initialize variables
x = randomvar(nr_samples)
y_lat = randomvar(nr_samples)
y = datavar(Vector{Float64}, nr_samples)
# specify prior
α ~ GammaShapeScale(100.0, 0.01)
τ ~ GammaShapeScale(100.0, 0.01)
z_μ ~ ατconcat(α, τ) where {meta=CVI(StableRNG(42), 100, 200, Optimisers.Descent(0.1), RxInfer.ForwardDiffGrad(), 100, Val(true), true)}
z_Λ ~ Wishart(3, diageye(2))
# specify observations
for i in 1:nr_samples
# specify latent state
x[i] ~ MvNormal(μ=z_μ, Λ=z_Λ)
# specify transformed latent value
y_lat[i] ~ Flow(x[i]) where {meta=FlowMeta(model)}
# specify observations
y[i] ~ MvNormal(μ=y_lat[i], Σ=tiny*diageye(2))
end
# return variables
return z_μ, z_Λ, x, y_lat, y
end;
```

I know the CVI meta looks horrifying; see this example to understand more. Again, your RxInfer model should represent your belief about the data generation process. If alpha and tau are positive, then we can put a Gamma prior on top of them.
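As a quick illustration of what that prior implies (my own sanity check, using Distributions.jl directly rather than anything from RxInfer):

```julia
using Distributions

# GammaShapeScale(100.0, 0.01) corresponds to Gamma(shape = 100, scale = 0.01):
prior = Gamma(100.0, 0.01)
mean(prior), std(prior)   # (1.0, 0.1), i.e. tightly concentrated around 1
```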
I am not sure I understand the question here. Didn't you say we must infer the posterior over alpha and tau?
Do you mean invertible nets? The way they work in factor graphs (the engine behind RxInfer) can be found here: ...

I will try to allocate some time to read the BayesFlow paper; it seems interesting. There are people in the lab who are working on INNs, but they are on holidays. Perhaps they will be more helpful than I am ;)
-
@albertpod, that sounds like a good plan. In the meantime, I will re-read that paper so I can try to understand as much as possible. I appreciate your willingness to look at the paper and help develop a working example. Just for your situational awareness, I have been updating the code in the link in the original post, reproduced here: https://github.com/itsdfish/RxInferSandbox.jl/blob/main/examples/lca_example.jl
-
@itsdfish, a short follow-up: I have finished the paper on BayesFlow. Perhaps we will have some time in July to implement some models. I suggest you start with the simpler examples the authors highlight in the paper.
-
@albertpod, that sounds great. Although a simple Gaussian model was not in the paper, I was thinking it might be a good starting point for a 1D model (for example, something like the sketch below).

The multivariate Gaussian would be a logical progression to a 2D model. Please let me know how I can help. Perhaps I can set up some scripts in the repo for both models, and we can modify them as needed.
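Here is a rough sketch of the kind of 1D data-generating process I have in mind; the prior, known noise level, and dataset sizes are arbitrary placeholders.

```julia
using Distributions

# 1D toy model: unknown mean with a Normal prior and a Gaussian likelihood with known σ.
# Each dataset is simulated from its own prior draw of μ (placeholder choices).
function generate_gaussian_data(n_datasets, n_obs; σ = 1.0)
    μs   = rand(Normal(0, 1), n_datasets)             # prior samples for μ
    data = [rand(Normal(μ, σ), n_obs) for μ in μs]    # one dataset per prior draw
    return μs, data
end

μs, data = generate_gaussian_data(1000, 50)
```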
-
Sure, @itsdfish. With regard to ...

Do you have questions on how to run inference in this model with ...?
-
I understand the algorithm on page 9 at a high level, but I am not sure about the implementation details of the code using RxInfer. There are at least four things I do not fully understand.

There are a lot of moving pieces. I think I would need to see a working example in order to understand how it all fits together. From there, I might be able to apply the approach to different models.
-
Maybe it would be beneficial to start with the training data. My understanding is that the summary network learns summary statistics from datasets with a variable number of observations. The conditional invertible network needs to know the prior samples for mu and sigma. This function generates both: https://github.com/itsdfish/RxInferSandbox.jl/blob/0c2826df694f660c08c6c2052fb91900e78fec85/examples/guassian_example.jl#L26

What is less clear to me is how the data should be structured. The parameters are in an n × 2 matrix where each row is a different sample of mu and sigma. The simulated data are currently in a vector of n vectors of variable length. I'm fairly certain that is not the right structure, but I'm not sure how else to deal with variable-length vectors. My best guess is that datasets of the same size are batched together (a sketch of what I mean is below), but that leaves the question of how to handle variable input size in the summary network.
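Here is a sketch of my current guess: group the simulated datasets by length so that each batch has a fixed input size. The function name and types are hypothetical, not code from the repo.

```julia
# Hypothetical batching-by-length helper (not code from the repo):
# parms is the n × 2 matrix of prior samples, datasets the variable-length simulations.
function batch_by_length(parms::Matrix{Float64}, datasets::Vector{Vector{Float64}})
    batches = Dict{Int, Vector{Tuple{Vector{Float64}, Vector{Float64}}}}()
    for (θ, data) in zip(eachrow(parms), datasets)
        batch = get!(batches, length(data)) do
            Tuple{Vector{Float64}, Vector{Float64}}[]
        end
        push!(batch, (collect(θ), data))   # (prior sample, simulated dataset)
    end
    return batches
end
```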
-
Hi @itsdfish. Sorry, my replies are taking too long; it's busy up here. How about we first focus on the toy example from the paper, Proof of Concept: Multivariate Normal Distribution?

```julia
function generate_data(n_samples, dim, Σ=diageye(dim))
μ = MvNormal(zeros(dim), diageye(dim))
[rand(MvNormal(rand(μ), Σ)) for _ in 1:n_samples]
end
```

The first thing we must do following this paper is to learn summary statistics. As far as I understand, those would be just the posteriors from the INN example in RxInfer, i.e.:

```julia
@model function invertible_neural_network(nr_samples::Int64)
# initialize variables
z_μ = randomvar()
z_Λ = randomvar()
x = randomvar(nr_samples)
y_lat = randomvar(nr_samples)
y = datavar(Vector{Float64}, nr_samples)
# specify prior
z_μ ~ MvNormalMeanCovariance(zeros(2), diagm(ones(2)))
z_Λ ~ Wishart(2.0, diagm(ones(2)))
# specify observations
for k = 1:nr_samples
# specify latent state
x[k] ~ MvNormalMeanPrecision(z_μ, z_Λ)
# specify transformed latent value
y_lat[k] ~ Flow(x[k])
# specify observations
y[k] ~ MvNormalMeanCovariance(y_lat[k], diagm(ones(2)))
end
# return variables
return z_μ, z_Λ, x, y_lat, y
end;
```

Just so you know, for this particular example, we don't need the invertible neural network, aka Flow; it's there just for demonstration purposes.

Would you agree with that?
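For completeness, a small sketch of how these two pieces might be wired together; the sample count and dimensionality are arbitrary, and this is not verified against the final example.

```julia
# Hypothetical wiring of generate_data with the model above (sizes are arbitrary):
observations = generate_data(250, 2)              # 250 two-dimensional samples
data         = (y = observations, )               # matches y = datavar(Vector{Float64}, nr_samples)
fmodel       = invertible_neural_network(length(observations))
```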
-
Hello,
Thank you for putting together this package and the accompanying documentation. This package looks very nice!
I am moving a discussion with Dmitry and Albert from the Turing Slack channel to here at their request. As some brief background, I am interested in performing amortized Bayesian parameter estimation, similar to the Python package BayesFlow. Performing Bayesian parameter estimation with many of the models I work with is challenging because they do not have a closed-form likelihood function. Having capabilities similar to BayesFlow would be very useful. I suspect the broader Julia community would find it useful as well.
My ultimate goal is to perform amortized Bayesian parameter estimation on models with the following characteristics: ...

In addition, it would be great to have the ability to save and reload a trained neural network, and to obtain full posterior distributions given a dataset with any number of observations.
This example from BayesFlow is a good starting point that captures these criteria quite well. The example uses the leaky competing accumulator (LCA), a multi-alternative decision-making model based on a stochastic differential equation. The basic idea is that evidence for different options (items from a restaurant menu) accumulates towards a threshold. The winner of this race determines the decision and the response time, leading to a bivariate distribution with one discrete variable and one continuous variable.
Dmitry and Albert indicated that RxInfer has the capabilities listed above and were nice enough to offer help putting together an LCA example similar to the one from BayesFlow. If this is something you find useful, I would be willing to open a PR to add it to your list of examples. I started by studying the invertible neural network example. Machine learning is not my area of expertise, so some details are a little unclear to me. In this repo, I adapted the function `generate_data` to the LCA, and the code runs. However, I think the neural network architecture needs to be modified. Currently, the training data are generated from a fixed set of parameters rather than samples from prior distributions, so I am guessing it cannot produce posterior distributions over parameters in its current form. I'm not sure where to start. Any help you would be willing to provide would be appreciated.