-
Hi @itsdfish! I will have another look this week; for now, I must say that your problem involves several issues we are currently dealing with. We plan to add hierarchical priors for the neural networks, such as flows, etc. If you propose a model and you run into issues, we will gladly help. We don't have many resources to work more closely on this one for now. I will keep an issue open so someone can pick it up if time permits.
-
No problem. I completely understand. I appreciate any time you are willing to devote. In the meantime, if a non-hierarchical version is feasible, that would be helpful too. Thanks again!
-
Hi @itsdfish! The thing is that the model could be sketched along these lines:

```julia
x ~ MvNormal(priors)
A ~ MatrixNormal(M, U, V)
y = A*x
z = Flow(y)
```

In this sketch, you can think of ...
-
Hey @itsdfish! Just a quick update: there are some developments on the `ContinuousTransition` node. You can then combine it with Flow and other linearities through Delta factors.

```julia
xₜ₋₁ ~ MvNormalMeanCovariance(μxₜ₋₁, Σxₜ₋₁)
Λ ~ Wishart(nΛ, ΣΛ)
h ~ MvNormalMeanCovariance(μh, Σh)
xₜ ~ ContinuousTransition(xₜ₋₁, h, Λ) where {meta = CTMeta(in_dim, out_dim)}
yₜ ~ MvNormalMeanCovariance(B*xₜ, Q)
```

Note that we pass ... As was said previously, you can use this node with ... The main issue now is computational (if your problem works with 10-dimensional vectors, then the covariance for ...).
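To make that computational concern concrete, here is a rough back-of-the-envelope sketch. It assumes the covariance in question is that of the flattened transition vector `h`; the dimensions are purely illustrative.

```julia
# Back-of-the-envelope size of the objects involved (illustrative dimensions):
in_dim, out_dim = 10, 10
length_h  = in_dim * out_dim        # h flattens a 10×10 transition matrix → 100 entries
cov_h_dim = (length_h, length_h)    # its covariance is 100 × 100, i.e. 10_000 numbers
```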
-
@albertpod, thank you for the update! This looks really interesting. I upgraded my working example to the ..., but a few things are still unclear to me. For example, I'm not quite sure how to generate the training data in ... In the function ...
-
@itsdfish, so according to your generative model, you construct a latent space that consists of ...

Let's start simple (flow aside). Let's discuss the generative model (the `generate_data` function). What are your observations? Then we will construct a probabilistic model.
-
Thanks for your help with this. I agree that starting with the generative model is a good strategy. You are correct: the data generating model produces a choice and a reaction time from a single simulation. To make things simple, I propose setting a prior on two of the parameters: alpha, which is the decision threshold, and tau, which is an additive constant for visual encoding and motor execution time.
I'll assume the other parameters are deterministic. This leaves us with the following distribution object for the LCA: ...

The function ...

In summary, the data-generating model outputs a choice and a reaction time. In a simple form of the model, we could have a prior on alpha and tau.
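To make the data-generating side concrete, here is a rough sketch of how a single LCA trial could be simulated with Euler-Maruyama integration. All parameter names and values (drift rates ν, leak λ, lateral inhibition β, noise σ, threshold α, non-decision time τ) are placeholders rather than the exact code in my repo.

```julia
# Sketch of a single LCA trial (placeholder parameters, not the code from my repo).
function simulate_lca(; ν = [2.5, 2.0], λ = 0.3, β = 0.2, σ = 1.0,
                        α = 1.5, τ = 0.3, Δt = 0.001, max_time = 10.0)
    n = length(ν)
    x = zeros(n)                                   # accumulator activations
    t = 0.0
    while t < max_time
        inhibition = β .* (sum(x) .- x)            # input from the competing accumulators
        x .+= (ν .- λ .* x .- inhibition) .* Δt .+ σ .* sqrt(Δt) .* randn(n)
        x .= max.(x, 0.0)                          # activations cannot go negative
        t += Δt
        if any(x .>= α)                            # first accumulator to cross the threshold wins
            return (choice = argmax(x), rt = t + τ)
        end
    end
    return (choice = 0, rt = max_time + τ)         # no decision within the time limit
end
```

In the full example, alpha and tau would be drawn from their priors before each simulated trial.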
-
Thanks @itsdfish! I have a better feeling for what you are trying to achieve. As far as I understand, you want to approximate the likelihood induced by the LCA. I am no specialist on intractable likelihoods and their approximations, so my proposed solution might be different from what you need. First of all, I would say that you need to take `Reactive.forward(...)` out of your data generation function. It appears that the inference problem here is the posterior over α and τ.

```julia
ατconcat(α, τ) = vcat(α, τ) # we need this function to combine α and τ parameters and push them through the Flow

@model function invertible_neural_network(nr_samples::Int64, model)
# initialize variables
x = randomvar(nr_samples)
y_lat = randomvar(nr_samples)
y = datavar(Vector{Float64}, nr_samples)
# specify prior
α ~ Normal(μ=3, σ²=10.0)
τ ~ Normal(μ=10, σ²=10.0)
z_μ ~ ατconcat(α, τ) where {meta=Linearization()}
z_Λ ~ Wishart(1e2, 1e4*diageye(2))
# specify observations
for i in 1:nr_samples
# specify latent state
x[i] ~ MvNormal(μ=z_μ, Λ=z_Λ)
# specify transformed latent value
y_lat[i] ~ Flow(x[i]) where {meta=FlowMeta(model)}
# specify observations
y[i] ~ MvNormal(μ=y_lat[i], Σ=tiny*diageye(2))
end
end;
y = [[choice[i], rt[i]] for i in 1:length(choice)]
data = (y = y, )
constraints = @constraints begin
q(z_μ, x, z_Λ) = q(z_μ)q(z_Λ)q(x)
end
fmodel = invertible_neural_network(length(y), compiled_model)
initmarginals = (z_μ = MvNormalMeanCovariance(zeros(2), huge*diageye(2)), z_Λ = Wishart(2.0, diageye(2)))
# Inference routine
```

This is something to start with, and we can build on top of it. Remember to experiment with the flow model itself. BTW, I don't think you need ...
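In case it helps, here is a minimal sketch of what the inference call might look like, reusing `fmodel`, `data`, `constraints`, and `initmarginals` defined above. The iteration count and options are my own guesses, not a verified routine.

```julia
# Hypothetical inference call (options are guesses; adjust as needed):
result = inference(
    model         = fmodel,
    data          = data,
    constraints   = constraints,
    initmarginals = initmarginals,
    iterations    = 10,
    free_energy   = true
)
```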
-
You are correct. The inference problem is ... Thanks for the guidance with ...

I was hoping to ask a few follow-up questions too. First, I was wondering what the training data should be. Currently, ...

Also, I was wondering whether you can recommend a resource for these types of models aimed at non-experts. I have been reading various sources, but it can be challenging to get a good understanding from the journal articles because they typically assume the reader is already an expert.
-
Hi @itsdfish! I'm sorry for not getting back to you sooner.
Neither Normal nor Beta are good choices in this case. You'd better use `GammaShapeScale`. Here's an example.

```julia
@model function invertible_neural_network(nr_samples::Int64, model)
# initialize variables
x = randomvar(nr_samples)
y_lat = randomvar(nr_samples)
y = datavar(Vector{Float64}, nr_samples)
# specify prior
α ~ GammaShapeScale(100.0, 0.01)
τ ~ GammaShapeScale(100.0, 0.01)
z_μ ~ ατconcat(α, τ) where {meta=CVI(StableRNG(42), 100, 200, Optimisers.Descent(0.1), RxInfer.ForwardDiffGrad(), 100, Val(true), true)}
z_Λ ~ Wishart(3, diageye(2))
# specify observations
for i in 1:nr_samples
# specify latent state
x[i] ~ MvNormal(μ=z_μ, Λ=z_Λ)
# specify transformed latent value
y_lat[i] ~ Flow(x[i]) where {meta=FlowMeta(model)}
# specify observations
y[i] ~ MvNormal(μ=y_lat[i], Σ=tiny*diageye(2))
end
# return variables
return z_μ, z_Λ, x, y_lat, y
end;
```

I know the CVI meta looks horrifying; see this example to understand more. Again, your RxInfer model should represent your belief about the data generation process. If alpha and tau are positive, then we can put a Gamma prior on top of them.
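As a quick illustration of what that prior implies (my own sanity check, using Distributions.jl directly rather than anything from RxInfer):

```julia
using Distributions

# GammaShapeScale(100.0, 0.01) corresponds to Gamma(shape = 100, scale = 0.01):
prior = Gamma(100.0, 0.01)
mean(prior), std(prior)   # (1.0, 0.1), i.e. tightly concentrated around 1
```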
I am not sure I understand the question here. Didn't you say we must infer the posterior over alpha and tau?
Do you mean invertible nets? The way they work in factor graphs (the engine behind RxInfer) can be found here: ...

I will try to allocate some time to read the BayesFlow paper; it seems interesting. There are people in the lab who are working on INNs, but they are on holidays. Perhaps they will be more helpful than I am ;)
-
@albertpod, that sounds like a good plan. In the meantime, I will re-read that paper so I can try to understand as much as possible. I appreciate your willingness to look at the paper and help develop a working example. Just for your situational awareness, I have been updating the code in the link in the original post, reproduced here: https://github.com/itsdfish/RxInferSandbox.jl/blob/main/examples/lca_example.jl
-
@itsdfish, a short follow-up: I have finished the paper on BayesFlow. Perhaps we will have some time in July to implement some models. I suggest you start with the simpler examples the authors highlight in the paper.
-
@albertpod, that sounds great. Although a simple Gaussian model was not in the paper, I was thinking it might be a good starting point for a 1D model (for example, something like the sketch below).

The multivariate Gaussian would be a logical progression to a 2D model. Please let me know how I can help. Perhaps I can set up some scripts in the repo for both models, and we can modify them as needed.
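Here is a rough sketch of the kind of 1D data-generating process I have in mind; the prior, known noise level, and dataset sizes are arbitrary placeholders.

```julia
using Distributions

# 1D toy model: unknown mean with a Normal prior and a Gaussian likelihood with known σ.
# Each dataset is simulated from its own prior draw of μ (placeholder choices).
function generate_gaussian_data(n_datasets, n_obs; σ = 1.0)
    μs   = rand(Normal(0, 1), n_datasets)             # prior samples for μ
    data = [rand(Normal(μ, σ), n_obs) for μ in μs]    # one dataset per prior draw
    return μs, data
end

μs, data = generate_gaussian_data(1000, 50)
```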
-
Sure, @itsdfish. With regard to ...

Do you have questions on how to run inference in this model with ...?
-
I understand the algorithm on page 9 at a high level, but I am not sure about the implementation details of the code using RxInfer. There are at least four things I do not fully understand.

There are a lot of moving pieces. I think I would need to see a working example in order to understand how it all fits together. From there, I might be able to apply the approach to different models.
-
Maybe it would be beneficial to start with the training data. My understanding is that the summary network learns summary statistics from datasets with a variable number of observations. The conditional invertible network needs to know the prior samples for mu and sigma. This function generates both: https://github.com/itsdfish/RxInferSandbox.jl/blob/0c2826df694f660c08c6c2052fb91900e78fec85/examples/guassian_example.jl#L26

What is less clear to me is how the data should be structured. The parameters are in an n × 2 matrix where each row is a different sample of mu and sigma. The simulated data are currently in a vector of n vectors of variable length. I'm fairly certain that is not the right structure, but I'm not sure how else to deal with variable-length vectors. My best guess is that datasets of the same size are batched together (a sketch of what I mean is below), but that leaves the question of how to handle variable input size in the summary network.
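Here is a sketch of my current guess: group the simulated datasets by length so that each batch has a fixed input size. The function name and types are hypothetical, not code from the repo.

```julia
# Hypothetical batching-by-length helper (not code from the repo):
# parms is the n × 2 matrix of prior samples, datasets the variable-length simulations.
function batch_by_length(parms::Matrix{Float64}, datasets::Vector{Vector{Float64}})
    batches = Dict{Int, Vector{Tuple{Vector{Float64}, Vector{Float64}}}}()
    for (θ, data) in zip(eachrow(parms), datasets)
        batch = get!(batches, length(data)) do
            Tuple{Vector{Float64}, Vector{Float64}}[]
        end
        push!(batch, (collect(θ), data))   # (prior sample, simulated dataset)
    end
    return batches
end
```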
-
Hi @itsdfish. Sorry, my replies are taking too long; it's busy up here. How about we first focus on the toy example from the paper, Proof of Concept: Multivariate Normal Distribution?

```julia
function generate_data(n_samples, dim, Σ=diageye(dim))
μ = MvNormal(zeros(dim), diageye(dim))
[rand(MvNormal(rand(μ), Σ)) for _ in 1:n_samples]
end
```

The first thing we must do following this paper is to learn summary statistics. As far as I understand, those would be just the posteriors from the INN example in RxInfer, i.e.:

```julia
@model function invertible_neural_network(nr_samples::Int64)
# initialize variables
z_μ = randomvar()
z_Λ = randomvar()
x = randomvar(nr_samples)
y_lat = randomvar(nr_samples)
y = datavar(Vector{Float64}, nr_samples)
# specify prior
z_μ ~ MvNormalMeanCovariance(zeros(2), diagm(ones(2)))
z_Λ ~ Wishart(2.0, diagm(ones(2)))
# specify observations
for k = 1:nr_samples
# specify latent state
x[k] ~ MvNormalMeanPrecision(z_μ, z_Λ)
# specify transformed latent value
y_lat[k] ~ Flow(x[k])
# specify observations
y[k] ~ MvNormalMeanCovariance(y_lat[k], diagm(ones(2)))
end
# return variables
return z_μ, z_Λ, x, y_lat, y
end;
```

Just so you know, for this particular example, we don't need the invertible neural network, aka Flow; it's there just for demonstration purposes.

Would you agree with that?
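For completeness, a small sketch of how these two pieces might be wired together; the sample count and dimensionality are arbitrary, and this is not verified against the final example.

```julia
# Hypothetical wiring of generate_data with the model above (sizes are arbitrary):
observations = generate_data(250, 2)              # 250 two-dimensional samples
data         = (y = observations, )               # matches y = datavar(Vector{Float64}, nr_samples)
fmodel       = invertible_neural_network(length(observations))
```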
-
Hello,
Thank you for putting together this package and the accompanying documentation. This package looks very nice!
I am moving a discussion with Dmitry and Albert from the Turing Slack channel to here at their request. As some brief background, I am interested in performing amortized Bayesian parameter estimation, similar to the Python package BayesFlow. Performing Bayesian parameter estimation with many of the models I work with is challenging because they do not have a closed-form likelihood function. Having capabilities similar to BayesFlow would be very useful. I suspect the broader Julia community would find it useful as well.
My ultimate goal is to perform amortized Bayesian parameter estimation on models with the following characteristics: ...

In addition, it would be great to have the ability to save and reload a trained neural network, and to obtain full posterior distributions given a dataset with any number of observations.
This example from BayesFlow is a good starting point that captures these criteria quite well. The example uses the leaky competing accumulator (LCA), a multi-alternative decision-making model based on a stochastic differential equation. The basic idea is that evidence for different options (items from a restaurant menu) accumulates towards a threshold. The winner of this race determines the decision and the response time, leading to a bivariate distribution with one discrete variable and one continuous variable.
Dmitry and Albert indicated that RxInfer has the capabilities listed above and were nice enough to offer help putting together an LCA example similar to the one from BayesFlow. If this is something you find useful, I would be willing to open a PR to add it to your list of examples. I started by studying the invertible neural network example. Machine learning is not my area of expertise, so some details are a little unclear to me. In this repo, I adapted the function `generate_data` to the LCA, and the code runs. However, I think the neural network architecture needs to be modified. Currently, the training data are generated from a fixed set of parameters rather than samples from prior distributions, so I am guessing it cannot produce posterior distributions over parameters in its current form. I'm not sure where to start. Any help you would be willing to provide would be appreciated.