
Online Learning #136

Open
caxelrud opened this issue Nov 27, 2024 · 4 comments

Comments

@caxelrud

Is it possible to do Online Learning with LaplaceRedux?
In other words, is it possible to use a previously calculated posterior as the prior for a new evaluation?

@pat-alt
Member

pat-alt commented Nov 28, 2024

Hi there @caxelrud 👋🏽 since both the prior and posterior are Gaussian, I don't see why this couldn't work. Apparently it's been done before, but I'm not familiar with the details: https://proceedings.neurips.cc/paper_files/paper/2018/file/f31b20466ae89669f9741e047487eb37-Paper.pdf

@caxelrud
Author

caxelrud commented Nov 29, 2024

Hi!
I am checking now whether the code already has this functionality.
Please share any comments on the existing code and its functionality as they relate to this feature.
Regards,

@caxelrud
Author

caxelrud commented Dec 10, 2024

Hi,
At this point I am interested in training an existing model with more data, i.e. in using a previously calculated posterior as the prior for a new evaluation.
Looking into the documentation, the LaplaceRedux.Posterior Type has:

  • posterior_mean::AbstractVector: the MAP estimate of the parameters
  • P::Union{AbstractArray,AbstractDecomposition,Nothing}: the posterior precision matrix

The LaplaceRedux.Prior has:

  • prior_mean::Real: the prior mean
  • prior_precision_matrix::Union{Nothing,AbstractMatrix,UniformScaling}: the prior precision matrix

Since prior_mean in the Prior type is a scalar, I can't use the posterior_mean vector directly.
So, let me know your thoughts on how to overcome this limitation.
Thanks!

@pat-alt
Member

pat-alt commented Dec 10, 2024

The prior_mean field is only used in prior optimization (optimize_prior), which in the current implementation is done through marginal likelihood maximization. Still, this could be worth addressing in the future (#138).

You can still use the posterior mean as a prior, of course, by using it as a regularizer when training on new data: the Gaussian posterior now acting as your Gaussian prior is equivalent to training with weight decay (see Daxberger (2021) and also here and here). The standard Ridge penalty in Flux corresponds to a zero-mean prior, but that should be straightforward to adjust in your code. This way the posterior mean will act as a prior that pulls your MAP estimate when training on new data.
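To make the non-zero-mean weight decay concrete, here is a minimal sketch of the underlying math (written in Python/numpy for illustration only; none of these names come from LaplaceRedux or Flux, and the quadratic "data loss" is a toy stand-in for your negative log-likelihood). The penalty $(\tau/2)\lVert\theta - \mu_0\rVert^2$ is the negative log of a Gaussian prior centred at the old posterior mean $\mu_0$, so the regularized optimum is pulled towards $\mu_0$ rather than towards zero as with standard Ridge:

```python
import numpy as np

def penalised_loss(theta, data_loss, mu_prior, tau):
    """data_loss: callable returning the loss on the new data.
    Adds (tau/2)||theta - mu_prior||^2, i.e. weight decay centred
    at mu_prior instead of at zero."""
    diff = theta - mu_prior
    return data_loss(theta) + 0.5 * tau * diff @ diff

# Toy setup: a quadratic data loss whose unregularized optimum is theta = 1,
# and an old posterior mean (MAP from a previous fit) acting as the prior mean.
mu_prior = np.array([2.0, -1.0])
tau = 1.0
data_loss = lambda th: 0.5 * np.sum((th - 1.0) ** 2)

# For this toy case the penalised MAP has a closed form: a precision-weighted
# average of the data optimum (1) and the prior mean.
theta_star = (1.0 * 1.0 + tau * mu_prior) / (1.0 + tau)
```

In a Flux training loop the same idea amounts to adding `0.5 * tau * sum(abs2, theta - mu_prior)` to the loss instead of the usual zero-centred penalty.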

As for using the posterior precision matrix as your new prior, it's worth noting that prior_precision_matrix actually does enter computations elsewhere (not just in optimize_prior), e.g. here in calculating the posterior precision. So here it is indeed crucial to also supply that value when instantiating your new Laplace object. Of course, you should then also use it when training with weight decay (so the old posterior precision becomes $\mathbf{H}_0$ here).
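For intuition on why feeding the posterior precision back in as $\mathbf{H}_0$ is the right recursion, here is a sketch in the one case where it can be checked exactly: a linear-Gaussian model, where the Laplace approximation is exact and "posterior becomes prior" reproduces the full-batch posterior. This is Python/numpy illustrating the math only, not LaplaceRedux API:

```python
import numpy as np

def bayes_linear_update(mu0, H0, X, y, noise_var=1.0):
    """One conjugate update for y ~ N(X @ theta, noise_var * I) with
    prior N(mu0, H0^-1). Returns the posterior mean and precision."""
    H_new = H0 + X.T @ X / noise_var            # prior precision + data curvature
    mu_new = np.linalg.solve(H_new, H0 @ mu0 + X.T @ y / noise_var)
    return mu_new, H_new

rng = np.random.default_rng(0)
X = rng.normal(size=(20, 3))
theta_true = np.array([1.0, -2.0, 0.5])
y = X @ theta_true + 0.1 * rng.normal(size=20)

mu0, H0 = np.zeros(3), np.eye(3)                # initial prior

# Posterior from all 20 observations at once
mu_full, H_full = bayes_linear_update(mu0, H0, X, y)

# Online: fit the first 10, then feed that posterior back in as the prior
mu1, H1 = bayes_linear_update(mu0, H0, X[:10], y[:10])
mu2, H2 = bayes_linear_update(mu1, H1, X[10:], y[10:])
# mu2/H2 match mu_full/H_full exactly in this linear-Gaussian case
```

With a neural network the Laplace posterior is only a Gaussian approximation, so the recursion is approximate rather than exact, which matches the online Laplace paper linked above.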

Please be aware of some limitations here: our package was never designed to be a training framework. It merely ships the functionality for fitting the LA to neural networks trained in Flux in a post-hoc fashion. I should also flag that my own research is in a different field, so I'm by no means an expert on LA and am just brainstorming my thoughts here about your problem setup.
