Image-conditioned Gaussian Splatting Diffusion

3D Gaussian Splatting is a powerful method for learning 3D structures and enabling high-fidelity novel view synthesis.

To circumvent the long optimization times and the need for dense, accurately posed image datasets, this project extends Tyszkiewicz et al.'s point cloud diffusion model GECCO to Gaussian Splatting point clouds. This allows Gaussian Splatting scenes to be generated either conditionally on an image or unconditionally for a given class.

Diffusion gradually adds noise to samples from an unknown data distribution, following a fixed noise schedule chosen so that the fully noised sample is effectively a draw from $\mathcal{N}(0,\sigma_{\max} I)$. A neural network $p_\theta$ learns to undo the noising process, which makes it possible to transform a sample from $\mathcal{N}(0,\sigma_{\max} I)$ into a sample from the target distribution.

Diffusion
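The noising step described above can be sketched in a few lines. This is a minimal illustration, not the project's code: it assumes a variance-exploding formulation in which noise of standard deviation `sigma` is added directly to the clean sample, and the function name `add_noise`, the toy point cloud and the value of `sigma_max` are all hypothetical.

```python
import numpy as np

def add_noise(x0, sigma, rng):
    """Noise a clean sample x0 at noise level sigma: x = x0 + sigma * eps."""
    eps = rng.standard_normal(x0.shape)  # eps ~ N(0, I)
    return x0 + sigma * eps

rng = np.random.default_rng(0)
x0 = rng.uniform(-1.0, 1.0, size=(2048, 3))  # toy "clean" point cloud
sigma_max = 80.0                             # assumed value, for illustration only

x_noisy = add_noise(x0, sigma_max, rng)

# At the highest noise level the original structure is swamped by the noise,
# so the empirical spread of the noised points is close to sigma_max.
print(x_noisy.std() / sigma_max)
```

Sampling runs this in reverse: starting from pure noise at `sigma_max`, the learned denoiser is applied repeatedly at decreasing noise levels.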

Method

During training, the Gaussian scene is noised according to the noise level $t$ and its points are projected onto a feature map derived from ConvNeXt-tiny. This feature-enhanced point cloud is then denoised with the Set Transformer. The loss compares the denoised scene against the ground truth scene and, photometrically, against a ground truth image.

Method Overview
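To make the projection step concrete, the sketch below attaches image features to 3D points. It rests on several assumptions not stated in this README: a pinhole camera with intrinsics `K` already scaled to the feature-map resolution, points given in camera coordinates with positive depth, and a nearest-neighbour lookup. The function name `lookup_features` is hypothetical, and the actual implementation may interpolate or combine features differently.

```python
import numpy as np

def lookup_features(points, feature_map, K):
    """Append per-point image features to a point cloud.

    points:      (n, 3) camera-space coordinates with z > 0
    feature_map: (H, W, C) image features
    K:           (3, 3) pinhole intrinsics at feature-map resolution
    """
    h, w, _ = feature_map.shape
    uvw = points @ K.T              # homogeneous pixel coordinates
    uv = uvw[:, :2] / uvw[:, 2:3]   # perspective divide
    # Nearest-neighbour lookup, clamped to the feature-map borders.
    cols = np.clip(np.rint(uv[:, 0]).astype(int), 0, w - 1)
    rows = np.clip(np.rint(uv[:, 1]).astype(int), 0, h - 1)
    feats = feature_map[rows, cols]  # (n, C)
    return np.concatenate([points, feats], axis=1)

rng = np.random.default_rng(0)
feature_map = rng.standard_normal((8, 8, 4))
K = np.array([[4.0, 0.0, 4.0],
              [0.0, 4.0, 4.0],
              [0.0, 0.0, 1.0]])
points = np.array([[0.0, 0.0, 2.0],   # on the optical axis
                   [0.5, 0.0, 1.0]])  # offset to the right
enhanced = lookup_features(points, feature_map, K)
```

Each output row holds the original coordinates followed by the features found at the point's projected pixel, which is the kind of enhanced point cloud a denoiser can then consume.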

The denoising backbone is based on Lee et al.'s Set Transformer, which replaces attention's quadratic cost in the number of data points with a cost that is linear in the number of data points for a fixed number of learned inducers.

Set Transformer
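The complexity argument is easiest to see in code. The following is a simplified sketch of attention routed through inducing points: with `n` data points and `m` inducers, both score matrices have `n * m` entries instead of `n * n`. It deliberately omits the linear projections, multiple heads, residual connections and normalization of a full Set Transformer block, and the names `attention` and `induced_attention` are illustrative only.

```python
import numpy as np

def softmax(a, axis=-1):
    a = a - a.max(axis=axis, keepdims=True)  # subtract max for numerical stability
    e = np.exp(a)
    return e / e.sum(axis=axis, keepdims=True)

def attention(q, k, v):
    """Scaled dot-product attention; the score matrix has shape (len(q), len(k))."""
    scores = q @ k.T / np.sqrt(q.shape[-1])
    return softmax(scores) @ v

def induced_attention(x, inducers):
    """Attend through m inducing points instead of over all n-by-n pairs."""
    h = attention(inducers, x, x)  # (m, d): inducers summarise the set, m * n scores
    return attention(x, h, h)      # (n, d): points read the summary back, n * m scores

rng = np.random.default_rng(0)
n, m, d = 1000, 16, 32
x = rng.standard_normal((n, d))
inducers = rng.standard_normal((m, d))  # trainable parameters in a real model
out = induced_attention(x, inducers)
```

Because the inducers summarise the set with a sum over all points, reordering the input rows only reorders the output rows, which suits unordered point clouds.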

Conditional generation

Of the methods investigated, the Procrustes and SO(3) methods emerged as the most effective. Both perform diffusion on the Gaussian parameters in Euclidean space, but they adopt distinct strategies for handling the rotational parts of the Gaussian points. Procrustes uses a differentiable mapping from $3\times3$ matrices to rotation matrices, whereas SO(3) models the rotations as samples drawn from a distribution over rotations that is the rotational analogue of the normal distribution.
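One standard way to map an arbitrary $3\times3$ matrix to a rotation matrix is the SVD-based solution of the special orthogonal Procrustes problem, sketched below. Whether this project uses exactly this variant is an assumption, and the function name `procrustes_to_rotation` is hypothetical.

```python
import numpy as np

def procrustes_to_rotation(m):
    """Project a 3x3 matrix onto the nearest rotation matrix (Frobenius norm)."""
    u, _, vt = np.linalg.svd(m)
    # Flip the last singular direction if needed so that det(R) = +1,
    # which yields a proper rotation rather than a reflection.
    d = np.sign(np.linalg.det(u @ vt))
    return u @ np.diag([1.0, 1.0, d]) @ vt

rng = np.random.default_rng(0)
raw = rng.standard_normal((3, 3))  # e.g. an unconstrained network output
R = procrustes_to_rotation(raw)
```

The result is orthogonal with determinant +1, and a matrix that is already a rotation is mapped to itself, so the projection only changes inputs that leave the rotation group.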


Conditioning image for the diffusion process

Ground truth Gaussian scene

Diffused scene using Procrustes mapping

Generated scene using SO(3) diffusion

Unconditional generation