3D Gaussian Splatting is a powerful method for learning 3D structures and enabling high-fidelity novel view synthesis.
To circumvent the long optimization times and the requirement for dense, accurately posed datasets, this project extends Tyszkiewicz et al.'s point cloud diffusion model GECCO to Gaussian Splatting point clouds. This allows Gaussian Splatting scenes to be generated either conditionally on an image or unconditionally for a given class.
Diffusion is the process of adding noise to samples from an unknown distribution with a fixed noise schedule that guarantees transformation of the original sample into a data point from a known prior distribution, typically an isotropic Gaussian.
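The forward process above can be sketched as follows. This is a minimal illustration of a generic variance-preserving schedule, not GECCO's actual schedule; the schedule values and shapes are assumptions chosen for the example.

```python
import numpy as np

def forward_diffuse(x0, t, alpha_bar, rng):
    """Noise a clean sample x0 to level t using a cumulative schedule.

    alpha_bar[t] is the cumulative product of per-step retention factors;
    as t grows, x_t approaches a standard Gaussian regardless of x0.
    """
    eps = rng.standard_normal(x0.shape)
    a = alpha_bar[t]
    return np.sqrt(a) * x0 + np.sqrt(1.0 - a) * eps, eps

# Linear beta schedule (illustrative values only)
T = 1000
betas = np.linspace(1e-4, 2e-2, T)
alpha_bar = np.cumprod(1.0 - betas)

rng = np.random.default_rng(0)
x0 = rng.standard_normal((2048, 3))        # a toy point cloud
xt, eps = forward_diffuse(x0, T - 1, alpha_bar, rng)
```

At the final step `alpha_bar[T-1]` is close to zero, so `xt` is statistically indistinguishable from pure Gaussian noise, which is what lets sampling start from noise at inference time.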
During training, the Gaussian scene is noised according to the noise level t and projected onto a feature map derived from a ConvNeXt-Tiny backbone. This feature-enhanced point cloud is denoised with the Set Transformer. The loss compares the denoised scene against the ground-truth scene in parameter space, and photometrically against a ground-truth image.
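The two-part loss can be sketched as below. The parameter count per Gaussian, the L2/L1 choices, and the weighting `lam` are illustrative assumptions; `rendered` stands in for the output of a differentiable Gaussian Splatting renderer, which is omitted here.

```python
import numpy as np

def training_loss(denoised, gt_scene, rendered, gt_image, lam=0.2):
    """Combine a parameter-space loss with a photometric loss.

    `lam` is a hypothetical weighting between the two terms, not the
    project's actual value.
    """
    param_loss = np.mean((denoised - gt_scene) ** 2)   # scene vs. ground truth
    photo_loss = np.mean(np.abs(rendered - gt_image))  # render vs. ground-truth image
    return param_loss + lam * photo_loss

scene = np.random.randn(100, 14)   # 14 illustrative parameters per Gaussian
img = np.zeros((64, 64, 3))
loss = training_loss(scene, scene, img, img)
```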
The denoising backbone is based on Lee et al.'s Set Transformer, which replaces attention's quadratic complexity in the number of data points with one that is linear in the number of data points and scales instead with the number of learned inducing points.
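The induced-attention idea can be sketched as follows. This is a bare numpy illustration of the complexity argument, without the projections, multiple heads, or feed-forward layers of the real architecture; the sizes are arbitrary.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def attend(q, k, v):
    """Scaled dot-product attention; cost is O(len(q) * len(k))."""
    scores = q @ k.T / np.sqrt(q.shape[-1])
    return softmax(scores, axis=-1) @ v

def induced_attention(x, inducers):
    """Two-stage attention through m inducing points.

    Full self-attention over n points costs O(n^2); routing through
    m inducers costs O(n * m) per stage, i.e. linear in n for fixed m.
    """
    h = attend(inducers, x, x)   # m inducers summarize the n points
    return attend(x, h, h)       # points read the summary back

n, m, d = 4096, 32, 64
x = np.random.randn(n, d)
inducers = np.random.randn(m, d)   # learned parameters in the real model
y = induced_attention(x, inducers)
```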
Of the investigated methods, the Procrustes and SO(3) methods emerged as the most effective. Both perform diffusion on the Gaussian parameters in Euclidean space but adopt distinct strategies for handling the rotational parts of the Gaussian points. Procrustes learns a differentiable mapping from unconstrained matrices in Euclidean space onto the rotation group.
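A Procrustes-style projection of an unconstrained matrix onto the nearest rotation can be sketched via SVD. This is a standard construction shown for illustration; it is not claimed to match the project's exact implementation.

```python
import numpy as np

def procrustes_to_rotation(m):
    """Map an arbitrary 3x3 matrix to the nearest rotation matrix.

    Solves the special orthogonal Procrustes problem via SVD; the sign
    correction on the last singular direction enforces det(R) = +1 so
    the result is a proper rotation, not a reflection.
    """
    u, _, vt = np.linalg.svd(m)
    d = np.sign(np.linalg.det(u @ vt))
    return u @ np.diag([1.0, 1.0, d]) @ vt

m = np.random.randn(3, 3)   # e.g. an unconstrained denoiser output
r = procrustes_to_rotation(m)
```

Because the SVD is differentiable almost everywhere, gradients can flow through this projection during training, which is what makes it usable inside a diffusion model operating in Euclidean space.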
[Figure, left to right: conditioning image for the diffusion process; ground-truth Gaussian scene; diffused scene using the Procrustes mapping; generated scene using SO(3) diffusion.]