Multimodal multi-conditioning adaptor: train an adaptor on a multimodal representation, e.g. https://github.com/autonomousvision/unimatch
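
A rough sketch of what such an adaptor could look like, not a definitive design. Everything here is an assumption made for illustration: the modality names (flow, disparity, depth), the feature sizes, and the use of one linear projection per modality summed into a shared conditioning vector. The features are random stand-ins for a frozen pretrained representation; nothing below calls the linked repository.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-ins for frozen per-modality features.
# In practice these would come from a pretrained model; here they are random.
N, COND_DIM = 256, 8
MODALITY_DIMS = {"flow": 2, "disparity": 1, "depth": 1}
features = {m: rng.normal(size=(N, d)) for m, d in MODALITY_DIMS.items()}

# Synthetic target conditioning signal from a hidden linear map,
# so that the fit can be checked.
true_w = {m: rng.normal(size=(d, COND_DIM)) for m, d in MODALITY_DIMS.items()}
target = sum(features[m] @ true_w[m] for m in MODALITY_DIMS)


def adaptor(weights, feats):
    """Project each modality into the shared conditioning space and sum."""
    return sum(feats[m] @ weights[m] for m in weights)


def mse(weights):
    """Mean squared error between the adaptor output and the target."""
    return float(np.mean((adaptor(weights, features) - target) ** 2))


def train(steps=200, lr=1.0):
    """Fit only the adaptor weights by gradient descent; features stay fixed."""
    weights = {m: np.zeros((d, COND_DIM)) for m, d in MODALITY_DIMS.items()}
    for _ in range(steps):
        residual = adaptor(weights, features) - target  # shape (N, COND_DIM)
        for m in weights:
            # Gradient of the mean squared error w.r.t. this modality's projection.
            grad = 2.0 * features[m].T @ residual / residual.size
            weights[m] -= lr * grad
    return weights
```

Calling `train()` returns one projection matrix per modality, and `adaptor(weights, features)` gives the combined conditioning vectors. Summing the projections is only one way to merge conditions; concatenation or learned gating would be equally plausible.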