wangkai930418/awesome-diffusion-categorized

Awesome Diffusion Categorized

Accelerate

PIXART-δ: Fast and Controllable Image Generation with Latent Consistency Models
[ICLR 2024 Spotlight] [Diffusers 1] [Diffusers 2] [Project] [Code]

SDXL-Turbo: Adversarial Diffusion Distillation
[Website] [Diffusers 1] [Diffusers 2] [Project] [Code]

Trajectory Consistency Distillation: Improved Latent Consistency Distillation by Semi-Linear Consistency Function with Trajectory Mapping
[Website] [Diffusers 1] [Diffusers 2] [Project] [Code]

LCM-LoRA: A Universal Stable-Diffusion Acceleration Module
[Website] [Diffusers] [Project] [Code]

Latent Consistency Models: Synthesizing High-Resolution Images with Few-Step Inference
[Website] [Project] [Code]

DMD2: Improved Distribution Matching Distillation for Fast Image Synthesis
[NeurIPS 2024 Oral] [Project] [Code]

DMD1: One-step Diffusion with Distribution Matching Distillation
[CVPR 2024] [Project] [Code]

Consistency Models
[ICML 2023] [Diffusers] [Code]

SwiftBrush: One-Step Text-to-Image Diffusion Model with Variational Score Distillation
[CVPR 2024] [Project] [Code]

SwiftBrush V2: Make Your One-Step Diffusion Model Better Than Its Teacher
[ECCV 2024] [Project] [Code]

CoDi: Conditional Diffusion Distillation for Higher-Fidelity and Faster Image Generation
[CVPR 2024] [Project] [Code]

PCM: Phased Consistency Model
[NeurIPS 2024] [Project] [Code]

Motion Consistency Model: Accelerating Video Diffusion with Disentangled Motion-Appearance Distillation
[NeurIPS 2024] [Project] [Code]

KOALA: Empirical Lessons Toward Memory-Efficient and Fast Diffusion Models for Text-to-Image Synthesis
[NeurIPS 2024] [Project] [Code]

Flash Diffusion: Accelerating Any Conditional Diffusion Model for Few Steps Image Generation
[Website] [Project] [Code]

Timestep Embedding Tells: It's Time to Cache for Video Diffusion Model
[Website] [Project] [Code]

Simplifying, Stabilizing and Scaling Continuous-Time Consistency Models
[Website] [Project] [Code]

Adaptive Caching for Faster Video Generation with Diffusion Transformers
[Website] [Project] [Code]

FasterCache: Training-Free Video Diffusion Model Acceleration with High Quality
[Website] [Project] [Code]

Distilled Decoding 1: One-step Sampling of Image Auto-regressive Models with Flow Matching
[Website] [Project] [Code]

SDXS: Real-Time One-Step Latent Diffusion Models with Image Conditions
[Website] [Project] [Code]

Reward Guided Latent Consistency Distillation
[Website] [Project] [Code]

T-Stitch: Accelerating Sampling in Pre-Trained Diffusion Models with Trajectory Stitching
[Website] [Project] [Code]

Relational Diffusion Distillation for Efficient Image Generation
[ACM MM 2024 (Oral)] [Code]

UFOGen: You Forward Once Large Scale Text-to-Image Generation via Diffusion GANs
[CVPR 2024] [Code]

SlimFlow: Training Smaller One-Step Diffusion Models with Rectified Flow
[ECCV 2024] [Code]

Accelerating Image Generation with Sub-path Linear Approximation Model
[ECCV 2024] [Code]

Diff-Instruct: A Universal Approach for Transferring Knowledge From Pre-trained Diffusion Models
[NeurIPS 2023] [Code]

Fast and Memory-Efficient Video Diffusion Using Streamlined Inference
[NeurIPS 2024] [Code]

A Simple Early Exiting Framework for Accelerated Sampling in Diffusion Models
[ICML 2024] [Code]

Score identity Distillation: Exponentially Fast Distillation of Pretrained Diffusion Models for One-Step Generation
[ICML 2024] [Code]

InstaFlow: One Step is Enough for High-Quality Diffusion-Based Text-to-Image Generation
[ICLR 2024] [Code]

Accelerating Vision Diffusion Transformers with Skip Branches
[Website] [Code]

One Step Diffusion via Shortcut Models
[Website] [Code]

DuoDiff: Accelerating Diffusion Models with a Dual-Backbone Approach
[Website] [Code]

A Closer Look at Time Steps is Worthy of Triple Speed-Up for Diffusion Model Training
[Website] [Code]

Stable Consistency Tuning: Understanding and Improving Consistency Models
[Website] [Code]

SpeedUpNet: A Plug-and-Play Adapter Network for Accelerating Text-to-Image Diffusion Models
[Website] [Code]

Learning-to-Cache: Accelerating Diffusion Transformer via Layer Caching
[Website] [Code]

Distribution Backtracking Builds A Faster Convergence Trajectory for Diffusion Distillation
[Website] [Code]

Long and Short Guidance in Score identity Distillation for One-Step Text-to-Image Generation
[Website] [Code]

Diffusion Models Are Innate One-Step Generators
[Website] [Code]

Distilling Diffusion Models into Conditional GANs
[ECCV 2024] [Project]

Cache Me if You Can: Accelerating Diffusion Models through Block Caching
[CVPR 2024] [Project]

Plug-and-Play Diffusion Distillation
[CVPR 2024] [Project]

SnapFusion: Text-to-Image Diffusion Model on Mobile Devices within Two Seconds
[NeurIPS 2023] [Project]

SNOOPI: Supercharged One-step Diffusion Distillation with Proper Guidance
[Website] [Project]

NitroFusion: High-Fidelity Single-Step Diffusion through Dynamic Adversarial Training
[Website] [Project]

Truncated Consistency Models
[Website] [Project]

Multi-student Diffusion Distillation for Better One-step Generators
[Website] [Project]

Effortless Efficiency: Low-Cost Pruning of Diffusion Models
[Website] [Project]

FasterDiT: Towards Faster Diffusion Transformers Training without Architecture Modification
[NeurIPS 2024]

One-Step Diffusion Distillation through Score Implicit Matching
[NeurIPS 2024]

Self-Corrected Flow Distillation for Consistent One-Step and Few-Step Text-to-Image Generation
[AAAI 2025]

Inference-Time Diffusion Model Distillation
[Website]

HarmoniCa: Harmonizing Training and Inference for Better Feature Cache in Diffusion Transformer Acceleration
[Website]

Diff-Instruct*: Towards Human-Preferred One-step Text-to-image Generative Models
[Website]

MLCM: Multistep Consistency Distillation of Latent Diffusion Model
[Website]

EM Distillation for One-step Diffusion Models
[Website]

AsymRnR: Video Diffusion Transformers Acceleration with Asymmetric Reduction and Restoration
[Website]

LANTERN: Accelerating Visual Autoregressive Models with Relaxed Speculative Decoding
[Website]

Partially Conditioned Patch Parallelism for Accelerated Diffusion Model Inference
[Website]

Importance-based Token Merging for Diffusion Models
[Website]

Imagine Flash: Accelerating Emu Diffusion Models with Backward Distillation
[Website]

Accelerating Diffusion Models with One-to-Many Knowledge Distillation
[Website]

Accelerating Video Diffusion Models via Distribution Matching
[Website]

TDDSR: Single-Step Diffusion with Two Discriminators for Super Resolution
[Website]

DDIL: Improved Diffusion Distillation With Imitation Learning
[Website]

OSV: One Step is Enough for High-Quality Image to Video Generation
[Website]

Target-Driven Distillation: Consistency Distillation with Target Timestep Selection and Decoupled Guidance
[Website]

Token Caching for Diffusion Transformer Acceleration
[Website]

DiP-GO: A Diffusion Pruner via Few-step Gradient Optimization
[Website]

LazyDiT: Lazy Learning for the Acceleration of Diffusion Transformers
[Website]

Flow Generator Matching
[Website]

Multistep Distillation of Diffusion Models via Moment Matching
[Website]

SFDDM: Single-fold Distillation for Diffusion models
[Website]

LAPTOP-Diff: Layer Pruning and Normalized Distillation for Compressing Diffusion Models
[Website]

CogView3: Finer and Faster Text-to-Image Generation via Relay Diffusion
[Website]

SCott: Accelerating Diffusion Models with Stochastic Consistency Distillation
[Website]

SDXL-Lightning: Progressive Adversarial Diffusion Distillation
[Website]

Adaptive Non-Uniform Timestep Sampling for Diffusion Model Training
[Website]

TSD-SR: One-Step Diffusion with Target Score Distillation for Real-World Image Super-Resolution
[Website]

Train-Free

AsyncDiff: Parallelizing Diffusion Models by Asynchronous Denoising
[NeurIPS 2024] [Project] [Code]

Training-Free Adaptive Diffusion with Bounded Difference Approximation Strategy
[NeurIPS 2024] [Project] [Code]

DeepCache: Accelerating Diffusion Models for Free
[CVPR 2024] [Project] [Code]

Faster Diffusion: Rethinking the Role of the Encoder for Diffusion Model Inference
[NeurIPS 2024] [Code]

DiTFastAttn: Attention Compression for Diffusion Transformer Models
[NeurIPS 2024] [Code]

Structural Pruning for Diffusion Models
[NeurIPS 2023] [Code]

AutoDiffusion: Training-Free Optimization of Time Steps and Architectures for Automated Diffusion Model Acceleration
[ICCV 2023] [Code]

Agent Attention: On the Integration of Softmax and Linear Attention
[ECCV 2024] [Code]

Token Merging for Fast Stable Diffusion
[CVPRW 2023] [Code]

FORA: Fast-Forward Caching in Diffusion Transformer Acceleration
[Website] [Code]

Real-Time Video Generation with Pyramid Attention Broadcast
[Website] [Code]

Accelerating Diffusion Transformers with Token-wise Feature Caching
[Website] [Code]

TGATE-V1: Cross-Attention Makes Inference Cumbersome in Text-to-Image Diffusion Models
[Website] [Code]

TGATE-V2: Faster Diffusion via Temporal Attention Decomposition
[Website] [Code]

SmoothCache: A Universal Inference Acceleration Technique for Diffusion Transformers
[Website] [Code]

Attention-Driven Training-Free Efficiency Enhancement of Diffusion Models
[CVPR 2024] [Project]

Cache Me if You Can: Accelerating Diffusion Models through Block Caching
[Website] [Project]

Token Fusion: Bridging the Gap between Token Pruning and Token Merging
[WACV 2024]

Accelerating Auto-regressive Text-to-Image Generation with Training-free Speculative Jacobi Decoding
[Website]

PFDiff: Training-free Acceleration of Diffusion Models through the Gradient Guidance of Past and Future
[Website]

Δ-DiT: A Training-Free Acceleration Method Tailored for Diffusion Transformers
[Website]

Adversarial Score identity Distillation: Rapidly Surpassing the Teacher in One Step
[Website]

Diff-Instruct++: Training One-step Text-to-image Generator Model to Align with Human Preferences
[Website]

Fast constrained sampling in pre-trained diffusion models
[Website]

Image Restoration

Zero-Shot Image Restoration Using Denoising Diffusion Null-Space Model
[ICLR 2023 oral] [Project] [Code]

Scaling Up to Excellence: Practicing Model Scaling for Photo-Realistic Image Restoration In the Wild
[CVPR 2024] [Project] [Code]

Selective Hourglass Mapping for Universal Image Restoration Based on Diffusion Model
[CVPR 2024] [Project] [Code]

Zero-Reference Low-Light Enhancement via Physical Quadruple Priors
[CVPR 2024] [Project] [Code]

From Posterior Sampling to Meaningful Diversity in Image Restoration
[ICLR 2024] [Project] [Code]

Generative Diffusion Prior for Unified Image Restoration and Enhancement
[CVPR 2023] [Project] [Code]

MoE-DiffIR: Task-customized Diffusion Priors for Universal Compressed Image Restoration
[ECCV 2024] [Project] [Code]

Image Restoration with Mean-Reverting Stochastic Differential Equations
[ICML 2023] [Project] [Code]

PhoCoLens: Photorealistic and Consistent Reconstruction in Lensless Imaging
[NeurIPS 2024 Spotlight] [Project] [Code]

Denoising Diffusion Models for Plug-and-Play Image Restoration
[CVPR 2023 Workshop NTIRE] [Project] [Code]

FoundIR: Unleashing Million-scale Training Data to Advance Foundation Models for Image Restoration
[Website] [Project] [Code]

Improving Diffusion Inverse Problem Solving with Decoupled Noise Annealing
[Website] [Project] [Code]

Solving Video Inverse Problems Using Image Diffusion Models
[Website] [Project] [Code]

Learning Efficient and Effective Trajectories for Differential Equation-based Image Restoration
[Website] [Project] [Code]

AutoDIR: Automatic All-in-One Image Restoration with Latent Diffusion
[Website] [Project] [Code]

FlowIE: Efficient Image Enhancement via Rectified Flow
[CVPR 2024 oral] [Code]

ResShift: Efficient Diffusion Model for Image Super-resolution by Residual Shifting
[NeurIPS 2023 (Spotlight)] [Code]

GibbsDDRM: A Partially Collapsed Gibbs Sampler for Solving Blind Inverse Problems with Denoising Diffusion Restoration
[ICML 2023 oral] [Code]

Diffusion Priors for Variational Likelihood Estimation and Image Denoising
[NeurIPS 2024 Spotlight] [Code]

Image Restoration by Denoising Diffusion Models with Iteratively Preconditioned Guidance
[CVPR 2024] [Code]

DiffIR: Efficient Diffusion Model for Image Restoration
[ICCV 2023] [Code]

LightenDiffusion: Unsupervised Low-Light Image Enhancement with Latent-Retinex Diffusion Models
[ECCV 2024] [Code]

Rethinking Video Deblurring with Wavelet-Aware Dynamic Transformer and Diffusion Model
[ECCV 2024] [Code]

DAVI: Diffusion Prior-Based Amortized Variational Inference for Noisy Inverse Problem
[ECCV 2024] [Code]

Low-Light Image Enhancement with Wavelet-based Diffusion Models
[SIGGRAPH Asia 2023] [Code]

Residual Denoising Diffusion Models
[CVPR 2024] [Code]

Diff-Plugin: Revitalizing Details for Diffusion-based Low-level Tasks
[CVPR 2024] [Code]

Deep Equilibrium Diffusion Restoration with Parallel Sampling
[CVPR 2024] [Code]

ReFIR: Grounding Large Restoration Models with Retrieval Augmentation
[NeurIPS 2024] [Code]

DreamClear: High-Capacity Real-World Image Restoration with Privacy-Safe Dataset Curation
[NeurIPS 2024] [Code]

Reversing the Damage: A QP-Aware Transformer-Diffusion Approach for 8K Video Restoration under Codec Compression
[Website] [Code]

Zero-Shot Adaptation for Approximate Posterior Sampling of Diffusion Models in Inverse Problems
[Website] [Code]

UniProcessor: A Text-induced Unified Low-level Image Processor
[Website] [Code]

Refusion: Enabling Large-Size Realistic Image Restoration with Latent-Space Diffusion Models
[CVPR 2023 Workshop NTIRE] [Code]

Equipping Diffusion Models with Differentiable Spatial Entropy for Low-Light Image Enhancement
[CVPR 2024 Workshop NTIRE] [Code]

PnP-Flow: Plug-and-Play Image Restoration with Flow Matching
[Website] [Code]

VIIS: Visible and Infrared Information Synthesis for Severe Low-light Image Enhancement
[Website] [Code]

Deep Data Consistency: a Fast and Robust Diffusion Model-based Solver for Inverse Problems
[Website] [Code]

Learning A Coarse-to-Fine Diffusion Transformer for Image Restoration
[Website] [Code]

Stimulating the Diffusion Model for Image Denoising via Adaptive Embedding and Ensembling
[Website] [Code]

Solving Linear Inverse Problems Provably via Posterior Sampling with Latent Diffusion Models
[Website] [Code]

Sagiri: Low Dynamic Range Image Enhancement with Generative Diffusion Prior
[Website] [Code]

Frequency Compensated Diffusion Model for Real-scene Dehazing
[Website] [Code]

Efficient Image Deblurring Networks based on Diffusion Models
[Website] [Code]

Blind Image Restoration via Fast Diffusion Inversion
[Website] [Code]

DMPlug: A Plug-in Method for Solving Inverse Problems with Diffusion Models
[Website] [Code]

Accelerating Diffusion Models for Inverse Problems through Shortcut Sampling
[Website] [Code]

Denoising as Adaptation: Noise-Space Domain Adaptation for Image Restoration
[Website] [Code]

Unlimited-Size Diffusion Restoration
[Website] [Code]

VmambaIR: Visual State Space Model for Image Restoration
[Website] [Code]

Using diffusion model as constraint: Empower Image Restoration Network Training with Diffusion Model
[Website] [Code]

Super-resolving Real-world Image Illumination Enhancement: A New Dataset and A Conditional Diffusion Model
[Website] [Code]

TIP: Text-Driven Image Processing with Semantic and Restoration Instructions
[ECCV 2024] [Project]

Warped Diffusion: Solving Video Inverse Problems with Image Diffusion Models
[NeurIPS 2024] [Project]

GenDeg: Diffusion-Based Degradation Synthesis for Generalizable All-in-One Image Restoration
[Website] [Project]

VISION-XL: High Definition Video Inverse Problem Solver using Latent Image Diffusion Models
[Website] [Project]

Diff-Retinex: Rethinking Low-light Image Enhancement with A Generative Diffusion Model
[ICCV 2023]

Multiscale Structure Guided Diffusion for Image Deblurring
[ICCV 2023]

Boosting Image Restoration via Priors from Pre-trained Models
[CVPR 2024]

A Modular Conditional Diffusion Framework for Image Reconstruction
[Website]

Unpaired Photo-realistic Image Deraining with Energy-informed Diffusion Model
[Website]

Particle-Filtering-based Latent Diffusion for Inverse Problems
[Website]

Bayesian Conditioned Diffusion Models for Inverse Problem
[Website]

ReCo-Diff: Explore Retinex-Based Condition Strategy in Diffusion Model for Low-Light Image Enhancement
[Website]

Multimodal Prompt Perceiver: Empower Adaptiveness, Generalizability and Fidelity for All-in-One Image Restoration
[Website]

Tell Me What You See: Text-Guided Real-World Image Denoising
[Website]

Zero-LED: Zero-Reference Lighting Estimation Diffusion Model for Low-Light Image Enhancement
[Website]

Prototype Clustered Diffusion Models for Versatile Inverse Problems
[Website]

AGLLDiff: Guiding Diffusion Models Towards Unsupervised Training-free Real-world Low-light Image Enhancement
[Website]

Taming Generative Diffusion for Universal Blind Image Restoration
[Website]

Efficient Image Restoration through Low-Rank Adaptation and Stable Diffusion XL
[Website]

Empirical Bayesian image restoration by Langevin sampling with a denoising diffusion implicit prior
[Website]

Data-free Distillation with Degradation-prompt Diffusion for Multi-weather Image Restoration
[Website]

FreeEnhance: Tuning-Free Image Enhancement via Content-Consistent Noising-and-Denoising Process
[Website]

Diffusion State-Guided Projected Gradient for Inverse Problems
[Website]

InstantIR: Blind Image Restoration with Instant Generative Reference
[Website]

Score-Based Variational Inference for Inverse Problems
[Website]

Towards Flexible and Efficient Diffusion Low Light Enhancer
[Website]

G2D2: Gradient-guided Discrete Diffusion for image inverse problem solving
[Website]

AllRestorer: All-in-One Transformer for Image Restoration under Composite Degradations
[Website]

DiffMVR: Diffusion-based Automated Multi-Guidance Video Restoration
[Website]

Blind Inverse Problem Solving Made Easy by Text-to-Image Latent Diffusion
[Website]

DIVD: Deblurring with Improved Video Diffusion Model
[Website]

Beyond Pixels: Text Enhances Generalization in Real-World Image Restoration
[Website]

Enhancing and Accelerating Diffusion-Based Inverse Problem Solving through Measurements Optimization
[Website]

Are Conditional Latent Diffusion Models Effective for Image Restoration?
[Website]

Consistent Diffusion: Denoising Diffusion Model with Data-Consistent Training for Image Restoration
[Website]

Colorization

ColorFlow: Retrieval-Augmented Image Sequence Colorization
[Website] [Project] [Code]

Control Color: Multimodal Diffusion-based Interactive Image Colorization
[Website] [Project] [Code]

Multimodal Semantic-Aware Automatic Colorization with Diffusion Prior
[Website] [Project] [Code]

ColorizeDiffusion: Adjustable Sketch Colorization with Reference Image and Text
[Website] [Code]

Diffusing Colors: Image Colorization with Text Guided Diffusion
[SIGGRAPH Asia 2023] [Project]

Enhancing Diffusion Posterior Sampling for Inverse Problems by Integrating Crafted Measurements
[Website]

DiffColor: Toward High Fidelity Text-Guided Image Colorization with Diffusion Models
[Website]

Face Restoration

DiffBIR: Towards Blind Image Restoration with Generative Diffusion Prior
[Website] [Project] [Code]

OSDFace: One-Step Diffusion Model for Face Restoration
[Website] [Project] [Code]

ReF-LDM: A Latent Diffusion Model for Reference-based Face Image Restoration
[Website] [Project] [Code]

InstantRestore: Single-Step Personalized Face Restoration with Shared-Image Attention
[Website] [Project] [Code]

DR2: Diffusion-based Robust Degradation Remover for Blind Face Restoration
[CVPR 2023] [Code]

PGDiff: Guiding Diffusion Models for Versatile Face Restoration via Partial Guidance
[NeurIPS 2023] [Code]

DifFace: Blind Face Restoration with Diffused Error Contraction
[Website] [Code]

AuthFace: Towards Authentic Blind Face Restoration with Face-oriented Generative Diffusion Prior
[Website] [Code]

RestorerID: Towards Tuning-Free Face Restoration with ID Preservation
[Website] [Code]

Towards Real-World Blind Face Restoration with Generative Diffusion Prior
[Website] [Code]

Towards Unsupervised Blind Face Restoration using Diffusion Prior
[Website] [Project]

DiffBFR: Bootstrapping Diffusion Model Towards Blind Face Restoration
[Website]

CLR-Face: Conditional Latent Refinement for Blind Face Restoration Using Score-Based Diffusion Models
[Website]

DiffMAC: Diffusion Manifold Hallucination Correction for High Generalization Blind Face Restoration
[Website]

Gaussian is All You Need: A Unified Framework for Solving Inverse Problems via Diffusion Posterior Sampling
[Website]

Overcoming False Illusions in Real-World Face Restoration with Multi-Modal Guided Diffusion Model
[Website]

DR-BFR: Degradation Representation with Diffusion Models for Blind Face Restoration
[Website]

Storytelling

⭐⭐Intelligent Grimm -- Open-ended Visual Storytelling via Latent Diffusion Models
[CVPR 2024] [Project] [Code]

⭐⭐Training-Free Consistent Text-to-Image Generation
[SIGGRAPH 2024] [Project] [Code]

The Chosen One: Consistent Characters in Text-to-Image Diffusion Models
[SIGGRAPH 2024] [Project] [Code]

DiffSensei: Bridging Multi-Modal LLMs and Diffusion Models for Customized Manga Generation
[Website] [Project] [Code]

AutoStudio: Crafting Consistent Subjects in Multi-turn Interactive Image Generation
[Website] [Project] [Code]

StoryDiffusion: Consistent Self-Attention for Long-Range Image and Video Generation
[Website] [Project] [Code]

StoryGPT-V: Large Language Models as Consistent Story Visualizers
[Website] [Project] [Code]

Animate-A-Story: Storytelling with Retrieval-Augmented Video Generation
[Website] [Project] [Code]

TaleCrafter: Interactive Story Visualization with Multiple Characters
[Website] [Project] [Code]

Story-Adapter: A Training-free Iterative Framework for Long Story Visualization
[Website] [Project] [Code]

DreamRunner: Fine-Grained Storytelling Video Generation with Retrieval-Augmented Motion Adaptation
[Website] [Project] [Code]

ShowHowTo: Generating Scene-Conditioned Step-by-Step Visual Instructions
[Website] [Project] [Code]

StoryImager: A Unified and Efficient Framework for Coherent Story Visualization and Completion
[ECCV 2024] [Code]

Make-A-Story: Visual Memory Conditioned Consistent Story Generation
[CVPR 2023] [Code]

StoryWeaver: A Unified World Model for Knowledge-Enhanced Story Character Customization
[AAAI 2025] [Code]

StoryMaker: Towards Holistic Consistent Characters in Text-to-image Generation
[Website] [Code]

SEED-Story: Multimodal Long Story Generation with Large Language Model
[Website] [Code]

Synthesizing Coherent Story with Auto-Regressive Latent Diffusion Models
[Website] [Code]

Masked Generative Story Transformer with Character Guidance and Caption Augmentation
[Website] [Code]

StoryBench: A Multifaceted Benchmark for Continuous Story Visualization
[Website] [Code]

Boosting Consistency in Story Visualization with Rich-Contextual Conditional Diffusion Models
[Website] [Code]

DreamStory: Open-Domain Story Visualization by LLM-Guided Multi-Subject Consistent Diffusion
[Website] [Project]

Multi-Shot Character Consistency for Text-to-Video Generation
[Website] [Project]

MagicScroll: Nontypical Aspect-Ratio Image Generation for Visual Storytelling via Multi-Layered Semantic-Aware Denoising
[Website] [Project]

Causal-Story: Local Causal Attention Utilizing Parameter-Efficient Tuning For Visual Story Synthesis
[ICASSP 2024]

CogCartoon: Towards Practical Story Visualization
[Website]

Generating coherent comic with rich story using ChatGPT and Stable Diffusion
[Website]

Improved Visual Story Generation with Adaptive Context Modeling
[Website]

Make-A-Storyboard: A General Framework for Storyboard with Disentangled and Merged Control
[Website]

Zero-shot Generation of Coherent Storybook from Plain Text Story using Diffusion Models
[Website]

Evolving Storytelling: Benchmarks and Methods for New Character Customization with Diffusion Models
[Website]

ORACLE: Leveraging Mutual Information for Consistent Character Generation with LoRAs in Diffusion Models
[Website]

Storynizor: Consistent Story Generation via Inter-Frame Synchronized and Shuffled ID Injection
[Website]

StoryAgent: Customized Storytelling Video Generation via Multi-Agent Collaboration
[Website]

Improving Multi-Subject Consistency in Open-Domain Image Generation with Isolation and Reposition Attention
[Website]

Try On

TryOnDiffusion: A Tale of Two UNets
[CVPR 2023] [Website] [Project] [Official Code] [Unofficial Code]

StableVITON: Learning Semantic Correspondence with Latent Diffusion Model for Virtual Try-On
[CVPR 2024] [Project] [Code]

VTON-HandFit: Virtual Try-on for Arbitrary Hand Pose Guided by Hand Priors Embedding
[Website] [Project] [Code]

IMAGDressing-v1: Customizable Virtual Dressing
[Website] [Project] [Code]

OutfitAnyone: Ultra-high Quality Virtual Try-On for Any Clothing and Any Person
[Website] [Project] [Code]

AnyDressing: Customizable Multi-Garment Virtual Dressing via Latent Diffusion Models
[Website] [Project] [Code]

ViViD: Video Virtual Try-on using Diffusion Models
[Website] [Project] [Code]

FashionComposer: Compositional Fashion Image Generation
[Website] [Project] [Code]

GaussianVTON: 3D Human Virtual Try-ON via Multi-Stage Gaussian Splatting Editing with Image Prompting
[Website] [Project] [Code]

Street TryOn: Learning In-the-Wild Virtual Try-On from Unpaired Person Images
[Website] [Project] [Code]

From Parts to Whole: A Unified Reference Framework for Controllable Human Image Generation
[Website] [Project] [Code]

PICTURE: PhotorealistIC virtual Try-on from UnconstRained dEsigns
[Website] [Project] [Code]

StableGarment: Garment-Centric Generation via Stable Diffusion
[Website] [Project] [Code]

Improving Diffusion Models for Virtual Try-on
[Website] [Project] [Code]

D4-VTON: Dynamic Semantics Disentangling for Differential Diffusion based Virtual Try-On
[ECCV 2024] [Code]

Improving Virtual Try-On with Garment-focused Diffusion Models
[ECCV 2024] [Code]

Texture-Preserving Diffusion Models for High-Fidelity Virtual Try-On
[CVPR 2024] [Code]

Taming the Power of Diffusion Models for High-Quality Virtual Try-On with Appearance Flow
[ACM MM 2023] [Code]

LaDI-VTON: Latent Diffusion Textual-Inversion Enhanced Virtual Try-On
[ACM MM 2023] [Code]

OOTDiffusion: Outfitting Fusion based Latent Diffusion for Controllable Virtual Try-on
[Website] [Code]

CatVTON: Concatenation Is All You Need for Virtual Try-On with Diffusion Model
[Website] [Code]

Learning Flow Fields in Attention for Controllable Person Image Generation
[Website] [Code]

DreamPaint: Few-Shot Inpainting of E-Commerce Items for Virtual Try-On without 3D Modeling
[Website] [Code]

CAT-DM: Controllable Accelerated Virtual Try-on with Diffusion Model
[Website] [Code]

Consistent Human Image and Video Generation with Spatially Conditioned Diffusion
[Website] [Code]

MV-VTON: Multi-View Virtual Try-On with Diffusion Models
[Website] [Code]

PromptDresser: Improving the Quality and Controllability of Virtual Try-On via Generative Textual Prompt and Prompt-aware Mask
[Website] [Code]

M&M VTO: Multi-Garment Virtual Try-On and Editing
[CVPR 2024 Highlight] [Project]

WildVidFit: Video Virtual Try-On in the Wild via Image-Based Controlled Diffusion Models
[ECCV 2024] [Project]

Fashion-VDM: Video Diffusion Model for Virtual Try-On
[SIGGRAPH Asia 2024] [Project]

Tunnel Try-on: Excavating Spatial-temporal Tunnels for High-quality Virtual Try-on in Videos
[Website] [Project]

Masked Extended Attention for Zero-Shot Virtual Try-On In The Wild
[Website] [Project]

TryOffDiff: Virtual-Try-Off via High-Fidelity Garment Reconstruction using Diffusion Models
[Website] [Project]

Diffuse to Choose: Enriching Image Conditioned Inpainting in Latent Diffusion Models for Virtual Try-All
[Website] [Project]

Wear-Any-Way: Manipulable Virtual Try-on via Sparse Correspondence Alignment
[Website] [Project]

VITON-DiT: Learning In-the-Wild Video Try-On from Human Dance Videos via Diffusion Transformers
[Website] [Project]

AnyFit: Controllable Virtual Try-on for Any Combination of Attire Across Any Scenario
[Website] [Project]

Dynamic Try-On: Taming Video Virtual Try-on with Dynamic Attention Mechanism
[Website] [Project]

FLDM-VTON: Faithful Latent Diffusion Model for Virtual Try-on
[IJCAI 2024]

GraVITON: Graph based garment warping with attention guided inversion for Virtual-tryon
[Website]

WarpDiffusion: Efficient Diffusion Model for High-Fidelity Virtual Try-on
[Website]

Product-Level Try-on: Characteristics-preserving Try-on with Realistic Clothes Shading and Wrinkles
[Website]

Mobile Fitting Room: On-device Virtual Try-on via Diffusion Models
[Website]

Time-Efficient and Identity-Consistent Virtual Try-On Using A Variant of Altered Diffusion Models
[Website]

ACDG-VTON: Accurate and Contained Diffusion Generation for Virtual Try-On
[Website]

ShoeModel: Learning to Wear on the User-specified Shoes via Diffusion Model
[Website]

AnyDesign: Versatile Area Fashion Editing via Mask-Free Diffusion
[Website]

DPDEdit: Detail-Preserved Diffusion Models for Multimodal Fashion Image Editing
[Website]

TED-VITON: Transformer-Empowered Diffusion Models for Virtual Try-On
[Website]

Controllable Human Image Generation with Personalized Multi-Garments
[Website]

RAGDiffusion: Faithful Cloth Generation via External Knowledge Assimilation
[Website]

SwiftTry: Fast and Consistent Video Virtual Try-On with Diffusion Models
[Website]

IGR: Improving Diffusion Model for Garment Restoration from Person Image
[Website]

DiffusionTrend: A Minimalist Approach to Virtual Fashion Try-On
[Website]

DreamFit: Garment-Centric Human Generation via a Lightweight Anything-Dressing Encoder
[Website]

Fashionability-Enhancing Outfit Image Editing with Conditional Diffusion Models
[Website]

Drag Edit

DragonDiffusion: Enabling Drag-style Manipulation on Diffusion Models
[ICLR 2024] [Website] [Project] [Code]

Drag Your GAN: Interactive Point-based Manipulation on the Generative Image Manifold
[SIGGRAPH 2023] [Project] [Code]

Readout Guidance: Learning Control from Diffusion Features
[CVPR 2024 Highlight] [Project] [Code]

FreeDrag: Feature Dragging for Reliable Point-based Image Editing
[CVPR 2024] [Project] [Code]

DragDiffusion: Harnessing Diffusion Models for Interactive Point-based Image Editing
[CVPR 2024] [Project] [Code]

InstaDrag: Lightning Fast and Accurate Drag-based Image Editing Emerging from Videos
[Website] [Project] [Code]

GoodDrag: Towards Good Practices for Drag Editing with Diffusion Models
[Website] [Project] [Code]

Repositioning the Subject within Image
[Website] [Project] [Code]

Drag-A-Video: Non-rigid Video Editing with Point-based Interaction
[Website] [Project] [Code]

ObjCtrl-2.5D: Training-free Object Control with Camera Poses
[Website] [Project] [Code]

DragAnything: Motion Control for Anything using Entity Representation
[Website] [Project] [Code]

InstantDrag: Improving Interactivity in Drag-based Image Editing
[Website] [Project] [Code]

DiffEditor: Boosting Accuracy and Flexibility on Diffusion-based Image Editing
[CVPR 2024] [Code]

Drag Your Noise: Interactive Point-based Editing via Diffusion Semantic Propagation
[CVPR 2024] [Code]

DragVideo: Interactive Drag-style Video Editing
[ECCV 2024] [Code]

RotationDrag: Point-based Image Editing with Rotated Diffusion Features
[Website] [Code]

TrackGo: A Flexible and Efficient Method for Controllable Video Generation
[Website] [Project]

DragText: Rethinking Text Embedding in Point-based Image Editing
[Website] [Project]

OmniDrag: Enabling Motion Control for Omnidirectional Image-to-Video Generation
[Website] [Project]

FastDrag: Manipulate Anything in One Step
[Website] [Project]

DragNUWA: Fine-grained Control in Video Generation by Integrating Text, Image, and Trajectory
[Website] [Project]

StableDrag: Stable Dragging for Point-based Image Editing
[Website] [Project]

DiffUHaul: A Training-Free Method for Object Dragging in Images
[Website] [Project]

RegionDrag: Fast Region-Based Image Editing with Diffusion Models
[Website]

Motion Guidance: Diffusion-Based Image Editing with Differentiable Motion Estimators
[Website]

Combing Text-based and Drag-based Editing for Precise and Flexible Image Editing
[Website]

AdaptiveDrag: Semantic-Driven Dragging on Diffusion-Based Image Editing
[Website]

Diffusion Models Inversion

⭐⭐⭐Null-text Inversion for Editing Real Images using Guided Diffusion Models
[CVPR 2023] [Website] [Project] [Code]

⭐⭐Direct Inversion: Boosting Diffusion-based Editing with 3 Lines of Code
[ICLR 2024] [Website] [Project] [Code]

Inversion-Based Creativity Transfer with Diffusion Models
[CVPR 2023] [Website] [Code]

EDICT: Exact Diffusion Inversion via Coupled Transformations
[CVPR 2023] [Website] [Code]

Improving Negative-Prompt Inversion via Proximal Guidance
[Website] [Code]

An Edit Friendly DDPM Noise Space: Inversion and Manipulations
[CVPR 2024] [Project] [Code] [Demo]

Dynamic Prompt Learning: Addressing Cross-Attention Leakage for Text-Based Image Editing
[NeurIPS 2023] [Website] [Code]

Inversion-Free Image Editing with Natural Language
[CVPR 2024] [Project] [Code]

LEDITS++: Limitless Image Editing using Text-to-Image Models
[CVPR 2024] [Project] [Code]

Noise Map Guidance: Inversion with Spatial Context for Real Image Editing
[ICLR 2024] [Website] [Code]

ReNoise: Real Image Inversion Through Iterative Noising
[ECCV 2024] [Project] [Code]

IterInv: Iterative Inversion for Pixel-Level T2I Models
[NeurIPS-W 2023] [Openreview] [NeuripsW] [Website] [Code]

DICE: Discrete Inversion Enabling Controllable Editing for Multinomial Diffusion and Masked Generative Models
[Website] [Project] [Code]

Object-aware Inversion and Reassembly for Image Editing
[Website] [Project] [Code]

A Latent Space of Stochastic Diffusion Models for Zero-Shot Image Editing and Guidance
[ICCV 2023] [Code]

Source Prompt Disentangled Inversion for Boosting Image Editability with Diffusion Models
[ECCV 2024] [Code]

LocInv: Localization-aware Inversion for Text-Guided Image Editing
[CVPR 2024 AI4CC workshop] [Code]

Accelerating Diffusion Models for Inverse Problems through Shortcut Sampling
[IJCAI 2024] [Code]

StyleDiffusion: Prompt-Embedding Inversion for Text-Based Editing
[Website] [Code]

Generating Non-Stationary Textures using Self-Rectification
[Website] [Code]

Exact Diffusion Inversion via Bi-directional Integration Approximation
[Website] [Code]

IQA-Adapter: Exploring Knowledge Transfer from Image Quality Assessment to Diffusion-based Generative Models
[Website] [Code]

Fixed-point Inversion for Text-to-image diffusion models
[Website] [Code]

Eta Inversion: Designing an Optimal Eta Function for Diffusion-based Real Image Editing
[Website] [Code]

Effective Real Image Editing with Accelerated Iterative Diffusion Inversion
[ICCV 2023 Oral] [Website]

BELM: Bidirectional Explicit Linear Multi-step Sampler for Exact Inversion in Diffusion Models
[NeurIPS 2024]

Schedule Your Edit: A Simple yet Effective Diffusion Noise Schedule for Image Editing
[NeurIPS 2024]

BARET: Balanced Attention based Real image Editing driven by Target-text Inversion
[WACV 2024]

Wavelet-Guided Acceleration of Text Inversion in Diffusion-Based Image Editing
[ICASSP 2024]

Task-Oriented Diffusion Inversion for High-Fidelity Text-based Editing
[Website]

Semantic Image Inversion and Editing using Rectified Stochastic Differential Equations
[Website]

Negative-prompt Inversion: Fast Image Inversion for Editing with Text-guided Diffusion Models
[Website]

Direct Inversion: Optimization-Free Text-Driven Real Image Editing with Diffusion Models
[Website]

SimInversion: A Simple Framework for Inversion-Based Text-to-Image Editing
[Website]

Prompt Tuning Inversion for Text-Driven Image Editing Using Diffusion Models
[Website]

KV Inversion: KV Embeddings Learning for Text-Conditioned Real Image Action Editing
[Website]

Tuning-Free Inversion-Enhanced Control for Consistent Image Editing
[Website]

LEDITS: Real Image Editing with DDPM Inversion and Semantic Guidance
[Website]

Dual-Schedule Inversion: Training- and Tuning-Free Inversion for Real Image Editing
[Website]

Text Guided Image Editing

⭐⭐⭐Prompt-to-Prompt Image Editing with Cross Attention Control
[ICLR 2023] [Website] [Project] [Code] [Replicate Demo]

⭐⭐⭐Zero-shot Image-to-Image Translation
[SIGGRAPH 2023] [Project] [Code] [Replicate Demo] [Diffusers Doc] [Diffusers Code]

⭐⭐InstructPix2Pix: Learning to Follow Image Editing Instructions
[CVPR 2023 (Highlight)] [Website] [Project] [Diffusers Doc] [Diffusers Code] [Official Code] [Dataset]

⭐⭐Plug-and-Play Diffusion Features for Text-Driven Image-to-Image Translation
[CVPR 2023] [Website] [Project] [Code] [Dataset] [Replicate Demo] [Demo]

DiffEdit: Diffusion-based semantic image editing with mask guidance
[ICLR 2023] [Website] [Unofficial Code] [Diffusers Doc] [Diffusers Code]

Imagic: Text-Based Real Image Editing with Diffusion Models
[CVPR 2023] [Website] [Project] [Diffusers]

Inpaint Anything: Segment Anything Meets Image Inpainting
[Website] [Code 1] [Code 2]

MasaCtrl: Tuning-Free Mutual Self-Attention Control for Consistent Image Synthesis and Editing
[ICCV 2023] [Website] [Project] [Code] [Demo]

Collaborative Score Distillation for Consistent Visual Synthesis
[NeurIPS 2023] [Website] [Project] [Code]

Visual Instruction Inversion: Image Editing via Visual Prompting
[NeurIPS 2023] [Website] [Project] [Code]

Energy-Based Cross Attention for Bayesian Context Update in Text-to-Image Diffusion Models
[NeurIPS 2023] [Website] [Code]

Localizing Object-level Shape Variations with Text-to-Image Diffusion Models
[ICCV 2023] [Website] [Project] [Code]

Unifying Diffusion Models' Latent Space, with Applications to CycleDiffusion and Guidance
[Website] [Code 1] [Code 2] [Diffusers Code]

PAIR-Diffusion: Object-Level Image Editing with Structure-and-Appearance Paired Diffusion Models
[Website] [Project] [Code] [Demo]

SmartEdit: Exploring Complex Instruction-based Image Editing with Multimodal Large Language Models
[CVPR 2024] [Project] [Code]

Contrastive Denoising Score for Text-guided Latent Diffusion Image Editing
[CVPR 2024] [Project] [Code]

Text-Driven Image Editing via Learnable Regions
[CVPR 2024] [Project] [Code]

Motion Guidance: Diffusion-Based Image Editing with Differentiable Motion Estimators
[ICLR 2024] [Project] [Code]

TurboEdit: Text-Based Image Editing Using Few-Step Diffusion Models
[SIGGRAPH Asia 2024] [Project] [Code]

Invertible Consistency Distillation for Text-Guided Image Editing in Around 7 Steps
[NeurIPS 2024] [Project] [Code]

Zero-shot Image Editing with Reference Imitation
[Website] [Project] [Code]

OmniEdit: Building Image Editing Generalist Models Through Specialist Supervision
[Website] [Project] [Code]

MultiBooth: Towards Generating All Your Concepts in an Image from Text
[Website] [Project] [Code]

Infusion: Preventing Customized Text-to-Image Diffusion from Overfitting
[Website] [Project] [Code]

StyleBooth: Image Style Editing with Multimodal Instruction
[Website] [Project] [Code]

SwapAnything: Enabling Arbitrary Object Swapping in Personalized Visual Editing
[Website] [Project] [Code]

EditVal: Benchmarking Diffusion Based Text-Guided Image Editing Methods
[Website] [Project] [Code]

InsightEdit: Towards Better Instruction Following for Image Editing
[Website] [Project] [Code]

InstructEdit: Improving Automatic Masks for Diffusion-based Image Editing With User Instructions
[Website] [Project] [Code]

MDP: A Generalized Framework for Text-Guided Image Editing by Manipulating the Diffusion Path
[Website] [Project] [Code]

HIVE: Harnessing Human Feedback for Instructional Visual Editing
[Website] [Project] [Code]

FaceStudio: Put Your Face Everywhere in Seconds
[Website] [Project] [Code]

Towards Small Object Editing: A Benchmark Dataset and A Training-Free Approach
[Website] [Project] [Code]

Smooth Diffusion: Crafting Smooth Latent Spaces in Diffusion Models
[Website] [Project] [Code]

FreeEdit: Mask-free Reference-based Image Editing with Multi-modal Instruction
[Website] [Project] [Code]

MAG-Edit: Localized Image Editing in Complex Scenarios via Mask-Based Attention-Adjusted Guidance
[Website] [Project] [Code]

LIME: Localized Image Editing via Attention Regularization in Diffusion Models
[Website] [Project] [Code]

MirrorDiffusion: Stabilizing Diffusion Process in Zero-shot Image Translation by Prompts Redescription and Beyond
[Website] [Project] [Code]

MagicQuill: An Intelligent Interactive Image Editing System
[Website] [Project] [Code]

Scaling Concept With Text-Guided Diffusion Models
[Website] [Project] [Code]

Face Adapter for Pre-Trained Diffusion Models with Fine-Grained ID and Attribute Control
[Website] [Project] [Code]

FlowEdit: Inversion-Free Text-Based Editing Using Pre-Trained Flow Models
[Website] [Project] [Code]

FastEdit: Fast Text-Guided Single-Image Editing via Semantic-Aware Diffusion Fine-Tuning
[Website] [Project] [Code]

Steering Rectified Flow Models in the Vector Field for Controlled Image Generation
[Website] [Project] [Code]

Delta Denoising Score
[Website] [Project] [Code]

InstantSwap: Fast Customized Concept Swapping across Sharp Shape Differences
[Website] [Project] [Code]

UniTune: Text-Driven Image Editing by Fine Tuning an Image Generation Model on a Single Image
[SIGGRAPH 2023] [Code]

Learning to Follow Object-Centric Image Editing Instructions Faithfully
[EMNLP 2023] [Code]

GroupDiff: Diffusion-based Group Portrait Editing
[ECCV 2024] [Code]

TiNO-Edit: Timestep and Noise Optimization for Robust Diffusion-Based Image Editing
[CVPR 2024] [Code]

ZONE: Zero-Shot Instruction-Guided Local Editing
[CVPR 2024] [Code]

Focus on Your Instruction: Fine-grained and Multi-instruction Image Editing by Attention Modulation
[CVPR 2024] [Code]

DreamSampler: Unifying Diffusion Sampling and Score Distillation for Image Manipulation
[ECCV 2024] [Code]

FlexiEdit: Frequency-Aware Latent Refinement for Enhanced Non-Rigid Editing
[ECCV 2024] [Code]

Guide-and-Rescale: Self-Guidance Mechanism for Effective Tuning-Free Real Image Editing
[ECCV 2024] [Code]

Towards Efficient Diffusion-Based Image Editing with Instant Attention Masks
[AAAI 2024] [Code]

FISEdit: Accelerating Text-to-image Editing via Cache-enabled Sparse Diffusion Inference
[AAAI 2024] [Code]

Face Aging via Diffusion-based Editing
[BMVC 2023] [Code]

Uniform Attention Maps: Boosting Image Fidelity in Reconstruction and Editing
[Website] [Code]

FlexEdit: Marrying Free-Shape Masks to VLLM for Flexible Image Editing
[Website] [Code]

Specify and Edit: Overcoming Ambiguity in Text-Based Image Editing
[Website] [Code]

PostEdit: Posterior Sampling for Efficient Zero-Shot Image Editing
[Website] [Code]

DiT4Edit: Diffusion Transformer for Image Editing
[Website] [Code]

Move and Act: Enhanced Object Manipulation and Background Integrity for Image Editing
[Website] [Code]

EditWorld: Simulating World Dynamics for Instruction-Following Image Editing
[Website] [Code]

ClickDiffusion: Harnessing LLMs for Interactive Precise Image Editing
[Website] [Code]

Differential Diffusion: Giving Each Pixel Its Strength
[Website] [Code]

Ground-A-Score: Scaling Up the Score Distillation for Multi-Attribute Editing
[Website] [Code]

InstructDiffusion: A Generalist Modeling Interface for Vision Tasks
[Website] [Code]

Region-Aware Diffusion for Zero-shot Text-driven Image Editing
[Website] [Code]

Forgedit: Text Guided Image Editing via Learning and Forgetting
[Website] [Code]

AdapEdit: Spatio-Temporal Guided Adaptive Editing Algorithm for Text-Based Continuity-Sensitive Image Editing
[Website] [Code]

An Item is Worth a Prompt: Versatile Image Editing with Disentangled Control
[Website] [Code]

FreeDiff: Progressive Frequency Truncation for Image Editing with Diffusion Models
[Website] [Code]

Unified Diffusion-Based Rigid and Non-Rigid Editing with Text and Image Guidance
[Website] [Code]

SpecRef: A Fast Training-free Baseline of Specific Reference-Condition Real Image Editing
[Website] [Code]

FireFlow: Fast Inversion of Rectified Flow for Image Semantic Editing
[Website] [Code]

PromptFix: You Prompt and We Fix the Photo
[Website] [Code]

FBSDiff: Plug-and-Play Frequency Band Substitution of Diffusion Features for Highly Controllable Text-Driven Image Translation
[Website] [Code]

Conditional Score Guidance for Text-Driven Image-to-Image Translation
[NeurIPS 2023] [Website]

Emu Edit: Precise Image Editing via Recognition and Generation Tasks
[CVPR 2024] [Project]

ByteEdit: Boost, Comply and Accelerate Generative Image Editing
[ECCV 2024] [Project]

Watch Your Steps: Local Image and Scene Editing by Text Instructions
[ECCV 2024] [Project]

TurboEdit: Instant text-based image editing
[ECCV 2024] [Project]

Novel Object Synthesis via Adaptive Text-Image Harmony
[NeurIPS 2024] [Project]

UniReal: Universal Image Generation and Editing via Learning Real-world Dynamics
[Website] [Project]

HeadRouter: A Training-free Image Editing Framework for MM-DiTs by Adaptively Routing Attention Heads
[Website] [Project]

MultiEdits: Simultaneous Multi-Aspect Editing with Text-to-Image Diffusion Models
[Website] [Project]

Instruction-based Image Manipulation by Watching How Things Move
[Website] [Project]

BrushEdit: All-In-One Image Inpainting and Editing
[Website] [Project]

Add-it: Training-Free Object Insertion in Images With Pretrained Diffusion Models
[Website] [Project]

FluxSpace: Disentangled Semantic Editing in Rectified Flow Transformers
[Website] [Project]

SeedEdit: Align Image Re-Generation to Image Editing
[Website] [Project]

Unified Editing of Panorama, 3D Scenes, and Videos Through Disentangled Self-Attention Injection
[Website] [Project]

Generative Image Layer Decomposition with Visual Effects
[Website] [Project]

Editable Image Elements for Controllable Synthesis
[Website] [Project]

SGEdit: Bridging LLM with Text2Image Generative Model for Scene Graph-based Image Editing
[Website] [Project]

SwiftEdit: Lightning Fast Text-Guided Image Editing via One-Step Diffusion
[Website] [Project]

ReGeneration Learning of Diffusion Models with Rich Prompts for Zero-Shot Image Translation
[Website] [Project]

UIP2P: Unsupervised Instruction-based Image Editing via Cycle Edit Consistency
[Website] [Project]

GANTASTIC: GAN-based Transfer of Interpretable Directions for Disentangled Image Editing in Text-to-Image Diffusion Models
[Website] [Project]

MoEController: Instruction-based Arbitrary Image Manipulation with Mixture-of-Expert Controllers
[Website] [Project]

FlexEdit: Flexible and Controllable Diffusion-based Object-centric Image Editing
[Website] [Project]

GeoDiffuser: Geometry-Based Image Editing with Diffusion Models
[Website] [Project]

SOEDiff: Efficient Distillation for Small Object Editing
[Website] [Project]

Click2Mask: Local Editing with Dynamic Mask Generation
[Website] [Project]

Stable Flow: Vital Layers for Training-Free Image Editing
[Website] [Project]

Iterative Multi-granular Image Editing using Diffusion Models
[WACV 2024]

Text-to-image Editing by Image Information Removal
[WACV 2024]

TexSliders: Diffusion-Based Texture Editing in CLIP Space
[SIGGRAPH 2024]

Custom-Edit: Text-Guided Image Editing with Customized Diffusion Models
[CVPR 2023 AI4CC Workshop]

Learning Feature-Preserving Portrait Editing from Generated Pairs
[Website]

EmoEdit: Evoking Emotions through Image Manipulation
[Website]

DM-Align: Leveraging the Power of Natural Language Instructions to Make Changes to Images
[Website]

LayerDiffusion: Layered Controlled Image Editing with Diffusion Models
[Website]

iEdit: Localised Text-guided Image Editing with Weak Supervision
[Website]

User-friendly Image Editing with Minimal Text Input: Leveraging Captioning and Injection Techniques
[Website]

PFB-Diff: Progressive Feature Blending Diffusion for Text-driven Image Editing
[Website]

PRedItOR: Text Guided Image Editing with Diffusion Prior
[Website]

FEC: Three Finetuning-free Methods to Enhance Consistency for Real Image Editing
[Website]

The Blessing of Randomness: SDE Beats ODE in General Diffusion-based Image Editing
[Website]

Image Translation as Diffusion Visual Programmers
[Website]

Latent Inversion with Timestep-aware Sampling for Training-free Non-rigid Editing
[Website]

LoMOE: Localized Multi-Object Editing via Multi-Diffusion
[Website]

Towards Understanding Cross and Self-Attention in Stable Diffusion for Text-Guided Image Editing
[Website]

DiffChat: Learning to Chat with Text-to-Image Synthesis Models for Interactive Image Creation
[Website]

InstructGIE: Towards Generalizable Image Editing
[Website]

LASPA: Latent Spatial Alignment for Fast Training-free Single Image Editing
[Website]

Uncovering the Text Embedding in Text-to-Image Diffusion Models
[Website]

Tuning-Free Adaptive Style Incorporation for Structure-Consistent Text-Driven Style Transfer
[Website]

Enhancing Text-to-Image Editing via Hybrid Mask-Informed Fusion
[Website]

Text Guided Image Editing with Automatic Concept Locating and Forgetting
[Website]

The Curious Case of End Token: A Zero-Shot Disentangled Image Editing using CLIP
[Website]

LIPE: Learning Personalized Identity Prior for Non-rigid Image Editing
[Website]

Achieving Complex Image Edits via Function Aggregation with Diffusion Models
[Website]

Prompt-Softbox-Prompt: A free-text Embedding Control for Image Editing
[Website]

InverseMeetInsert: Robust Real Image Editing via Geometric Accumulation Inversion in Guided Diffusion Models
[Website]

PixLens: A Novel Framework for Disentangled Evaluation in Diffusion-Based Image Editing with Object Detection + SAM
[Website]

Augmentation-Driven Metric for Balancing Preservation and Modification in Text-Guided Image Editing
[Website]

Vision-guided and Mask-enhanced Adaptive Denoising for Prompt-based Image Editing
[Website]

ERDDCI: Exact Reversible Diffusion via Dual-Chain Inversion for High-Quality Image Editing
[Website]

ReEdit: Multimodal Exemplar-Based Image Editing with Diffusion Models
[Website]

ColorEdit: Training-free Image-Guided Color editing with diffusion model
[Website]

GalaxyEdit: Large-Scale Image Editing Dataset with Enhanced Diffusion Adapter
[Website]

Unveil Inversion and Invariance in Flow Transformer for Versatile Image Editing
[Website]

Pathways on the Image Manifold: Image Editing via Video Generation
[Website]

LoRA of Change: Learning to Generate LoRA for the Editing Instruction from A Single Before-After Image Pair
[Website]

Action-based image editing guided by human instructions
[Website]

Addressing Attribute Leakages in Diffusion-based Image Editing without Training
[Website]

Prompt Augmentation for Self-supervised Text-guided Image Manipulation
[Website]

PixelMan: Consistent Object Editing with Diffusion Models via Pixel Manipulation and Generation
[Website]

Diffusion-Based Conditional Image Editing through Optimized Inference with Guidance
[Website]

Continual Learning

RGBD2: Generative Scene Synthesis via Incremental View Inpainting using RGBD Diffusion Models
[CVPR 2023] [Website] [Project] [Code]

Diffusion-Driven Data Replay: A Novel Approach to Combat Forgetting in Federated Class Continual Learning
[ECCV 2024 Oral] [Code]

How to Continually Adapt Text-to-Image Diffusion Models for Flexible Customization?
[NeurIPS 2024] [Code]

CLoG: Benchmarking Continual Learning of Image Generation Models
[Website] [Code]

Selective Amnesia: A Continual Learning Approach to Forgetting in Deep Generative Models
[Website] [Code]

Continual Learning of Diffusion Models with Generative Distillation
[Website] [Code]

Prompt-Based Exemplar Super-Compression and Regeneration for Class-Incremental Learning
[Website] [Code]

Continual Diffusion: Continual Customization of Text-to-Image Diffusion with C-LoRA
[TMLR] [Project]

Assessing Open-world Forgetting in Generative Image Model Customization
[Website] [Project]

Class-Incremental Learning using Diffusion Model for Distillation and Replay
[ICCV 2023 VCL workshop best paper]

Create Your World: Lifelong Text-to-Image Diffusion
[Website]

Low-Rank Continual Personalization of Diffusion Models
[Website]

Mining Your Own Secrets: Diffusion Classifier Scores for Continual Personalization of Text-to-Image Diffusion Models
[Website]

Online Continual Learning of Video Diffusion Models From a Single Video Stream
[Website]

Exploring Continual Learning of Diffusion Models
[Website]

DiracDiffusion: Denoising and Incremental Reconstruction with Assured Data-Consistency
[Website]

DiffusePast: Diffusion-based Generative Replay for Class Incremental Semantic Segmentation
[Website]

Continual Diffusion with STAMINA: STack-And-Mask INcremental Adapters
[Website]

Premonition: Using Generative Models to Preempt Future Data Changes in Continual Learning
[Website]

MuseumMaker: Continual Style Customization without Catastrophic Forgetting
[Website]

Towards Lifelong Few-Shot Customization of Text-to-Image Diffusion
[Website]

Remove Concept

Ablating Concepts in Text-to-Image Diffusion Models
[ICCV 2023] [Website] [Project] [Code]

Erasing Concepts from Diffusion Models
[ICCV 2023] [Website] [Project] [Code]

Paint by Inpaint: Learning to Add Image Objects by Removing Them First
[Website] [Project] [Code]

One-dimensional Adapter to Rule Them All: Concepts, Diffusion Models and Erasing Applications
[Website] [Project] [Code]

Editing Massive Concepts in Text-to-Image Diffusion Models
[Website] [Project] [Code]

Memories of Forgotten Concepts
[Website] [Project] [Code]

STEREO: Towards Adversarially Robust Concept Erasing from Text-to-Image Generation Models
[Website] [Project] [Code]

Towards Safe Self-Distillation of Internet-Scale Text-to-Image Diffusion Models
[ICML 2023 workshop] [Code]

Reliable and Efficient Concept Erasure of Text-to-Image Diffusion Models
[ECCV 2024] [Code]

Safeguard Text-to-Image Diffusion Models with Human Feedback Inversion
[ECCV 2024] [Code]

Erasing Undesirable Concepts in Diffusion Models with Adversarial Preservation
[NeurIPS 2024] [Code]

Unveiling Concept Attribution in Diffusion Models
[Website] [Code]

TraSCE: Trajectory Steering for Concept Erasure
[Website] [Code]

Meta-Unlearning on Diffusion Models: Preventing Relearning Unlearned Concepts
[Website] [Code]

ObjectAdd: Adding Objects into Image via a Training-Free Diffusion Modification Fashion
[Website] [Code]

Forget-Me-Not: Learning to Forget in Text-to-Image Diffusion Models
[Website] [Code]

Defensive Unlearning with Adversarial Training for Robust Concept Erasure in Diffusion Models
[Website] [Code]

ConceptPrune: Concept Editing in Diffusion Models via Skilled Neuron Pruning
[Website] [Code]

Selective Amnesia: A Continual Learning Approach to Forgetting in Deep Generative Models
[Website] [Code]

Add-SD: Rational Generation without Manual Reference
[Website] [Code]

RealEra: Semantic-level Concept Erasure via Neighbor-Concept Mining
[Website] [Project]

MACE: Mass Concept Erasure in Diffusion Models
[CVPR 2024]

Continuous Concepts Removal in Text-to-image Diffusion Models
[Website]

Safety Alignment Backfires: Preventing the Re-emergence of Suppressed Concepts in Fine-tuned Text-to-Image Diffusion Models
[Website]

Unstable Unlearning: The Hidden Risk of Concept Resurgence in Diffusion Models
[Website]

Direct Unlearning Optimization for Robust and Safe Text-to-Image Models
[Website]

Prompt Sliders for Fine-Grained Control, Editing and Erasing of Concepts in Diffusion Model
[Website]

Erasing Concepts from Text-to-Image Diffusion Models with Few-shot Unlearning
[Website]

Geom-Erasing: Geometry-Driven Removal of Implicit Concept in Diffusion Models
[Website]

Receler: Reliable Concept Erasing of Text-to-Image Diffusion Models via Lightweight Erasers
[Website]

All but One: Surgical Concept Erasing with Model Preservation in Text-to-Image Diffusion Models
[Website]

EraseDiff: Erasing Data Influence in Diffusion Models
[Website]

UnlearnCanvas: A Stylized Image Dataset to Benchmark Machine Unlearning for Diffusion Models
[Website]

Removing Undesirable Concepts in Text-to-Image Generative Models with Learnable Prompts
[Website]

R.A.C.E.: Robust Adversarial Concept Erasure for Secure Text-to-Image Diffusion Model
[Website]

Pruning for Robust Concept Erasing in Diffusion Models
[Website]

Efficient Fine-Tuning and Concept Suppression for Pruned Diffusion Models
[Website]

Unlearning Concepts from Text-to-Video Diffusion Models
[Website]

EIUP: A Training-Free Approach to Erase Non-Compliant Concepts Conditioned on Implicit Unsafe Prompts
[Website]

Holistic Unlearning Benchmark: A Multi-Faceted Evaluation for Text-to-Image Diffusion Model Unlearning
[Website]

Understanding the Impact of Negative Prompts: When and How Do They Take Effect?
[Website]

Model Integrity when Unlearning with T2I Diffusion Models
[Website]

Learning to Forget using Hypernetworks
[Website]

Precise, Fast, and Low-cost Concept Erasure in Value Space: Orthogonal Complement Matters
[Website]

New Concept Learning

⭐⭐⭐DreamBooth: Fine Tuning Text-to-Image Diffusion Models for Subject-Driven Generation
[CVPR 2023 Honorable Mention] [Website] [Project] [Official Dataset] [Unofficial Code] [Diffusers Doc] [Diffusers Code]

⭐⭐⭐An Image is Worth One Word: Personalizing Text-to-Image Generation using Textual Inversion
[ICLR 2023 top-25%] [Website] [Diffusers Doc] [Diffusers Code] [Code]

⭐⭐Custom Diffusion: Multi-Concept Customization of Text-to-Image Diffusion
[CVPR 2023] [Website] [Project] [Diffusers Doc] [Diffusers Code] [Code]

⭐⭐ColorPeel: Color Prompt Learning with Diffusion Models via Color and Shape Disentanglement
[ECCV 2024] [Project] [Code]

⭐⭐ReVersion: Diffusion-Based Relation Inversion from Images
[Website] [Project] [Code]

SINE: SINgle Image Editing with Text-to-Image Diffusion Models
[CVPR 2023] [Website] [Project] [Code]

Break-A-Scene: Extracting Multiple Concepts from a Single Image
[SIGGRAPH Asia 2023] [Project] [Code]

Concept Decomposition for Visual Exploration and Inspiration
[SIGGRAPH Asia 2023] [Project] [Code]

Cones: Concept Neurons in Diffusion Models for Customized Generation
[ICML 2023 Oral] [Website] [Code]

BLIP-Diffusion: Pre-trained Subject Representation for Controllable Text-to-Image Generation and Editing
[NeurIPS 2023] [Website] [Project] [Code]

Inserting Anybody in Diffusion Models via Celeb Basis
[NeurIPS 2023] [Website] [Project] [Code]

Controlling Text-to-Image Diffusion by Orthogonal Finetuning
[NeurIPS 2023] [Website] [Project] [Code]

Photoswap: Personalized Subject Swapping in Images
[NeurIPS 2023] [Website] [Project] [Code]

Mix-of-Show: Decentralized Low-Rank Adaptation for Multi-Concept Customization of Diffusion Models
[NeurIPS 2023] [Website] [Project] [Code]

ITI-GEN: Inclusive Text-to-Image Generation
[ICCV 2023 Oral] [Website] [Project] [Code]

Unsupervised Compositional Concepts Discovery with Text-to-Image Generative Models
[ICCV 2023] [Website] [Project] [Code]

ELITE: Encoding Visual Concepts into Textual Embeddings for Customized Text-to-Image Generation
[ICCV 2023 Oral] [Website] [Code]

A Neural Space-Time Representation for Text-to-Image Personalization
[SIGGRAPH Asia 2023] [Project] [Code]

Encoder-based Domain Tuning for Fast Personalization of Text-to-Image Models
[SIGGRAPH 2023] [Project] [Code]

Is This Loss Informative? Speeding Up Textual Inversion with Deterministic Objective Evaluation
[NeurIPS 2023] [Website] [Code]

ConceptExpress: Harnessing Diffusion Models for Single-image Unsupervised Concept Extraction
[ECCV 2024] [Project] [Code]

Face2Diffusion for Fast and Editable Face Personalization
[CVPR 2024] [Project] [Code]

Identity Decoupling for Multi-Subject Personalization of Text-to-Image Models
[CVPR 2024] [Project] [Code]

CapHuman: Capture Your Moments in Parallel Universes
[CVPR 2024] [Project] [Code]

Style Aligned Image Generation via Shared Attention
[CVPR 2024] [Project] [Code]

FreeCustom: Tuning-Free Customized Image Generation for Multi-Concept Composition
[CVPR 2024] [Project] [Code]

DreamMatcher: Appearance Matching Self-Attention for Semantically-Consistent Text-to-Image Personalization
[CVPR 2024] [Project] [Code]

Material Palette: Extraction of Materials from a Single Image
[CVPR 2024] [Project] [Code]

Learning Continuous 3D Words for Text-to-Image Generation
[CVPR 2024] [Project] [Code]

ConceptBed: Evaluating Concept Learning Abilities of Text-to-Image Diffusion Models
[AAAI 2024] [Project] [Code]

Direct Consistency Optimization for Compositional Text-to-Image Personalization
[NeurIPS 2024] [Project] [Code]

The Hidden Language of Diffusion Models
[ICLR 2024] [Project] [Code]

ZeST: Zero-Shot Material Transfer from a Single Image
[ECCV 2024] [Project] [Code]

UniPortrait: A Unified Framework for Identity-Preserving Single- and Multi-Human Image Personalization
[Website] [Project] [Code]

MagicFace: Training-free Universal-Style Human Image Customized Synthesis
[Website] [Project] [Code]

LCM-Lookahead for Encoder-based Text-to-Image Personalization
[Website] [Project] [Code]

EasyRef: Omni-Generalized Group Image Reference for Diffusion Models via Multimodal LLM
[Website] [Project] [Code]

AITTI: Learning Adaptive Inclusive Token for Text-to-Image Generation
[Website] [Project] [Code]

MS-Diffusion: Multi-subject Zero-shot Image Personalization with Layout Guidance
[Website] [Project] [Code]

ClassDiffusion: More Aligned Personalization Tuning with Explicit Class Guidance
[Website] [Project] [Code]

MasterWeaver: Taming Editability and Identity for Personalized Text-to-Image Generation
[Website] [Project] [Code]

Customizing Text-to-Image Models with a Single Image Pair
[Website] [Project] [Code]

DisEnvisioner: Disentangled and Enriched Visual Prompt for Customized Image Generation
[Website] [Project] [Code]

ConsistentID: Portrait Generation with Multimodal Fine-Grained Identity Preserving
[Website] [Project] [Code]

ID-Aligner: Enhancing Identity-Preserving Text-to-Image Generation with Reward Feedback Learning
[Website] [Project] [Code]

CharacterFactory: Sampling Consistent Characters with GANs for Diffusion Models
[Website] [Project] [Code]

Customizing Text-to-Image Diffusion with Camera Viewpoint Control
[Website] [Project] [Code]

Harmonizing Visual and Textual Embeddings for Zero-Shot Text-to-Image Customization
[Website] [Project] [Code]

StyleDrop: Text-to-Image Generation in Any Style
[Website] [Project] [Code]

Personalized Representation from Personalized Generation
[Website] [Project] [Code]

FastComposer: Tuning-Free Multi-Subject Image Generation with Localized Attention
[Website] [Project] [Code]

AnimateDiff: Animate Your Personalized Text-to-Image Diffusion Models without Specific Tuning
[Website] [Project] [Code]

Subject-Diffusion: Open Domain Personalized Text-to-Image Generation without Test-time Fine-tuning
[Website] [Project] [Code]

Highly Personalized Text Embedding for Image Manipulation by Stable Diffusion
[Website] [Project] [Code]

MagicTailor: Component-Controllable Personalization in Text-to-Image Diffusion Models
[Website] [Project] [Code]

MagicNaming: Consistent Identity Generation by Finding a "Name Space" in T2I Diffusion Models
[Website] [Project] [Code]

DreamArtist: Towards Controllable One-Shot Text-to-Image Generation via Positive-Negative Prompt-Tuning
[Website] [Project] [Code]

SingleInsert: Inserting New Concepts from a Single Image into Text-to-Image Models for Flexible Editing
[Website] [Project] [Code]

CustomNet: Zero-shot Object Customization with Variable-Viewpoints in Text-to-Image Diffusion Models
[Website] [Project] [Code]

When StyleGAN Meets Stable Diffusion: a W+ Adapter for Personalized Image Generation
[Website] [Project] [Code]

InstantID: Zero-shot Identity-Preserving Generation in Seconds
[Website] [Project] [Code]

PhotoMaker: Customizing Realistic Human Photos via Stacked ID Embedding
[Website] [Project] [Code]

Dense-Face: Personalized Face Generation Model via Dense Annotation Prediction
[Website] [Project] [Code]

CatVersion: Concatenating Embeddings for Diffusion-Based Text-to-Image Personalization
[Website] [Project] [Code]

DreamDistribution: Prompt Distribution Learning for Text-to-Image Diffusion Models
[Website] [Project] [Code]

λ-ECLIPSE: Multi-Concept Personalized Text-to-Image Diffusion Models by Leveraging CLIP Latent Space
[Website] [Project] [Code]

Viewpoint Textual Inversion: Unleashing Novel View Synthesis with Pretrained 2D Diffusion Models
[Website] [Project] [Code]

Gen4Gen: Generative Data Pipeline for Generative Multi-Concept Composition
[Website] [Project] [Code]

StableIdentity: Inserting Anybody into Anywhere at First Sight
[Website] [Project] [Code]

DiffuseKronA: A Parameter Efficient Fine-tuning Method for Personalized Diffusion Model
[Website] [Project] [Code]

TextBoost: Towards One-Shot Personalization of Text-to-Image Models via Fine-tuning Text Encoder
[Website] [Project] [Code]

EZIGen: Enhancing zero-shot subject-driven image generation with precise subject encoding and decoupled guidance
[Website] [Project] [Code]

OMG: Occlusion-friendly Personalized Multi-concept Generation in Diffusion Models
[Website] [Project] [Code]

MoMA: Multimodal LLM Adapter for Fast Personalized Image Generation
[Website] [Project] [Code]

ZipLoRA: Any Subject in Any Style by Effectively Merging LoRAs
[Website] [Project] [Code]

CSGO: Content-Style Composition in Text-to-Image Generation
[Website] [Project] [Code]

DreamSteerer: Enhancing Source Image Conditioned Editability using Personalized Diffusion Models
[NeurIPS 2024] [Code]

Customized Generation Reimagined: Fidelity and Editability Harmonized
[ECCV 2024] [Code]

Powerful and Flexible: Personalized Text-to-Image Generation via Reinforcement Learning
[ECCV 2024] [Code]

High-fidelity Person-centric Subject-to-Image Synthesis
[CVPR 2024] [Code]

ProSpect: Expanded Conditioning for the Personalization of Attribute-aware Image Generation
[SIGGRAPH Asia 2023] [Code]

Multi-Class Textual-Inversion Secretly Yields a Semantic-Agnostic Classifier
[WACV 2025] [Code]

Multiresolution Textual Inversion
[NeurIPS 2022 workshop] [Code]

Compositional Inversion for Stable Diffusion Models
[AAAI 2024] [Code]

Decoupled Textual Embeddings for Customized Image Generation
[AAAI 2024] [Code]

DomainGallery: Few-shot Domain-driven Image Generation by Attribute-centric Finetuning
[NeurIPS 2024] [Code]

TweedieMix: Improving Multi-Concept Fusion for Diffusion-based Image/Video Generation
[Website] [Code]

Resolving Multi-Condition Confusion for Finetuning-Free Personalized Image Generation
[Website] [Code]

Concept Conductor: Orchestrating Multiple Personalized Concepts in Text-to-Image Synthesis
[Website] [Code]

RectifID: Personalizing Rectified Flow with Anchored Classifier Guidance
[Website] [Code]

PuLID: Pure and Lightning ID Customization via Contrastive Alignment
[Website] [Code]

Cross Initialization for Personalized Text-to-Image Generation
[Website] [Code]

Enhancing Detail Preservation for Customized Text-to-Image Generation: A Regularization-Free Approach
[Website] [Code]

SVDiff: Compact Parameter Space for Diffusion Fine-Tuning
[Website] [Code]

ViCo: Detail-Preserving Visual Condition for Personalized Text-to-Image Generation
[Website] [Code]

AerialBooth: Mutual Information Guidance for Text Controlled Aerial View Synthesis from a Single Image
[Website] [Code]

A Closer Look at Parameter-Efficient Tuning in Diffusion Models
[Website] [Code]

FaceChain-FACT: Face Adapter with Decoupled Training for Identity-preserved Personalization
[Website] [Code]

Controllable Textual Inversion for Personalized Text-to-Image Generation
[Website] [Code]

Cross-domain Compositing with Pretrained Diffusion Models
[Website] [Code]

Concept-centric Personalization with Large-scale Diffusion Priors
[Website] [Code]

Customization Assistant for Text-to-image Generation
[Website] [Code]

Cones 2: Customizable Image Synthesis with Multiple Subjects
[Website] [Code]

LoRA-Composer: Leveraging Low-Rank Adaptation for Multi-Concept Customization in Training-Free Diffusion Models
[Website] [Code]

AttenCraft: Attention-guided Disentanglement of Multiple Concepts for Text-to-Image Customization
[Website] [Code]

PersonaMagic: Stage-Regulated High-Fidelity Face Customization with Tandem Equilibrium
[Website] [Code]

CusConcept: Customized Visual Concept Decomposition with Diffusion Models
[Website] [Code]

HybridBooth: Hybrid Prompt Inversion for Efficient Subject-Driven Generation
[ECCV 2024] [Project]

Language-Informed Visual Concept Learning
[ICLR 2024] [Project]

Key-Locked Rank One Editing for Text-to-Image Personalization
[SIGGRAPH 2023] [Project]

Diffusion in Style
[ICCV 2023] [Project]

RealCustom: Narrowing Real Text Word for Real-Time Open-Domain Text-to-Image Customization
[CVPR 2024] [Project]

RealCustom++: Representing Images as Real-Word for Real-Time Customization
[Website] [Project]

Personalized Residuals for Concept-Driven Text-to-Image Generation
[CVPR 2024] [Project]

LogoSticker: Inserting Logos into Diffusion Models for Customized Generation
[ECCV 2024] [Project]

Diffusion Self-Distillation for Zero-Shot Customized Image Generation
[Website] [Project]

RelationBooth: Towards Relation-Aware Customized Object Generation
[Website] [Project]

LoRACLR: Contrastive Adaptation for Customization of Diffusion Models
[Website] [Project]

InstructBooth: Instruction-following Personalized Text-to-Image Generation
[Website] [Project]

AttnDreamBooth: Towards Text-Aligned Personalized Text-to-Image Generation
[Website] [Project]

MoA: Mixture-of-Attention for Subject-Context Disentanglement in Personalized Image Generation
[Website] [Project]

ObjectMate: A Recurrence Prior for Object Insertion and Subject-Driven Generation
[Website] [Project]

PortraitBooth: A Versatile Portrait Model for Fast Identity-preserved Personalization
[Website] [Project]

Subject-driven Text-to-Image Generation via Apprenticeship Learning
[Website] [Project]

Orthogonal Adaptation for Modular Customization of Diffusion Models
[Website] [Project]

Diffusion in Diffusion: Cyclic One-Way Diffusion for Text-Vision-Conditioned Generation
[Website] [Project]

HyperDreamBooth: HyperNetworks for Fast Personalization of Text-to-Image Models
[Website] [Project]

SUGAR: Subject-Driven Video Customization in a Zero-Shot Manner
[Website] [Project]

Domain-Agnostic Tuning-Encoder for Fast Personalization of Text-To-Image Models
[Website] [Project]

$P+$: Extended Textual Conditioning in Text-to-Image Generation
[Website] [Project]

PhotoVerse: Tuning-Free Image Customization with Text-to-Image Diffusion Models
[Website] [Project]

InstantBooth: Personalized Text-to-Image Generation without Test-Time Finetuning
[Website] [Project]

Total Selfie: Generating Full-Body Selfies
[Website] [Project]

PersonalVideo: High ID-Fidelity Video Customization without Dynamic and Semantic Degradation
[Website] [Project]

DreamTuner: Single Image is Enough for Subject-Driven Generation
[Website] [Project]

SerialGen: Personalized Image Generation by First Standardization Then Personalization
[Website] [Project]

PALP: Prompt Aligned Personalization of Text-to-Image Models
[Website] [Project]

TextureDreamer: Image-guided Texture Synthesis through Geometry-aware Diffusion
[CVPR 2024] [Project]

Visual Style Prompting with Swapping Self-Attention
[Website] [Project]

Infinite-ID: Identity-preserved Personalization via ID-semantics Decoupling Paradigm
[Website] [Project]

Non-confusing Generation of Customized Concepts in Diffusion Models
[Website] [Project]

HybridBooth: Hybrid Prompt Inversion for Efficient Subject-Driven Generation
[Website] [Project]

Hollowed Net for On-Device Personalization of Text-to-Image Diffusion Models
[NeurIPS 2024]

ComFusion: Personalized Subject Generation in Multiple Specific Scenes From Single Image
[ECCV 2024]

Concept Weaver: Enabling Multi-Concept Fusion in Text-to-Image Models
[CVPR 2024]

JeDi: Joint-Image Diffusion Models for Finetuning-Free Personalized Text-to-Image Generation
[CVPR 2024]

DreamStyler: Paint by Style Inversion with Text-to-Image Diffusion Models
[AAAI 2024]

FreeTuner: Any Subject in Any Style with Training-free Diffusion
[Website]

Towards Prompt-robust Face Privacy Protection via Adversarial Decoupling Augmentation Framework
[Website]

InstaStyle: Inversion Noise of a Stylized Image is Secretly a Style Adviser
[Website]

DisenBooth: Disentangled Parameter-Efficient Tuning for Subject-Driven Text-to-Image Generation
[Website]

Taming Encoder for Zero Fine-tuning Image Customization with Text-to-Image Diffusion Models
[Website]

Gradient-Free Textual Inversion
[Website]

Identity Encoder for Personalized Diffusion
[Website]

Unified Multi-Modal Latent Diffusion for Joint Subject and Text Conditional Image Generation
[Website]

ELODIN: Naming Concepts in Embedding Spaces
[Website]

Generate Anything Anywhere in Any Scene
[Website]

Paste, Inpaint and Harmonize via Denoising: Subject-Driven Image Editing with Pre-Trained Diffusion Model
[Website]

Face0: Instantaneously Conditioning a Text-to-Image Model on a Face
[Website]

MagiCapture: High-Resolution Multi-Concept Portrait Customization
[Website]

A Data Perspective on Enhanced Identity Preservation for Diffusion Personalization
[Website]

DIFFNAT: Improving Diffusion Image Quality Using Natural Image Statistics
[Website]

An Image is Worth Multiple Words: Multi-attribute Inversion for Constrained Text-to-Image Synthesis
[Website]

Lego: Learning to Disentangle and Invert Concepts Beyond Object Appearance in Text-to-Image Diffusion Models
[Website]

Memory-Efficient Personalization using Quantized Diffusion Model
[Website]

BootPIG: Bootstrapping Zero-shot Personalized Image Generation Capabilities in Pretrained Diffusion Models
[Website]

Pick-and-Draw: Training-free Semantic Guidance for Text-to-Image Personalization
[Website]

Object-Driven One-Shot Fine-tuning of Text-to-Image Diffusion with Prototypical Embedding
[Website]

SeFi-IDE: Semantic-Fidelity Identity Embedding for Personalized Diffusion-Based Generation
[Website]

Visual Concept-driven Image Generation with Text-to-Image Diffusion Model
[Website]

IDAdapter: Learning Mixed Features for Tuning-Free Personalization of Text-to-Image Models
[Website]

MM-Diff: High-Fidelity Image Personalization via Multi-Modal Condition Integration
[Website]

DreamSalon: A Staged Diffusion Framework for Preserving Identity-Context in Editable Face Generation
[Website]

OneActor: Consistent Character Generation via Cluster-Conditioned Guidance
[Website]

StyleMaster: Towards Flexible Stylized Image Generation with Diffusion Models
[Website]

Exploring Diffusion Models' Corruption Stage in Few-Shot Fine-tuning and Mitigating with Bayesian Neural Networks
[Website]

Inv-Adapter: ID Customization Generation via Image Inversion and Lightweight Adapter
[Website]

PaRa: Personalizing Text-to-Image Diffusion via Parameter Rank Reduction
[Website]

AlignIT: Enhancing Prompt Alignment in Customization of Text-to-Image Models
[Website]

Layout-and-Retouch: A Dual-stage Framework for Improving Diversity in Personalized Image Generation
[Website]

PreciseControl: Enhancing Text-To-Image Diffusion Models with Fine-Grained Attribute Control
[Website]

MagicID: Flexible ID Fidelity Generation System
[Website]

CoRe: Context-Regularized Text Embedding Learning for Text-to-Image Personalization
[Website]

ArtiFade: Learning to Generate High-quality Subject from Blemished Images
[Website]

CustomContrast: A Multilevel Contrastive Perspective For Subject-Driven Text-to-Image Customization
[Website]

Fusion is all you need: Face Fusion for Customized Identity-Preserving Image Synthesis
[Website]

Event-Customized Image Generation
[Website]

Learning to Customize Text-to-Image Diffusion in Diverse Context
[Website]

HYPNOS: Highly Precise Foreground-focused Diffusion Finetuning for Inanimate Objects
[Website]

Large-Scale Text-to-Image Model with Inpainting is a Zero-Shot Subject-Driven Image Generator
[Website]

Foundation Cures Personalization: Recovering Facial Personalized Models' Prompt Consistency
[Website]

Self-Cross Diffusion Guidance for Text-to-Image Synthesis of Similar Subjects
[Website]

DreamBlend: Advancing Personalized Fine-tuning of Text-to-Image Diffusion Models
[Website]

RealisID: Scale-Robust and Fine-Controllable Identity Customization via Local and Global Complementation
[Website]

T2I Diffusion Model augmentation

⭐⭐⭐Attend-and-Excite: Attention-Based Semantic Guidance for Text-to-Image Diffusion Models
[SIGGRAPH 2023] [Project] [Official Code] [Diffusers Code] [Diffusers doc] [Replicate Demo]

SEGA: Instructing Diffusion using Semantic Dimensions
[NeurIPS 2023] [Website] [Code] [Diffusers Code] [Diffusers Doc]

Improving Sample Quality of Diffusion Models Using Self-Attention Guidance
[ICCV 2023] [Website] [Project] [Code Official] [Diffusers Doc] [Diffusers Code]

Expressive Text-to-Image Generation with Rich Text
[ICCV 2023] [Website] [Project] [Code] [Demo]

Editing Implicit Assumptions in Text-to-Image Diffusion Models
[ICCV 2023] [Website] [Project] [Code] [Demo]

ElasticDiffusion: Training-free Arbitrary Size Image Generation
[CVPR 2024] [Project] [Code] [Demo]

MagicFusion: Boosting Text-to-Image Generation Performance by Fusing Diffusion Models
[ICCV 2023] [Website] [Project] [Code]

Discriminative Class Tokens for Text-to-Image Diffusion Models
[ICCV 2023] [Website] [Project] [Code]

Compositional Visual Generation with Composable Diffusion Models
[ECCV 2022] [Website] [Project] [Code]

DistriFusion: Distributed Parallel Inference for High-Resolution Diffusion Models
[CVPR 2024] [Project] [Code] [Blog]

Diffusion Self-Guidance for Controllable Image Generation
[NeurIPS 2023] [Website] [Project] [Code]

ImageReward: Learning and Evaluating Human Preferences for Text-to-Image Generation
[NeurIPS 2023] [Website] [Code]

DiffSketcher: Text Guided Vector Sketch Synthesis through Latent Diffusion Models
[NeurIPS 2023] [Website] [Code]

Linguistic Binding in Diffusion Models: Enhancing Attribute Correspondence through Attention Map Alignment
[NeurIPS 2023] [Website] [Code]

DemoFusion: Democratising High-Resolution Image Generation With No $$$
[CVPR 2024] [Project] [Code]

Towards Effective Usage of Human-Centric Priors in Diffusion Models for Text-based Human Image Generation
[CVPR 2024] [Project] [Code]

Training Diffusion Models with Reinforcement Learning
[ICLR 2024] [Project] [Code]

Divide & Bind Your Attention for Improved Generative Semantic Nursing
[BMVC 2023 Oral] [Project] [Code]

Make It Count: Text-to-Image Generation with an Accurate Number of Objects
[Website] [Project] [Code]

OmniBooth: Learning Latent Control for Image Synthesis with Multi-modal Instruction
[Website] [Project] [Code]

Margin-aware Preference Optimization for Aligning Diffusion Models without Reference
[Website] [Project] [Code]

Step-aware Preference Optimization: Aligning Preference with Denoising Performance at Each Step
[Website] [Project] [Code]

Bridging Different Language Models and Generative Vision Models for Text-to-Image Generation
[Website] [Project] [Code]

MoLE: Enhancing Human-centric Text-to-image Diffusion via Mixture of Low-rank Experts
[Website] [Project] [Code]

Efficient Diversity-Preserving Diffusion Alignment via Gradient-Informed GFlowNets
[Website] [Project] [Code]

CoMat: Aligning Text-to-Image Diffusion Model with Image-to-Text Concept Matching
[Website] [Project] [Code]

Continuous, Subject-Specific Attribute Control in T2I Models by Identifying Semantic Directions
[Website] [Project] [Code]

Self-Rectifying Diffusion Sampling with Perturbed-Attention Guidance
[Website] [Project] [Code]

Real-World Image Variation by Aligning Diffusion Inversion Chain
[Website] [Project] [Code]

FreeU: Free Lunch in Diffusion U-Net
[Website] [Project] [Code]

GraPE: A Generate-Plan-Edit Framework for Compositional T2I Synthesis
[Website] [Project] [Code]

ConceptLab: Creative Generation using Diffusion Prior Constraints
[Website] [Project] [Code]

Aligning Text-to-Image Diffusion Models with Reward Backpropagation
[Website] [Project] [Code]

Mini-DALLE3: Interactive Text to Image by Prompting Large Language Models
[Website] [Project] [Code]

Tiled Diffusion
[Website] [Project] [Code]

ScaleCrafter: Tuning-free Higher-Resolution Visual Generation with Diffusion Models
[Website] [Project] [Code]

One More Step: A Versatile Plug-and-Play Module for Rectifying Diffusion Schedule Flaws and Enhancing Low-Frequency Controls
[Website] [Project] [Code]

TokenCompose: Grounding Diffusion with Token-level Supervision
[Website] [Project] [Code]

DiffusionGPT: LLM-Driven Text-to-Image Generation System
[Website] [Project] [Code]

Decompose and Realign: Tackling Condition Misalignment in Text-to-Image Diffusion Models
[Website] [Project] [Code]

Taiyi-Diffusion-XL: Advancing Bilingual Text-to-Image Generation with Large Vision-Language Model Support
[Website] [Project] [Code]

ECLIPSE: A Resource-Efficient Text-to-Image Prior for Image Generations
[Website] [Project] [Code]

MuLan: Multimodal-LLM Agent for Progressive Multi-Object Diffusion
[Website] [Project] [Code]

ResAdapter: Domain Consistent Resolution Adapter for Diffusion Models
[Website] [Project] [Code]

Stylus: Automatic Adapter Selection for Diffusion Models
[Website] [Project] [Code]

MaxFusion: Plug&Play Multi-Modal Generation in Text-to-Image Diffusion Models
[Website] [Project] [Code]

Negative Token Merging: Image-based Adversarial Feature Guidance
[Website] [Project] [Code]

Iterative Object Count Optimization for Text-to-image Diffusion Models
[Website] [Project] [Code]

ELLA: Equip Diffusion Models with LLM for Enhanced Semantic Alignment
[Website] [Project] [Code]

HiPrompt: Tuning-free Higher-Resolution Generation with Hierarchical MLLM Prompts
[Website] [Project] [Code]

Omegance: A Single Parameter for Various Granularities in Diffusion-Based Synthesis
[Website] [Project] [Code]

TheaterGen: Character Management with LLM for Consistent Multi-turn Image Generation
[Website] [Project] [Code]

SUR-adapter: Enhancing Text-to-Image Pre-trained Diffusion Models with Large Language Models
[ACM MM 2023 Oral] [Code]

Get What You Want, Not What You Don't: Image Content Suppression for Text-to-Image Diffusion Models
[ICLR 2024] [Code]

Token Merging for Training-Free Semantic Binding in Text-to-Image Synthesis
[NeurIPS 2024] [Code]

Dynamic Prompt Optimizing for Text-to-Image Generation
[CVPR 2024] [Code]

Tackling the Singularities at the Endpoints of Time Intervals in Diffusion Models
[CVPR 2024] [Code]

Rethinking the Spatial Inconsistency in Classifier-Free Diffusion Guidance
[CVPR 2024] [Code]

InitNO: Boosting Text-to-Image Diffusion Models via Initial Noise Optimization
[CVPR 2024] [Code]

Object-Conditioned Energy-Based Attention Map Alignment in Text-to-Image Diffusion Models
[ECCV 2024] [Code]

On Discrete Prompt Optimization for Diffusion Models
[ICML 2024] [Code]

Magnet: We Never Know How Text-to-Image Diffusion Models Work, Until We Learn How Vision-Language Models Function
[NeurIPS 2024] [Code]

Embedding an Ethical Mind: Aligning Text-to-Image Synthesis via Lightweight Value Optimization
[ACM MM 2024] [Code]

DPOK: Reinforcement Learning for Fine-tuning Text-to-Image Diffusion Models
[NeurIPS 2023] [Code]

Diffusion Model Alignment Using Direct Preference Optimization
[Website] [Code]

SePPO: Semi-Policy Preference Optimization for Diffusion Alignment
[Website] [Code]

Bridging the Gap: Aligning Text-to-Image Diffusion Models with Specific Feedback
[Website] [Code]

Zigzag Diffusion Sampling: The Path to Success Is Zigzag
[Website] [Code]

Prompt-Consistency Image Generation (PCIG): A Unified Framework Integrating LLMs, Knowledge Graphs, and Controllable Diffusion Models
[Website] [Code]

Progressive Compositionality In Text-to-Image Generative Models
[Website] [Code]

Improving Long-Text Alignment for Text-to-Image Diffusion Models
[Website] [Code]

Diffusion-RPO: Aligning Diffusion Models through Relative Preference Optimization
[Website] [Code]

RealisHuman: A Two-Stage Approach for Refining Malformed Human Parts in Generated Images
[Website] [Code]

Aggregation of Multi Diffusion Models for Enhancing Learned Representations
[Website] [Code]

AID: Attention Interpolation of Text-to-Image Diffusion
[Website] [Code]

Rare-to-Frequent: Unlocking Compositional Generation Power of Diffusion Models on Rare Concepts with LLM Guidance
[Website] [Code]

FouriScale: A Frequency Perspective on Training-Free High-Resolution Image Synthesis
[Website] [Code]

ORES: Open-vocabulary Responsible Visual Synthesis
[Website] [Code]

Fair Diffusion: Instructing Text-to-Image Generation Models on Fairness
[Website] [Code]

Understanding and Mitigating Compositional Issues in Text-to-Image Generative Models
[Website] [Code]

IterComp: Iterative Composition-Aware Feedback Learning from Model Gallery for Text-to-Image Generation
[Website] [Code]

InstructG2I: Synthesizing Images from Multimodal Attributed Graphs
[Website] [Code]

Detector Guidance for Multi-Object Text-to-Image Generation
[Website] [Code]

Designing a Better Asymmetric VQGAN for StableDiffusion
[Website] [Code]

FABRIC: Personalizing Diffusion Models with Iterative Feedback
[Website] [Code]

Prompt-Free Diffusion: Taking "Text" out of Text-to-Image Diffusion Models
[Website] [Code]

Progressive Text-to-Image Diffusion with Soft Latent Direction
[Website] [Code]

Hypernymy Understanding Evaluation of Text-to-Image Models via WordNet Hierarchy
[Website] [Code]

TraDiffusion: Trajectory-Based Training-Free Image Generation
[Website] [Code]

If at First You Don’t Succeed, Try, Try Again: Faithful Diffusion-based Text-to-Image Generation by Selection
[Website] [Code]

CoMPaSS: Enhancing Spatial Understanding in Text-to-Image Diffusion Models
[Website] [Code]

LLM Blueprint: Enabling Text-to-Image Generation with Complex and Detailed Prompts
[Website] [Code]

Making Multimodal Generation Easier: When Diffusion Models Meet LLMs
[Website] [Code]

Enhancing Diffusion Models with Text-Encoder Reinforcement Learning
[Website] [Code]

AltDiffusion: A Multilingual Text-to-Image Diffusion Model
[Website] [Code]

It is all about where you start: Text-to-image generation with seed selection
[Website] [Code]

End-to-End Diffusion Latent Optimization Improves Classifier Guidance
[Website] [Code]

Correcting Diffusion Generation through Resampling
[Website] [Code]

Mastering Text-to-Image Diffusion: Recaptioning, Planning, and Generating with Multimodal LLMs
[Website] [Code]

Enhancing MMDiT-Based Text-to-Image Models for Similar Subject Generation
[Website] [Code]

A User-Friendly Framework for Generating Model-Preferred Prompts in Text-to-Image Synthesis
[Website] [Code]

PromptCharm: Text-to-Image Generation through Multi-modal Prompting and Refinement
[Website] [Code]

Enhancing Semantic Fidelity in Text-to-Image Synthesis: Attention Regulation in Diffusion Models
[Website] [Code]

Bridging Different Language Models and Generative Vision Models for Text-to-Image Generation
[Website] [Code]

Aligning Few-Step Diffusion Models with Dense Reward Difference Learning
[Website] [Code]

LightIt: Illumination Modeling and Control for Diffusion Models
[CVPR 2024] [Project]

Adapting Diffusion Models for Improved Prompt Compliance and Controllable Image Synthesis
[NeurIPS 2024] [Project]

Context Canvas: Enhancing Text-to-Image Diffusion Models with Knowledge Graph-Based RAG
[Website] [Project]

Scalable Ranked Preference Optimization for Text-to-Image Generation
[Website] [Project]

A Noise is Worth Diffusion Guidance
[Website] [Project]

LayerFusion: Harmonized Multi-Layer Text-to-Image Generation with Generative Priors
[Website] [Project]

ComfyGen: Prompt-Adaptive Workflows for Text-to-Image Generation
[Website] [Project]

LLM4GEN: Leveraging Semantic Representation of LLMs for Text-to-Image Generation
[Website] [Project]

MotiF: Making Text Count in Image Animation with Motion Focal Loss
[Website] [Project]

RefDrop: Controllable Consistency in Image or Video Generation via Reference Feature Guidance
[Website] [Project]

UniFL: Improve Stable Diffusion via Unified Feedback Learning
[Website] [Project]

Generative Photography: Scene-Consistent Camera Control for Realistic Text-to-Image Synthesis
[Website] [Project]

ChatGen: Automatic Text-to-Image Generation From FreeStyle Chatting
[Website] [Project]

Be Yourself: Bounded Attention for Multi-Subject Text-to-Image Generation
[Website] [Project]

Semantic Guidance Tuning for Text-To-Image Diffusion Models
[Website] [Project]

Amazing Combinatorial Creation: Acceptable Swap-Sampling for Text-to-Image Generation
[Website] [Project]

Image Anything: Towards Reasoning-coherent and Training-free Multi-modal Image Generation
[Website] [Project]

DyMO: Training-Free Diffusion Model Alignment with Dynamic Multi-Objective Scheduling
[Website] [Project]

Make a Cheap Scaling: A Self-Cascade Diffusion Model for Higher-Resolution Adaptation
[Website] [Project]

FineDiffusion: Scaling up Diffusion Models for Fine-grained Image Generation with 10,000 Classes
[Website] [Project]

Lazy Diffusion Transformer for Interactive Image Editing
[Website] [Project]

Hyper-SD: Trajectory Segmented Consistency Model for Efficient Image Synthesis
[Website] [Project]

Concept Arithmetics for Circumventing Concept Inhibition in Diffusion Models
[Website] [Project]

Norm-guided latent space exploration for text-to-image generation
[NeurIPS 2023] [Website]

Improving Diffusion-Based Image Synthesis with Context Prediction
[NeurIPS 2023] [Website]

GarmentAligner: Text-to-Garment Generation via Retrieval-augmented Multi-level Corrections
[ECCV 2024]

MultiGen: Zero-shot Image Generation from Multi-modal Prompt
[ECCV 2024]

On Mechanistic Knowledge Localization in Text-to-Image Generative Models
[ICML 2024]

Scene Graph Disentanglement and Composition for Generalizable Complex Image Generation
[NeurIPS 2024]

Generating Compositional Scenes via Text-to-image RGBA Instance Generation
[NeurIPS 2024]

A Cat Is A Cat (Not A Dog!): Unraveling Information Mix-ups in Text-to-Image Encoders through Causal Analysis and Embedding Optimization
[Website]

PROUD: PaRetO-gUided Diffusion Model for Multi-objective Generation
[Website]

Exposure Diffusion: HDR Image Generation by Consistent LDR denoising
[Website]

Information Theoretic Text-to-Image Alignment
[Website]

Diffscaler: Enhancing the Generative Prowess of Diffusion Transformers
[Website]

Object-Attribute Binding in Text-to-Image Generation: Evaluation and Control
[Website]

Aligning Diffusion Models by Optimizing Human Utility
[Website]

Instruct-Imagen: Image Generation with Multi-modal Instruction
[Website]

CONFORM: Contrast is All You Need For High-Fidelity Text-to-Image Diffusion Models
[Website]

MaskDiffusion: Boosting Text-to-Image Consistency with Conditional Mask
[Website]

Any-Size-Diffusion: Toward Efficient Text-Driven Synthesis for Any-Size HD Images
[Website]

Text2Layer: Layered Image Generation using Latent Diffusion Model
[Website]

Stimulating the Diffusion Model for Image Denoising via Adaptive Embedding and Ensembling
[Website]

A Picture is Worth a Thousand Words: Principled Recaptioning Improves Image Generation
[Website]

UNIMO-G: Unified Image Generation through Multimodal Conditional Diffusion
[Website]

Improving Compositional Text-to-image Generation with Large Vision-Language Models
[Website]

Multi-Concept T2I-Zero: Tweaking Only The Text Embeddings and Nothing Else
[Website]

Unseen Image Synthesis with Diffusion Models
[Website]

AnyLens: A Generative Diffusion Model with Any Rendering Lens
[Website]

Seek for Incantations: Towards Accurate Text-to-Image Diffusion Synthesis through Prompt Engineering
[Website]

Text2Street: Controllable Text-to-image Generation for Street Views
[Website]

Self-Play Fine-Tuning of Diffusion Models for Text-to-Image Generation
[Website]

Contrastive Prompts Improve Disentanglement in Text-to-Image Diffusion Model
[Website]

Debiasing Text-to-Image Diffusion Models
[Website]

Stochastic Conditional Diffusion Models for Semantic Image Synthesis
[Website]

Referee Can Play: An Alternative Approach to Conditional Generation via Model Inversion
[Website]

Transparent Image Layer Diffusion using Latent Transparency
[Website]

Playground v2.5: Three Insights towards Enhancing Aesthetic Quality in Text-to-Image Generation
[Website]

HanDiffuser: Text-to-Image Generation With Realistic Hand Appearances
[Website]

StereoDiffusion: Training-Free Stereo Image Generation Using Latent Diffusion Models
[Website]

Make Me Happier: Evoking Emotions Through Image Diffusion Models
[Website]

Zippo: Zipping Color and Transparency Distributions into a Single Diffusion Model
[Website]

LayerDiff: Exploring Text-guided Multi-layered Composable Image Synthesis via Layer-Collaborative Diffusion Model
[Website]

AGFSync: Leveraging AI-Generated Feedback for Preference Optimization in Text-to-Image Generation
[Website]

U-Sketch: An Efficient Approach for Sketch to Image Diffusion Models
[Website]

ECNet: Effective Controllable Text-to-Image Diffusion Models
[Website]

TextCraftor: Your Text Encoder Can be Image Quality Controller
[Website]

Prompt Optimizer of Text-to-Image Diffusion Models for Abstract Concept Understanding
[Website]

Towards Better Text-to-Image Generation Alignment via Attention Modulation
[Website]

Towards Understanding the Working Mechanism of Text-to-Image Diffusion Model
[Website]

SG-Adapter: Enhancing Text-to-Image Generation with Scene Graph Guidance
[Website]

Improving Geo-diversity of Generated Images with Contextualized Vendi Score Guidance
[Website]

Lost in Translation: Latent Concept Misalignment in Text-to-Image Diffusion Models
[Website]

FRAP: Faithful and Realistic Text-to-Image Generation with Adaptive Prompt Weighting
[Website]

Foodfusion: A Novel Approach for Food Image Composition via Diffusion Models
[Website]

SPDiffusion: Semantic Protection Diffusion for Multi-concept Text-to-image Generation
[Website]

Training-Free Sketch-Guided Diffusion with Latent Optimization
[Website]

Tuning Timestep-Distilled Diffusion Model Using Pairwise Sample Optimization
[Website]

Sparse Repellency for Shielded Generation in Text-to-image Diffusion Models
[Website]

Training-free Diffusion Model Alignment with Sampling Demons
[Website]

MinorityPrompt: Text to Minority Image Generation via Prompt Optimization
[Website]

Automated Filtering of Human Feedback Data for Aligning Text-to-Image Diffusion Models
[Website]

Saliency Guided Optimization of Diffusion Latents
[Website]

Preference Optimization with Multi-Sample Comparisons
[Website]

CtrlSynth: Controllable Image Text Synthesis for Data-Efficient Multimodal Learning
[Website]

Redefining <Creative> in Dictionary: Towards an Enhanced Semantic Understanding of Creative Generation
[Website]

Investigating Conceptual Blending of a Diffusion Model for Improving Nonword-to-Image Generation
[Website]

Improving image synthesis with diffusion-negative sampling
[Website]

Golden Noise for Diffusion Models: A Learning Framework
[Website]

Test-time Conditional Text-to-Image Synthesis Using Diffusion Models
[Website]

Decoupling Training-Free Guided Diffusion by ADMM
[Website]

Text Embedding is Not All You Need: Attention Control for Text-to-Image Semantic Alignment with Text Self-Attention Maps
[Website]

Noise Diffusion for Enhancing Semantic Faithfulness in Text-to-Image Synthesis
[Website]

TKG-DM: Training-free Chroma Key Content Generation Diffusion Model
[Website]

Unlocking the Potential of Text-to-Image Diffusion with PAC-Bayesian Theory
[Website]

CoCoNO: Attention Contrast-and-Complete for Initial Noise Optimization in Text-to-Image Synthesis
[Website]

Reward Incremental Learning in Text-to-Image Generation
[Website]

QUOTA: Quantifying Objects with Text-to-Image Models for Any Domain
[Website]

Enhancing Compositional Text-to-Image Generation with Reliable Random Seeds
[Website]

Cross-Attention Head Position Patterns Can Align with Human Visual Concepts in Text-to-Image Generative Models
[Website]

The Silent Prompt: Initial Noise as Implicit Guidance for Goal-Driven Image Generation
[Website]

ASGDiffusion: Parallel High-Resolution Generation with Asynchronous Structure Guidance
[Website]

Visual Lexicon: Rich Image Features in Language Space
[Website]

BudgetFusion: Perceptually-Guided Adaptive Diffusion Models
[Website]

ArtAug: Enhancing Text-to-Image Generation through Synthesis-Understanding Interaction
[Website]

TextMatch: Enhancing Image-Text Consistency Through Multimodal Optimization
[Website]

Spatial Control

MultiDiffusion: Fusing Diffusion Paths for Controlled Image Generation
[ICML 2023] [Website] [Project] [Code] [Diffusers Code] [Diffusers Doc] [Replicate Demo]

SceneComposer: Any-Level Semantic Image Synthesis
[CVPR 2023 Highlight] [Website] [Project] [Code]

GLIGEN: Open-Set Grounded Text-to-Image Generation
[CVPR 2023] [Website] [Code] [Demo]

Training-Free Structured Diffusion Guidance for Compositional Text-to-Image Synthesis
[ICLR 2023] [Website] [Project] [Code]

Visual Programming for Text-to-Image Generation and Evaluation
[NeurIPS 2023] [Website] [Project] [Code]

GeoDiffusion: Text-Prompted Geometric Control for Object Detection Data Generation
[ICLR 2024] [Website] [Project] [Code]

GrounDiT: Grounding Diffusion Transformers via Noisy Patch Transplantation
[NeurIPS 2024] [Project] [Code]

ReCo: Region-Controlled Text-to-Image Generation
[CVPR 2023] [Website] [Code]

Harnessing the Spatial-Temporal Attention of Diffusion Models for High-Fidelity Text-to-Image Synthesis
[ICCV 2023] [Website] [Code]

BoxDiff: Text-to-Image Synthesis with Training-Free Box-Constrained Diffusion
[ICCV 2023] [Website] [Code]

Dense Text-to-Image Generation with Attention Modulation
[ICCV 2023] [Website] [Code]

LLM-grounded Diffusion: Enhancing Prompt Understanding of Text-to-Image Diffusion Models with Large Language Models
[Website] [Project] [Code] [Demo] [Blog]

StreamMultiDiffusion: Real-Time Interactive Generation with Region-Based Semantic Control
[CVPR 2024] [Project] [Code]

MIGC: Multi-Instance Generation Controller for Text-to-Image Synthesis
[CVPR 2024] [Project] [Code]

Auto Cherry-Picker: Learning from High-quality Generative Data Driven by Language
[Website] [Project] [Code]

Training-Free Layout Control with Cross-Attention Guidance
[Website] [Project] [Code]

ROICtrl: Boosting Instance Control for Visual Generation
[Website] [Project] [Code]

CreatiLayout: Siamese Multimodal Diffusion Transformer for Creative Layout-to-Image Generation
[Website] [Project] [Code]

Directed Diffusion: Direct Control of Object Placement through Attention Guidance
[Website] [Project] [Code]

Grounded Text-to-Image Synthesis with Attention Refocusing
[Website] [Project] [Code]

eDiff-I: Text-to-Image Diffusion Models with an Ensemble of Expert Denoisers
[Website] [Project] [Code]

LayoutLLM-T2I: Eliciting Layout Guidance from LLM for Text-to-Image Generation
[Website] [Project] [Code]

Compositional Text-to-Image Synthesis with Attention Map Control of Diffusion Models
[Website] [Project] [Code]

R&B: Region and Boundary Aware Zero-shot Grounded Text-to-image Generation
[Website] [Project] [Code]

FreeControl: Training-Free Spatial Control of Any Text-to-Image Diffusion Model with Any Condition
[Website] [Project] [Code]

InteractDiffusion: Interaction Control in Text-to-Image Diffusion Models
[Website] [Project] [Code]

Ranni: Taming Text-to-Image Diffusion for Accurate Instruction Following
[Website] [Project] [Code]

InstanceDiffusion: Instance-level Control for Image Generation
[Website] [Project] [Code]

Coarse-to-Fine Latent Diffusion for Pose-Guided Person Image Synthesis
[CVPR 2024] [Code]

NoiseCollage: A Layout-Aware Text-to-Image Diffusion Model Based on Noise Cropping and Merging
[CVPR 2024] [Code]

Masked-Attention Diffusion Guidance for Spatially Controlling Text-to-Image Generation
[Website] [Code]

Rethinking The Training And Evaluation of Rich-Context Layout-to-Image Generation
[Website] [Code]

Enhancing Object Coherence in Layout-to-Image Synthesis
[Website] [Code]

Training-free Regional Prompting for Diffusion Transformers
[Website] [Code]

DivCon: Divide and Conquer for Progressive Text-to-Image Generation
[Website] [Code]

RealCompo: Dynamic Equilibrium between Realism and Compositionality Improves Text-to-Image Diffusion Models
[Website] [Code]

StreamMultiDiffusion: Real-Time Interactive Generation with Region-Based Semantic Control
[Website] [Code]

HiCo: Hierarchical Controllable Diffusion Model for Layout-to-image Generation
[Website] [Code]

Layered Rendering Diffusion Model for Zero-Shot Guided Image Synthesis
[ECCV 2024] [Project]

ReCorD: Reasoning and Correcting Diffusion for HOI Generation
[ACM MM 2024] [Project]

Compositional Text-to-Image Generation with Dense Blob Representations
[Website] [Project]

GroundingBooth: Grounding Text-to-Image Customization
[Website] [Project]

Check, Locate, Rectify: A Training-Free Layout Calibration System for Text-to-Image Generation
[Website] [Project]

ReGround: Improving Textual and Spatial Grounding at No Cost
[Website] [Project]

DetDiffusion: Synergizing Generative and Perceptive Models for Enhanced Data Generation and Perception
[CVPR 2024]

Guided Image Synthesis via Initial Image Editing in Diffusion Model
[ACM MM 2023]

Training-free Composite Scene Generation for Layout-to-Image Synthesis
[ECCV 2024]

LSReGen: Large-Scale Regional Generator via Backward Guidance Framework
[Website]

Enhancing Prompt Following with Visual Control Through Training-Free Mask-Guided Diffusion
[Website]

Draw Like an Artist: Complex Scene Generation with Diffusion Model via Composition, Painting, and Retouching
[Website]

Boundary Attention Constrained Zero-Shot Layout-To-Image Generation
[Website]

Enhancing Image Layout Control with Loss-Guided Diffusion Models
[Website]

GLoD: Composing Global Contexts and Local Details in Image Generation
[Website]

A-STAR: Test-time Attention Segregation and Retention for Text-to-image Synthesis
[Website]

Controllable Text-to-Image Generation with GPT-4
[Website]

Localized Text-to-Image Generation for Free via Cross Attention Control
[Website]

Training-Free Location-Aware Text-to-Image Synthesis
[Website]

Composite Diffusion | whole >= Σ parts
[Website]

Continuous Layout Editing of Single Images with Diffusion Models
[Website]

Zero-shot spatial layout conditioning for text-to-image diffusion models
[Website]

Obtaining Favorable Layouts for Multiple Object Generation
[Website]

LoCo: Locally Constrained Training-Free Layout-to-Image Synthesis
[Website]

Self-correcting LLM-controlled Diffusion Models
[Website]

Joint Generative Modeling of Scene Graphs and Images via Diffusion Models
[Website]

Spatial-Aware Latent Initialization for Controllable Image Generation
[Website]

Layout-to-Image Generation with Localized Descriptions using ControlNet with Cross-Attention Control
[Website]

ObjBlur: A Curriculum Learning Approach With Progressive Object-Level Blurring for Improved Layout-to-Image Generation
[Website]

The Crystal Ball Hypothesis in diffusion models: Anticipating object positions from initial noise
[Website]

Zero-Painter: Training-Free Layout Control for Text-to-Image Synthesis
[Website]

SpotActor: Training-Free Layout-Controlled Consistent Image Generation
[Website]

IFAdapter: Instance Feature Control for Grounded Text-to-Image Generation
[Website]

Scribble-Guided Diffusion for Training-free Text-to-Image Generation
[Website]

3DIS: Depth-Driven Decoupled Instance Synthesis for Text-to-Image Generation
[Website]

Region-Aware Text-to-Image Generation via Hard Binding and Soft Refinement
[Website]

I2I translation

⭐⭐⭐SDEdit: Guided Image Synthesis and Editing with Stochastic Differential Equations
[ICLR 2022] [Website] [Project] [Code]

⭐⭐⭐DiffusionCLIP: Text-Guided Diffusion Models for Robust Image Manipulation
[CVPR 2022] [Website] [Code]

CycleNet: Rethinking Cycle Consistency in Text-Guided Diffusion for Image Manipulation
[NeurIPS 2023] [Website] [Project] [Code]

DEADiff: An Efficient Stylization Diffusion Model with Disentangled Representations
[CVPR 2024] [Project] [Code]

Diffusion-based Image Translation using Disentangled Style and Content Representation
[ICLR 2023] [Website] [Code]

FlexIT: Towards Flexible Semantic Image Translation
[CVPR 2022] [Website] [Code]

Zero-Shot Contrastive Loss for Text-Guided Diffusion Image Style Transfer
[ICCV 2023] [Website] [Code]

E2GAN: Efficient Training of Efficient GANs for Image-to-Image Translation
[ICML 2024] [Project] [Code]

Eye-for-an-eye: Appearance Transfer with Semantic Correspondence in Diffusion Models
[Website] [Project] [Code]

Cross-Image Attention for Zero-Shot Appearance Transfer
[Website] [Project] [Code]

FashionR2R: Texture-preserving Rendered-to-Real Image Translation with Diffusion Models
[Website] [Project] [Code]

Diffusion Guided Domain Adaptation of Image Generators
[Website] [Project] [Code]

Concept Sliders: LoRA Adaptors for Precise Control in Diffusion Models
[Website] [Project] [Code]

FreeStyle: Free Lunch for Text-guided Style Transfer using Diffusion Models
[Website] [Project] [Code]

FilterPrompt: Guiding Image Transfer in Diffusion Models
[Website] [Project] [Code]

Every Pixel Has its Moments: Ultra-High-Resolution Unpaired Image-to-Image Translation via Dense Normalization
[ECCV 2024] [Code]

One-Shot Structure-Aware Stylized Image Synthesis
[CVPR 2024] [Code]

BBDM: Image-to-image Translation with Brownian Bridge Diffusion Models
[CVPR 2023] [Code]

Spectrum Translation for Refinement of Image Generation (STIG) Based on Contrastive Learning and Spectral Filter Profile
[AAAI 2024] [Code]

Frequency-Controlled Diffusion Model for Versatile Text-Guided Image-to-Image Translation
[AAAI 2024] [Code]

ZePo: Zero-Shot Portrait Stylization with Faster Sampling
[ACM MM 2024] [Code]

DiffuseST: Unleashing the Capability of the Diffusion Model for Style Transfer
[ACM MM Asia 2024] [Code]

TextCtrl: Diffusion-based Scene Text Editing with Prior Guidance Control
[Website] [Code]

Improving Diffusion-based Image Translation using Asymmetric Gradient Guidance
[Website] [Code]

Enabling Local Editing in Diffusion Models by Joint and Individual Component Analysis
[Website] [Code]

PixWizard: Versatile Image-to-Image Visual Assistant with Open-Language Instructions
[Website] [Code]

GEM: Boost Simple Network for Glass Surface Segmentation via Segment Anything Model and Data Synthesis
[Website] [Code]

CreativeSynth: Creative Blending and Synthesis of Visual Arts based on Multimodal Diffusion
[Website] [Code]

PrimeComposer: Faster Progressively Combined Diffusion for Image Composition with Attention Steering
[Website] [Code]

One-Step Image Translation with Text-to-Image Models
[Website] [Code]

D2Styler: Advancing Arbitrary Style Transfer with Discrete Diffusion Methods
[Website] [Code]

StyleDiffusion: Controllable Disentangled Style Transfer via Diffusion Models
[ICCV 2023] [Website]

ControlStyle: Text-Driven Stylized Image Generation Using Diffusion Priors
[ACM MM 2023]

High-Fidelity Diffusion-based Image Editing
[AAAI 2024]

EBDM: Exemplar-guided Image Translation with Brownian-bridge Diffusion Models
[ECCV 2024]

Harnessing the Latent Diffusion Model for Training-Free Image Style Transfer
[Website]

UniHDA: Towards Universal Hybrid Domain Adaptation of Image Generators
[Website]

Regularized Distribution Matching Distillation for One-step Unpaired Image-to-Image Translation
[Website]

TEXTOC: Text-driven Object-Centric Style Transfer
[Website]

Seed-to-Seed: Image Translation in Diffusion Seed Space
[Website]

Diffusion-Based Image-to-Image Translation by Noise Correction via Prompt Interpolation
[Website]

Latent Schrodinger Bridge: Prompting Latent Diffusion for Fast Unpaired Image-to-Image Translation
[Website]

Segmentation Detection Tracking

ODISE: Open-Vocabulary Panoptic Segmentation with Text-to-Image Diffusion Models
[CVPR 2023 Highlight] [Project] [Code] [Demo]

LD-ZNet: A Latent Diffusion Approach for Text-Based Image Segmentation
[ICCV 2023] [Website] [Project] [Code]

Text-Image Alignment for Diffusion-Based Perception
[CVPR 2024] [Website] [Project] [Code]

Stochastic Segmentation with Conditional Categorical Diffusion Models
[ICCV 2023] [Website] [Code]

DDP: Diffusion Model for Dense Visual Prediction
[ICCV 2023] [Website] [Code]

DiffusionDet: Diffusion Model for Object Detection
[ICCV 2023] [Website] [Code]

OVTrack: Open-Vocabulary Multiple Object Tracking
[CVPR 2023] [Website] [Project]

SegRefiner: Towards Model-Agnostic Segmentation Refinement with Discrete Diffusion Process
[NeurIPS 2023] [Website] [Code]

DiffMOT: A Real-time Diffusion-based Multiple Object Tracker with Non-linear Prediction
[CVPR 2024] [Project] [Code]

Zero-Shot Image Segmentation via Recursive Normalized Cut on Diffusion Features
[Website] [Project] [Code]

Diffuse, Attend, and Segment: Unsupervised Zero-Shot Segmentation using Stable Diffusion
[Website] [Project] [Code]

InstaGen: Enhancing Object Detection by Training on Synthetic Dataset
[Website] [Project] [Code]

InvSeg: Test-Time Prompt Inversion for Semantic Segmentation
[Website] [Project] [Code]

SMITE: Segment Me In TimE
[Website] [Project] [Code]

Unsupervised Modality Adaptation with Text-to-Image Diffusion Models for Semantic Segmentation
[NeurIPS 2024] [Code]

Exploring Phrase-Level Grounding with Text-to-Image Diffusion Model
[ECCV 2024] [Code]

ConsistencyTrack: A Robust Multi-Object Tracker with a Generation Strategy of Consistency Model
[Website] [Code]

SemFlow: Binding Semantic Segmentation and Image Synthesis via Rectified Flow
[Website] [Code]

Delving into the Trajectory Long-tail Distribution for Muti-object Tracking
[Website] [Code]

Zero-Shot Video Semantic Segmentation based on Pre-Trained Diffusion Models
[Website] [Code]

Scribble Hides Class: Promoting Scribble-Based Weakly-Supervised Semantic Segmentation with Its Class Label
[Website] [Code]

Personalize Segment Anything Model with One Shot
[Website] [Code]

DiffusionTrack: Diffusion Model For Multi-Object Tracking
[Website] [Code]

MosaicFusion: Diffusion Models as Data Augmenters for Large Vocabulary Instance Segmentation
[Website] [Code]

A Simple Latent Diffusion Approach for Panoptic Segmentation and Mask Inpainting
[Website] [Code]

Beyond Generation: Harnessing Text to Image Models for Object Detection and Segmentation
[Website] [Code]

UniGS: Unified Representation for Image Generation and Segmentation
[Website] [Code]

Placing Objects in Context via Inpainting for Out-of-distribution Segmentation
[Website] [Code]

MaskDiffusion: Exploiting Pre-trained Diffusion Models for Semantic Segmentation
[Website] [Code]

Exploring Pre-trained Text-to-Video Diffusion Models for Referring Video Object Segmentation
[Website] [Code]

Open-Vocabulary Attention Maps with Token Optimization for Semantic Segmentation in Diffusion Models
[Website] [Code]

No Annotations for Object Detection in Art through Stable Diffusion
[Website] [Code]

EmerDiff: Emerging Pixel-level Semantic Knowledge in Diffusion Models
[ICLR 2024] [Website] [Project]

Training-Free Open-Vocabulary Segmentation with Offline Diffusion-Augmented Prototype Generation
[CVPR 2024] [Project]

FreeSeg-Diff: Training-Free Open-Vocabulary Segmentation with Diffusion Models
[Website] [Project]

ReferEverything: Towards Segmenting Everything We Can Speak of in Videos
[Website] [Project]

DiffuMask: Synthesizing Images with Pixel-level Annotations for Semantic Segmentation Using Diffusion Models
[Website] [Project]

Diffusion-based Image Translation with Label Guidance for Domain Adaptive Semantic Segmentation
[ICCV 2023] [Website]

SDDGR: Stable Diffusion-based Deep Generative Replay for Class Incremental Object Detection
[CVPR 2024]

Diff-Tracker: Text-to-Image Diffusion Models are Unsupervised Trackers
[ECCV 2024]

Unleashing the Potential of the Diffusion Model in Few-shot Semantic Segmentation
[NeurIPS 2024]

Generalization by Adaptation: Diffusion-Based Domain Extension for Domain-Generalized Semantic Segmentation
[WACV 2024]

Boosting Few-Shot Detection with Large Language Models and Layout-to-Image Synthesis
[ACCV 2024]

A Simple Background Augmentation Method for Object Detection with Diffusion Model
[Website]

Unveiling the Power of Diffusion Features For Personalized Segmentation and Retrieval
[Website]

SLiMe: Segment Like Me
[Website]

ASAM: Boosting Segment Anything Model with Adversarial Tuning
[Website]

Diffusion Features to Bridge Domain Gap for Semantic Segmentation
[Website]

MaskDiff: Modeling Mask Distribution with Diffusion Probabilistic Model for Few-Shot Instance Segmentation
[Website]

DiffusionSeg: Adapting Diffusion Towards Unsupervised Object Discovery
[Website]

Ref-Diff: Zero-shot Referring Image Segmentation with Generative Models
[Website]

Diffusion Model is Secretly a Training-free Open Vocabulary Semantic Segmenter
[Website]

Attention as Annotation: Generating Images and Pseudo-masks for Weakly Supervised Semantic Segmentation with Diffusion
[Website]

From Text to Mask: Localizing Entities Using the Attention of Text-to-Image Diffusion Models
[Website]

Factorized Diffusion Architectures for Unsupervised Image Generation and Segmentation
[Website]

Patch-based Selection and Refinement for Early Object Detection
[Website]

TrackDiffusion: Multi-object Tracking Data Generation via Diffusion Models
[Website]

Towards Granularity-adjusted Pixel-level Semantic Annotation
[Website]

Gen2Det: Generate to Detect
[Website]

Bridging Generative and Discriminative Models for Unified Visual Perception with Diffusion Priors
[Website]

ConsistencyDet: Robust Object Detector with Denoising Paradigm of Consistency Model
[Website]

Diverse Generation while Maintaining Semantic Coordination: A Diffusion-Based Data Augmentation Method for Object Detection
[Website]

Generative Edge Detection with Stable Diffusion
[Website]

DINTR: Tracking via Diffusion-based Interpolation
[Website]

Enhanced Kalman with Adaptive Appearance Motion SORT for Grounded Generic Multiple Object Tracking
[Website]

DiffuMask-Editor: A Novel Paradigm of Integration Between the Segmentation Diffusion Model and Image Editing to Improve Segmentation Ability
[Website]

Repurposing Stable Diffusion Attention for Training-Free Unsupervised Interactive Segmentation
[Website]

Panoptic Diffusion Models: co-generation of images and segmentation maps
[Website]

Additional conditions

⭐⭐⭐Adding Conditional Control to Text-to-Image Diffusion Models
[ICCV 2023 best paper] [Website] [Official Code] [Diffusers Doc] [Diffusers Code]

⭐⭐T2I-Adapter: Learning Adapters to Dig out More Controllable Ability for Text-to-Image Diffusion Models
[Website] [Official Code] [Diffusers Code]

SketchKnitter: Vectorized Sketch Generation with Diffusion Models
[ICLR 2023 Spotlight] [Website] [Code]

Freestyle Layout-to-Image Synthesis
[CVPR 2023 Highlight] [Website] [Project] [Code]

Collaborative Diffusion for Multi-Modal Face Generation and Editing
[CVPR 2023] [Website] [Project] [Code]

HumanSD: A Native Skeleton-Guided Diffusion Model for Human Image Generation
[ICCV 2023] [Website] [Project] [Code]

FreeDoM: Training-Free Energy-Guided Conditional Diffusion Model
[ICCV 2023] [Website] [Code]

Sketch-Guided Text-to-Image Diffusion Models
[SIGGRAPH 2023] [Project] [Code]

Adversarial Supervision Makes Layout-to-Image Diffusion Models Thrive
[ICLR 2024] [Project] [Code]

IPAdapter-Instruct: Resolving Ambiguity in Image-based Conditioning using Instruct Prompts
[Website] [Project] [Code]

ControlNeXt: Powerful and Efficient Control for Image and Video Generation
[Website] [Project] [Code]

Ctrl-X: Controlling Structure and Appearance for Text-To-Image Generation Without Guidance
[Website] [Project] [Code]

Ctrl-Adapter: An Efficient and Versatile Framework for Adapting Diverse Controls to Any Diffusion Model
[Website] [Project] [Code]

IP-Adapter: Text Compatible Image Prompt Adapter for Text-to-Image Diffusion Models
[Website] [Project] [Code]

Appearance Matching Adapter for Exemplar-based Semantic Image Synthesis
[Website] [Project] [Code]

DynamicControl: Adaptive Condition Selection for Improved Text-to-Image Generation
[Website] [Project] [Code]

A Simple Approach to Unifying Diffusion-based Conditional Generation
[Website] [Project] [Code]

HyperHuman: Hyper-Realistic Human Generation with Latent Structural Diffusion
[Website] [Project] [Code]

Late-Constraint Diffusion Guidance for Controllable Image Synthesis
[Website] [Project] [Code]

Composer: Creative and controllable image synthesis with composable conditions
[Website] [Project] [Code]

DiffBlender: Scalable and Composable Multimodal Text-to-Image Diffusion Models
[Website] [Project] [Code]

Cocktail: Mixing Multi-Modality Controls for Text-Conditional Image Generation
[Website] [Project] [Code]

UniControl: A Unified Diffusion Model for Controllable Visual Generation In the Wild
[Website] [Project] [Code]

Uni-ControlNet: All-in-One Control to Text-to-Image Diffusion Models
[Website] [Project] [Code]

LooseControl: Lifting ControlNet for Generalized Depth Conditioning
[Website] [Project] [Code]

X-Adapter: Adding Universal Compatibility of Plugins for Upgraded Diffusion Model
[Website] [Project] [Code]

ControlNet-XS: Designing an Efficient and Effective Architecture for Controlling Text-to-Image Diffusion Models
[Website] [Project] [Code]

ViscoNet: Bridging and Harmonizing Visual and Textual Conditioning for ControlNet
[Website] [Project] [Code]

SCP-Diff: Photo-Realistic Semantic Image Synthesis with Spatial-Categorical Joint Prior
[Website] [Project] [Code]

Compose and Conquer: Diffusion-Based 3D Depth Aware Composable Image Synthesis
[ICLR 2024] [Code]

It's All About Your Sketch: Democratising Sketch Control in Diffusion Models
[CVPR 2024] [Code]

VersaGen: Unleashing Versatile Visual Control for Text-to-Image Synthesis
[AAAI 2025] [Code]

CtrLoRA: An Extensible and Efficient Framework for Controllable Image Generation
[Website] [Code]

Universal Guidance for Diffusion Models
[Website] [Code]

Meta ControlNet: Enhancing Task Adaptation via Meta Learning
[Website] [Code]

Local Conditional Controlling for Text-to-Image Diffusion Models
[Website] [Code]

KnobGen: Controlling the Sophistication of Artwork in Sketch-Based Diffusion Models
[Website] [Code]

Do We Need to Design Specific Diffusion Models for Different Tasks? Try ONE-PIC
[Website] [Code]

OminiControl: Minimal and Universal Control for Diffusion Transformer
[Website] [Code]

Modulating Pretrained Diffusion Models for Multimodal Image Synthesis
[SIGGRAPH 2023] [Project]

SpaText: Spatio-Textual Representation for Controllable Image Generation
[CVPR 2023] [Project]

CCM: Adding Conditional Controls to Text-to-Image Consistency Models
[ICML 2024] [Project]

Dreamguider: Improved Training free Diffusion-based Conditional Generation
[Website] [Project]

ControlNet++: Improving Conditional Controls with Efficient Consistency Feedback
[Website] [Project]

AnyControl: Create Your Artwork with Versatile Control on Text-to-Image Generation
[Website] [Project]

BeyondScene: Higher-Resolution Human-Centric Scene Generation With Pretrained Diffusion
[Website] [Project]

FineControlNet: Fine-level Text Control for Image Generation with Spatially Aligned Text Control Injection
[Website] [Project]

Control4D: Dynamic Portrait Editing by Learning 4D GAN from 2D Diffusion-based Editor
[Website] [Project]

SCEdit: Efficient and Controllable Image Diffusion Generation via Skip Connection Editing
[Website] [Project]

CTRLorALTer: Conditional LoRAdapter for Efficient 0-Shot Control & Altering of T2I Models
[Website] [Project]

Sketch-Guided Scene Image Generation
[Website]

SSMG: Spatial-Semantic Map Guided Diffusion Model for Free-form Layout-to-Image Generation
[Website]

Conditioning Diffusion Models via Attributes and Semantic Masks for Face Generation
[Website]

Integrating Geometric Control into Text-to-Image Diffusion Models for High-Quality Detection Data Generation via Text Prompt
[Website]

Adding 3D Geometry Control to Diffusion Models
[Website]

LayoutDiffuse: Adapting Foundational Diffusion Models for Layout-to-Image Generation
[Website]

JointNet: Extending Text-to-Image Diffusion for Dense Distribution Modeling
[Website]

Do You Guys Want to Dance: Zero-Shot Compositional Human Dance Generation with Multiple Persons
[Website]

Mask-ControlNet: Higher-Quality Image Generation with An Additional Mask Prompt
[Website]

FlexEControl: Flexible and Efficient Multimodal Control for Text-to-Image Generation
[Website]

Stable-Pose: Leveraging Transformers for Pose-Guided Text-to-Image Generation
[Website]

Label-free Neural Semantic Image Synthesis
[Website]

Few-Shot

Discriminative Diffusion Models as Few-shot Vision and Language Learners
[Website] [Code]

Few-Shot Diffusion Models
[Website] [Code]

Few-shot Semantic Image Synthesis with Class Affinity Transfer
[CVPR 2023] [Website]

DiffAlign: Few-shot learning using diffusion based synthesis and alignment
[Website]

Few-shot Image Generation with Diffusion Models
[Website]

Lafite2: Few-shot Text-to-Image Generation
[Website]

Few-Shot Task Learning through Inverse Generative Modeling
[Website]

SD-inpaint

Paint by Example: Exemplar-based Image Editing with Diffusion Models
[CVPR 2023] [Website] [Code] [Diffusers Doc] [Diffusers Code]

GLIDE: Towards Photorealistic Image Generation and Editing with Text-Guided Diffusion Models
[ICML 2022 Spotlight] [Website] [Code]

Blended Diffusion for Text-driven Editing of Natural Images
[CVPR 2022] [Website] [Project] [Code]

Blended Latent Diffusion
[SIGGRAPH 2023] [Project] [Code]

TF-ICON: Diffusion-Based Training-Free Cross-Domain Image Composition
[ICCV 2023] [Website] [Project] [Code]

Imagen Editor and EditBench: Advancing and Evaluating Text-Guided Image Inpainting
[CVPR 2023] [Website] [Code]

Towards Coherent Image Inpainting Using Denoising Diffusion Implicit Models
[ICML 2023] [Website] [Code]

Coherent and Multi-modality Image Inpainting via Latent Space Optimization
[Website] [Project] [Code]

Inst-Inpaint: Instructing to Remove Objects with Diffusion Models
[Website] [Project] [Code] [Demo]

Anywhere: A Multi-Agent Framework for Reliable and Diverse Foreground-Conditioned Image Inpainting
[Website] [Project] [Code]

CLIPAway: Harmonizing Focused Embeddings for Removing Objects via Diffusion Models
[Website] [Project] [Code]

AnyDoor: Zero-shot Object-level Image Customization
[Website] [Project] [Code]

Affordance-Aware Object Insertion via Mask-Aware Dual Diffusion
[Website] [Project] [Code]

A Task is Worth One Word: Learning with Task Prompts for High-Quality Versatile Image Inpainting
[Website] [Project] [Code]

Follow-Your-Canvas: Higher-Resolution Video Outpainting with Extensive Content Generation
[Website] [Project] [Code]

Towards Language-Driven Video Inpainting via Multimodal Large Language Models
[Website] [Project] [Code]

Reflecting Reality: Enabling Diffusion Models to Produce Faithful Mirror Reflections
[Website] [Project] [Code]

Improving Text-guided Object Inpainting with Semantic Pre-inpainting
[ECCV 2024] [Code]

FreeCompose: Generic Zero-Shot Image Composition with Diffusion Prior
[ECCV 2024] [Code]

360-Degree Panorama Generation from Few Unregistered NFoV Images
[ACM MM 2023] [Code]

Delving Globally into Texture and Structure for Image Inpainting
[ACM MM 2022] [Code]

ControlEdit: A MultiModal Local Clothing Image Editing Method
[Website] [Code]

CA-Edit: Causality-Aware Condition Adapter for High-Fidelity Local Facial Attribute Editing
[Website] [Code]

DreamMix: Decoupling Object Attributes for Enhanced Editability in Customized Image Inpainting
[Website] [Code]

Attentive Eraser: Unleashing Diffusion Model's Object Removal Potential via Self-Attention Redirection Guidance
[Website] [Code]

Training-and-prompt-free General Painterly Harmonization Using Image-wise Attention Sharing
[Website] [Code]

What to Preserve and What to Transfer: Faithful, Identity-Preserving Diffusion-based Hairstyle Transfer
[Website] [Code]

Diffree: Text-Guided Shape Free Object Inpainting with Diffusion Model
[Website] [Code]

Structure Matters: Tackling the Semantic Discrepancy in Diffusion Models for Image Inpainting
[Website] [Code]

Reference-based Image Composition with Sketch via Structure-aware Diffusion Model
[Website] [Code]

Image Inpainting via Iteratively Decoupled Probabilistic Modeling
[Website] [Code]

ControlCom: Controllable Image Composition using Diffusion Model
[Website] [Code]

Uni-paint: A Unified Framework for Multimodal Image Inpainting with Pretrained Diffusion Model
[Website] [Code]

MagicRemover: Tuning-free Text-guided Image Inpainting with Diffusion Models
[Website] [Code]

HD-Painter: High-Resolution and Prompt-Faithful Text-Guided Image Inpainting with Diffusion Models
[Website] [Code]

BrushNet: A Plug-and-Play Image Inpainting Model with Decomposed Dual-Branch Diffusion
[Website] [Code]

Sketch-guided Image Inpainting with Partial Discrete Diffusion Process
[Website] [Code]

ReMOVE: A Reference-free Metric for Object Erasure
[Website] [Code]

Modification Takes Courage: Seamless Image Stitching via Reference-Driven Inpainting
[Website] [Code]

MotionCom: Automatic and Motion-Aware Image Composition with LLM and Video Diffusion Prior
[Website] [Code]

AddMe: Zero-shot Group-photo Synthesis by Inserting People into Scenes
[ECCV 2024] [Project]

Text2Place: Affordance-aware Text Guided Human Placement
[ECCV 2024] [Project]

IMPRINT: Generative Object Compositing by Learning Identity-Preserving Representation
[CVPR 2024] [Project]

Matting by Generation
[SIGGRAPH 2024] [Project]

PrefPaint: Aligning Image Inpainting Diffusion Model with Human Preference
[NeurIPS 2024] [Project]

Taming Latent Diffusion Model for Neural Radiance Field Inpainting
[Website] [Project]

SmartMask: Context Aware High-Fidelity Mask Generation for Fine-grained Object Insertion and Layout Control
[Website] [Project]

Towards Stable and Faithful Inpainting
[Website] [Project]

Magic Fixup: Streamlining Photo Editing by Watching Dynamic Videos
[Website] [Project]

ObjectDrop: Bootstrapping Counterfactuals for Photorealistic Object Removal and Insertion
[Website] [Project]

TALE: Training-free Cross-domain Image Composition via Adaptive Latent Manipulation and Energy-guided Optimization
[ACM MM 2024]

Semantically Consistent Video Inpainting with Conditional Diffusion Models
[Website]

Personalized Face Inpainting with Diffusion Models by Parallel Visual Attention
[Website]

Outline-Guided Object Inpainting with Diffusion Models
[Website]

SmartBrush: Text and Shape Guided Object Inpainting with Diffusion Model
[Website]

Gradpaint: Gradient-Guided Inpainting with Diffusion Models
[Website]

Infusion: Internal Diffusion for Video Inpainting
[Website]

Rethinking Referring Object Removal
[Website]

Tuning-Free Image Customization with Image and Text Guidance
[Website]

VIP: Versatile Image Outpainting Empowered by Multimodal Large Language Model
[Website]

FaithFill: Faithful Inpainting for Object Completion Using a Single Reference Image
[Website]

InsertDiffusion: Identity Preserving Visualization of Objects through a Training-Free Diffusion Architecture
[Website]

Thinking Outside the BBox: Unconstrained Generative Object Compositing
[Website]

Content-aware Tile Generation using Exterior Boundary Inpainting
[Website]

AnyLogo: Symbiotic Subject-Driven Diffusion System with Gemini Status
[Website]

TD-Paint: Faster Diffusion Inpainting Through Time Aware Pixel Conditioning
[Website]

MagicEraser: Erasing Any Objects via Semantics-Aware Control
[Website]

I Dream My Painting: Connecting MLLMs and Diffusion Models via Prompt Generation for Text-Guided Multi-Mask Inpainting
[Website]

VIPaint: Image Inpainting with Pre-Trained Diffusion Models via Variational Inference
[Website]

FreeCond: Free Lunch in the Input Conditions of Text-Guided Inpainting
[Website]

PainterNet: Adaptive Image Inpainting with Actual-Token Attention and Diverse Mask Control
[Website]

Refine-by-Align: Reference-Guided Artifacts Refinement through Semantic Alignment
[Website]

Advanced Video Inpainting Using Optical Flow-Guided Efficient Diffusion
[Website]

Pinco: Position-induced Consistent Adapter for Diffusion Transformer in Foreground-conditioned Inpainting
[Website]

AsyncDSB: Schedule-Asynchronous Diffusion Schrödinger Bridge for Image Inpainting
[Website]

RAD: Region-Aware Diffusion Models for Image Inpainting
[Website]

Layout Generation

LayoutDM: Discrete Diffusion Model for Controllable Layout Generation
[CVPR 2023] [Website] [Project] [Code]

Desigen: A Pipeline for Controllable Design Template Generation
[CVPR 2024] [Project] [Code]

DLT: Conditioned layout generation with Joint Discrete-Continuous Diffusion Layout Transformer
[ICCV 2023] [Website] [Code]

LayoutDiffusion: Improving Graphic Layout Generation by Discrete Diffusion Probabilistic Models
[ICCV 2023] [Website] [Code]

DogLayout: Denoising Diffusion GAN for Discrete and Continuous Layout Generation
[Website] [Code]

LayoutDM: Transformer-based Diffusion Model for Layout Generation
[CVPR 2023] [Website]

Unifying Layout Generation with a Decoupled Diffusion Model
[CVPR 2023] [Website]

PLay: Parametrically Conditioned Layout Generation using Latent Diffusion
[ICML 2023] [Website]

Towards Aligned Layout Generation via Diffusion Model with Aesthetic Constraints
[ICLR 2024]

SLayR: Scene Layout Generation with Rectified Flow
[Website]

CGB-DM: Content and Graphic Balance Layout Generation with Transformer-based Diffusion Model
[Website]

Diffusion-based Document Layout Generation
[Website]

Dolfin: Diffusion Layout Transformers without Autoencoder
[Website]

LayoutFlow: Flow Matching for Layout Generation
[Website]

Layout-Corrector: Alleviating Layout Sticking Phenomenon in Discrete Diffusion Model
[Website]

Text Generation

⭐⭐TextDiffuser: Diffusion Models as Text Painters
[NeurIPS 2023] [Website] [Project] [Code]

⭐⭐TextDiffuser-2: Unleashing the Power of Language Models for Text Rendering
[ECCV 2024 Oral] [Project] [Code]

GlyphControl: Glyph Conditional Control for Visual Text Generation
[NeurIPS 2023] [Website] [Code]

DiffUTE: Universal Text Editing Diffusion Model
[NeurIPS 2023] [Website] [Code]

Word-As-Image for Semantic Typography
[SIGGRAPH 2023] [Project] [Code]

Kinetic Typography Diffusion Model
[ECCV 2024] [Project] [Code]

Dynamic Typography: Bringing Text to Life via Video Diffusion Prior
[Website] [Project] [Code]

JoyType: A Robust Design for Multilingual Visual Text Creation
[Website] [Project] [Code]

UDiffText: A Unified Framework for High-quality Text Synthesis in Arbitrary Images via Character-aware Diffusion Models
[Website] [Project] [Code]

One-Shot Diffusion Mimicker for Handwritten Text Generation
[ECCV 2024] [Code]

DCDM: Diffusion-Conditioned-Diffusion Model for Scene Text Image Super-Resolution
[ECCV 2024] [Code]

HFH-Font: Few-shot Chinese Font Synthesis with Higher Quality, Faster Speed, and Higher Resolution
[SIGGRAPH Asia 2024] [Code]

Brush Your Text: Synthesize Any Scene Text on Images via Diffusion Model
[AAAI 2024] [Code]

FontDiffuser: One-Shot Font Generation via Denoising Diffusion with Multi-Scale Content Aggregation and Style Contrastive Learning
[AAAI 2024] [Code]

Text Image Inpainting via Global Structure-Guided Diffusion Models
[AAAI 2024] [Code]

Ambigram generation by a diffusion model
[ICDAR 2023] [Code]

Scene Text Image Super-resolution based on Text-conditional Diffusion Models
[WACV 2024] [Code]

Leveraging Text Localization for Scene Text Removal via Text-aware Masked Image Modeling
[ECCV 2024] [Code]

First Creating Backgrounds Then Rendering Texts: A New Paradigm for Visual Text Blending
[ECAI 2024] [Code]

VitaGlyph: Vitalizing Artistic Typography with Flexible Dual-branch Diffusion Models
[Website] [Code]

Visual Text Generation in the Wild
[Website] [Code]

Deciphering Oracle Bone Language with Diffusion Models
[Website] [Code]

High Fidelity Scene Text Synthesis
[Website] [Code]

TextSSR: Diffusion-based Data Synthesis for Scene Text Recognition
[Website] [Code]

AnyText: Multilingual Visual Text Generation And Editing
[Website] [Code]

AnyText2: Visual Text Generation and Editing With Customizable Attributes
[Website] [Code]

Few-shot Calligraphy Style Learning
[Website] [Code]

GlyphDraw2: Automatic Generation of Complex Glyph Posters with Diffusion Models and Large Language Models
[Website] [Code]

DiffusionPen: Towards Controlling the Style of Handwritten Text Generation
[Website] [Code]

AmbiGen: Generating Ambigrams from Pre-trained Diffusion Model
[Website] [Project]

UniVG: Towards UNIfied-modal Video Generation
[Website] [Project]

FontStudio: Shape-Adaptive Diffusion Model for Coherent and Consistent Font Effect Generation
[Website] [Project]

DECDM: Document Enhancement using Cycle-Consistent Diffusion Models
[WACV 2024]

SceneTextGen: Layout-Agnostic Scene Text Image Synthesis with Diffusion Models
[Website]

AnyTrans: Translate AnyText in the Image with Large Scale Models
[Website]

ARTIST: Improving the Generation of Text-rich Images by Disentanglement
[Website]

Improving Text Generation on Images with Synthetic Captions
[Website]

CustomText: Customized Textual Image Generation using Diffusion Models
[Website]

VecFusion: Vector Font Generation with Diffusion
[Website]

Typographic Text Generation with Off-the-Shelf Diffusion Model
[Website]

Font Style Interpolation with Diffusion Models
[Website]

Refining Text-to-Image Generation: Towards Accurate Training-Free Glyph-Enhanced Image Generation
[Website]

DiffCJK: Conditional Diffusion Model for High-Quality and Wide-coverage CJK Character Generation
[Website]

CLII: Visual-Text Inpainting via Cross-Modal Predictive Interaction
[Website]

Zero-Shot Paragraph-level Handwriting Imitation with Latent Diffusion Models
[Website]

Text Image Generation for Low-Resource Languages with Dual Translation Learning
[Website]

Decoupling Layout from Glyph in Online Chinese Handwriting Generation
[Website]

Empowering Backbone Models for Visual Text Generation with Input Granularity Control and Glyph-Aware Training
[Website]

TextMaster: Universal Controllable Text Edit
[Website]

Towards Visual Text Design Transfer Across Languages
[Website]

DiffSTR: Controlled Diffusion Models for Scene Text Removal
[Website]

TextDestroyer: A Training- and Annotation-Free Diffusion Method for Destroying Anomal Text from Images
[Website]

TypeScore: A Text Fidelity Metric for Text-to-Image Generative Models
[Website]

Conditional Text-to-Image Generation with Reference Guidance
[Website]

Type-R: Automatically Retouching Typos for Text-to-Image Generation
[Website]

AMO Sampler: Enhancing Text Rendering with Overshooting
[Website]

FonTS: Text Rendering with Typography and Style Controls
[Website]

CharGen: High Accurate Character-Level Visual Text Generation Model with MultiModal Encoder
[Website]

Super Resolution

ResShift: Efficient Diffusion Model for Image Super-resolution by Residual Shifting
[NeurIPS 2023 Spotlight] [Website] [Project] [Code]

Image Super-Resolution via Iterative Refinement
[TPAMI] [Website] [Project] [Code]

DiffIR: Efficient Diffusion Model for Image Restoration
[ICCV 2023] [Website] [Code]

Kalman-Inspired Feature Propagation for Video Face Super-Resolution
[ECCV 2024] [Project] [Code]

HoliSDiP: Image Super-Resolution via Holistic Semantics and Diffusion Prior
[Website] [Project] [Code]

MatchDiffusion: Training-free Generation of Match-cuts
[Website] [Project] [Code]

Spatiotemporal Skip Guidance for Enhanced Video Diffusion Sampling
[Website] [Project] [Code]

AddSR: Accelerating Diffusion-based Blind Super-Resolution with Adversarial Diffusion Distillation
[Website] [Project] [Code]

FaithDiff: Unleashing Diffusion Priors for Faithful Image Super-resolution
[Website] [Project] [Code]

Exploiting Diffusion Prior for Real-World Image Super-Resolution
[Website] [Project] [Code]

SinSR: Diffusion-Based Image Super-Resolution in a Single Step
[CVPR 2024] [Code]

CDFormer: When Degradation Prediction Embraces Diffusion Model for Blind Image Super-Resolution
[CVPR 2024] [Code]

Taming Diffusion Prior for Image Super-Resolution with Domain Shift SDEs
[NeurIPS 2024] [Code]

SeeClear: Semantic Distillation Enhances Pixel Condensation for Video Super-Resolution
[NeurIPS 2024] [Code]

Iterative Token Evaluation and Refinement for Real-World Super-Resolution
[AAAI 2024] [Code]

PassionSR: Post-Training Quantization with Adaptive Scale in One-Step Diffusion based Image Super-Resolution
[Website] [Code]

Distillation-Free One-Step Diffusion for Real-World Image Super-Resolution
[Website] [Code]

Degradation-Guided One-Step Image Super-Resolution with Diffusion Priors
[Website] [Code]

One Step Diffusion-based Super-Resolution with Time-Aware Distillation
[Website] [Code]

Diffusion Prior Interpolation for Flexibility Real-World Face Super-Resolution
[Website] [Code]

Hero-SR: One-Step Diffusion for Super-Resolution with Human Perception Priors
[Website] [Code]

RAP-SR: RestorAtion Prior Enhancement in Diffusion Models for Realistic Image Super-Resolution
[Website] [Code]

One-Step Effective Diffusion Network for Real-World Image Super-Resolution
[Website] [Code]

Binarized Diffusion Model for Image Super-Resolution
[Website] [Code]

Does Diffusion Beat GAN in Image Super Resolution?
[Website] [Code]

PatchScaler: An Efficient Patch-independent Diffusion Model for Super-Resolution
[Website] [Code]

DeeDSR: Towards Real-World Image Super-Resolution via Degradation-Aware Stable Diffusion
[Website] [Code]

Image Super-resolution Via Latent Diffusion: A Sampling-space Mixture Of Experts And Frequency-augmented Decoder Approach
[Website] [Code]

OFTSR: One-Step Flow for Image Super-Resolution with Tunable Fidelity-Realism Trade-offs
[Website] [Code]

Arbitrary-steps Image Super-resolution via Diffusion Inversion
[Website] [Code]

Pixel-Aware Stable Diffusion for Realistic Image Super-resolution and Personalized Stylization
[Website] [Code]

DSR-Diff: Depth Map Super-Resolution with Diffusion Model
[Website] [Code]

Pixel-level and Semantic-level Adjustable Super-resolution: A Dual-LoRA Approach
[Website] [Code]

RFSR: Improving ISR Diffusion Models via Reward Feedback Learning
[Website] [Code]

SAM-DiffSR: Structure-Modulated Diffusion Model for Image Super-Resolution
[Website] [Code]

XPSR: Cross-modal Priors for Diffusion-based Image Super-Resolution
[Website] [Code]

Self-Adaptive Reality-Guided Diffusion for Artifact-Free Super-Resolution
[Website] [Code]

BlindDiff: Empowering Degradation Modelling in Diffusion Models for Blind Image Super-Resolution
[Website] [Code]

TASR: Timestep-Aware Diffusion Model for Image Super-Resolution
[Website] [Code]

HSR-Diff: Hyperspectral Image Super-Resolution via Conditional Diffusion Models
[ICCV 2023] [Website]

Text-guided Explorable Image Super-resolution
[CVPR 2024]

Arbitrary-Scale Image Generation and Upsampling using Latent Diffusion Model and Implicit Neural Decoder
[CVPR 2024]

AdaDiffSR: Adaptive Region-aware Dynamic Acceleration Diffusion Model for Real-World Image Super-Resolution
[CVPR 2024]

Enhancing Hyperspectral Images via Diffusion Model and Group-Autoencoder Super-resolution Network
[AAAI 2024]

Detail-Enhancing Framework for Reference-Based Image Super-Resolution
[Website]

You Only Need One Step: Fast Super-Resolution with Stable Diffusion via Scale Distillation
[Website]

Solving Diffusion ODEs with Optimal Boundary Conditions for Better Image Super-Resolution
[Website]

Dissecting Arbitrary-scale Super-resolution Capability from Pre-trained Diffusion Generative Models
[Website]

Edge-SD-SR: Low Latency and Parameter Efficient On-device Super-Resolution with Stable Diffusion via Bidirectional Conditioning
[Website]

YODA: You Only Diffuse Areas. An Area-Masked Diffusion Approach For Image Super-Resolution
[Website]

Domain Transfer in Latent Space (DTLS) Wins on Image Super-Resolution -- a Non-Denoising Model
[Website]

TDDSR: Single-Step Diffusion with Two Discriminators for Super Resolution
[Website]

ConsisSR: Delving Deep into Consistency in Diffusion-based Image Super-Resolution
[Website]

Image Super-Resolution with Text Prompt Diffusion
[Website]

DifAugGAN: A Practical Diffusion-style Data Augmentation for GAN-based Single Image Super-resolution
[Website]

DREAM: Diffusion Rectification and Estimation-Adaptive Models
[Website]

Inflation with Diffusion: Efficient Temporal Adaptation for Text-to-Video Super-Resolution
[Website]

Adaptive Multi-modal Fusion of Spatially Variant Kernel Refinement with Diffusion Model for Blind Image Super-Resolution
[Website]

CasSR: Activating Image Power for Real-World Image Super-Resolution
[Website]

Learning Spatial Adaptation and Temporal Coherence in Diffusion Models for Video Super-Resolution
[Website]

Frequency-Domain Refinement with Multiscale Diffusion for Super Resolution
[Website]

ClearSR: Latent Low-Resolution Image Embeddings Help Diffusion-Based Real-World Super Resolution Models See Clearer
[Website]

Zoomed In, Diffused Out: Towards Local Degradation-Aware Multi-Diffusion for Extreme Image Super-Resolution
[Website]

Adversarial Diffusion Compression for Real-World Image Super-Resolution
[Website]

HF-Diff: High-Frequency Perceptual Loss and Distribution Matching for One-Step Diffusion-Based Image Super-Resolution
[Website]

Semantic Segmentation Prior for Diffusion-Based Real-World Super-Resolution
[Website]

RealOSR: Latent Unfolding Boosting Diffusion-based Real-world Omnidirectional Image Super-Resolution
[Website]

CLIP-SR: Collaborative Linguistic and Image Processing for Super-Resolution
[Website]

Video Generation

Text2Video-Zero: Text-to-Image Diffusion Models are Zero-Shot Video Generators
[ICCV 2023 Oral] [Website] [Project] [Code]

SinFusion: Training Diffusion Models on a Single Image or Video
[ICML 2023] [Website] [Project] [Code]

Align your Latents: High-Resolution Video Synthesis with Latent Diffusion Models
[CVPR 2023] [Website] [Project] [Code]

ZIGMA: A DiT-style Zigzag Mamba Diffusion Model
[ECCV 2024] [Project] [Code]

MCVD: Masked Conditional Video Diffusion for Prediction, Generation, and Interpolation
[NeurIPS 2022] [Website] [Project] [Code]

GLOBER: Coherent Non-autoregressive Video Generation via GLOBal Guided Video DecodER
[NeurIPS 2023] [Website] [Code]

Free-Bloom: Zero-Shot Text-to-Video Generator with LLM Director and LDM Animator
[NeurIPS 2023] [Website] [Code]

Conditional Image-to-Video Generation with Latent Flow Diffusion Models
[CVPR 2023] [Website] [Code]

FRESCO: Spatial-Temporal Correspondence for Zero-Shot Video Translation
[CVPR 2024] [Project] [Code]

TI2V-Zero: Zero-Shot Image Conditioning for Text-to-Video Diffusion Models
[CVPR 2024] [Project] [Code]

Video Diffusion Models
[ICLR 2022 workshop] [Website] [Project] [Code]

PIA: Your Personalized Image Animator via Plug-and-Play Modules in Text-to-Image Models
[Website] [Diffusers Doc] [Project] [Code]

IDOL: Unified Dual-Modal Latent Diffusion for Human-Centric Joint Video-Depth Generation
[ECCV 2024] [Project] [Code]

EMO: Emote Portrait Alive - Generating Expressive Portrait Videos with Audio2Video Diffusion Model under Weak Conditions
[ECCV 2024] [Project] [Code]

T2V-Turbo-v2: Enhancing Video Generation Model Post-Training through Data, Reward, and Conditional Guidance Design
[Website] [Project] [Code]

Tora: Trajectory-oriented Diffusion Transformer for Video Generation
[Website] [Project] [Code]

MIMO: Controllable Character Video Synthesis with Spatial Decomposed Modeling
[Website] [Project] [Code]

MovieDreamer: Hierarchical Generation for Coherent Long Visual Sequence
[Website] [Project] [Code]

SG-I2V: Self-Guided Trajectory Control in Image-to-Video Generation
[Website] [Project] [Code]

Video Diffusion Alignment via Reward Gradients
[Website] [Project] [Code]

Live2Diff: Live Stream Translation via Uni-directional Attention in Video Diffusion Models
[Website] [Project] [Code]

Cinemo: Consistent and Controllable Image Animation with Motion Diffusion Models
[Website] [Project] [Code]

TVG: A Training-free Transition Video Generation Method with Diffusion Models
[Website] [Project] [Code]

VideoRepair: Improving Text-to-Video Generation via Misalignment Evaluation and Localized Refinement
[Website] [Project] [Code]

CamI2V: Camera-Controlled Image-to-Video Diffusion Model
[Website] [Project] [Code]

Identity-Preserving Text-to-Video Generation by Frequency Decomposition
[Website] [Project] [Code]

Enhancing Motion in Text-to-Video Generation with Decomposed Encoding and Conditioning
[Website] [Project] [Code]

MIMAFace: Face Animation via Motion-Identity Modulated Appearance Feature Learning
[Website] [Project] [Code]

MotionClone: Training-Free Motion Cloning for Controllable Video Generation
[Website] [Project] [Code]

StableAnimator: High-Quality Identity-Preserving Human Image Animation
[Website] [Project] [Code]

AnimateAnything: Consistent and Controllable Animation for Video Generation
[Website] [Project] [Code]

GameGen-X: Interactive Open-world Game Video Generation
[Website] [Project] [Code]

AniDoc: Animation Creation Made Easier
[Website] [Project] [Code]

VEnhancer: Generative Space-Time Enhancement for Video Generation
[Website] [Project] [Code]

SF-V: Single Forward Video Generation Model
[Website] [Project] [Code]

Video Motion Transfer with Diffusion Transformers
[Website] [Project] [Code]

SynCamMaster: Synchronizing Multi-Camera Video Generation from Diverse Viewpoints
[Website] [Project] [Code]

Pyramidal Flow Matching for Efficient Video Generative Modeling
[Website] [Project] [Code]

AnchorCrafter: Animate CyberAnchors Saling Your Products via Human-Object Interacting Video Generation
[Website] [Project] [Code]

Trajectory Attention for Fine-grained Video Motion Control
[Website] [Project] [Code]

GenMAC: Compositional Text-to-Video Generation with Multi-Agent Collaboration
[Website] [Project] [Code]

DiTCtrl: Exploring Attention Control in Multi-Modal Diffusion Transformer for Tuning-Free Multi-Prompt Longer Video Generation
[Website] [Project] [Code]

CoNo: Consistency Noise Injection for Tuning-free Long Video Diffusion
[Website] [Project] [Code]

Animate Anyone: Consistent and Controllable Image-to-Video Synthesis for Character Animation
[Website] [Project] [Code]

MotionShop: Zero-Shot Motion Transfer in Video Diffusion Models with Mixture of Score Guidance
[Website] [Project] [Code]

VideoTetris: Towards Compositional Text-to-Video Generation
[Website] [Project] [Code]

T2V-Turbo: Breaking the Quality Bottleneck of Video Consistency Model with Mixed Reward Feedback
[Website] [Project] [Code]

ZeroSmooth: Training-free Diffuser Adaptation for High Frame Rate Video Generation
[Website] [Project] [Code]

MotionBooth: Motion-Aware Customized Text-to-Video Generation
[Website] [Project] [Code]

Long Video Diffusion Generation with Segmented Cross-Attention and Content-Rich Video Data Curation
[Website] [Project] [Code]

MOFA-Video: Controllable Image Animation via Generative Motion Field Adaptions in Frozen Image-to-Video Diffusion Model
[Website] [Project] [Code]

MotionDreamer: Zero-Shot 3D Mesh Animation from Video Diffusion Models
[Website] [Project] [Code]

MotionCraft: Physics-based Zero-Shot Video Generation
[Website] [Project] [Code]

MotionMaster: Training-free Camera Motion Transfer For Video Generation
[Website] [Project] [Code]

Stable Video Diffusion: Scaling Latent Video Diffusion Models to Large Datasets
[Website] [Project] [Code]

Motion Inversion for Video Customization
[Website] [Project] [Code]

MagicAvatar: Multimodal Avatar Generation and Animation
[Website] [Project] [Code]

Progressive Autoregressive Video Diffusion Models
[Website] [Project] [Code]

TrailBlazer: Trajectory Control for Diffusion-Based Video Generation
[Website] [Project] [Code]

Follow Your Pose: Pose-Guided Text-to-Video Generation using Pose-Free Videos
[Website] [Project] [Code]

Align3R: Aligned Monocular Depth Estimation for Dynamic Videos
[Website] [Project] [Code]

Breathing Life Into Sketches Using Text-to-Video Priors
[Website] [Project] [Code]

Latent Video Diffusion Models for High-Fidelity Long Video Generation
[Website] [Project] [Code]

Make-Your-Video: Customized Video Generation Using Textual and Structural Guidance
[Website] [Project] [Code]

Gen-L-Video: Multi-Text to Long Video Generation via Temporal Co-Denoising
[Website] [Project] [Code]

Control-A-Video: Controllable Text-to-Video Generation with Diffusion Models
[Website] [Project] [Code]

VideoComposer: Compositional Video Synthesis with Motion Controllability
[Website] [Project] [Code]

DreamPose: Fashion Image-to-Video Synthesis via Stable Diffusion
[Website] [Project] [Code]

LAVIE: High-Quality Video Generation with Cascaded Latent Diffusion Models
[Website] [Project] [Code]

Show-1: Marrying Pixel and Latent Diffusion Models for Text-to-Video Generation
[Website] [Project] [Code]

LAMP: Learn A Motion Pattern for Few-Shot-Based Video Generation
[Website] [Project] [Code]

MagicDance: Realistic Human Dance Video Generation with Motions & Facial Expressions Transfer
[Website] [Project] [Code]

LLM-grounded Video Diffusion Models
[Website] [Project] [Code]

FreeNoise: Tuning-Free Longer Video Diffusion Via Noise Rescheduling
[Website] [Project] [Code]

VideoCrafter1: Open Diffusion Models for High-Quality Video Generation
[Website] [Project] [Code]

VideoCrafter2: Overcoming Data Limitations for High-Quality Video Diffusion Models
[Website] [Project] [Code]

VideoDreamer: Customized Multi-Subject Text-to-Video Generation with Disen-Mix Finetuning
[Website] [Project] [Code]

I2VGen-XL: High-Quality Image-to-Video Synthesis via Cascaded Diffusion Models
[Website] [Project] [Code]

FusionFrames: Efficient Architectural Aspects for Text-to-Video Generation Pipeline
[Website] [Project] [Code]

MotionCtrl: A Unified and Flexible Motion Controller for Video Generation
[Website] [Project] [Code]

ART⋅V: Auto-Regressive Text-to-Video Generation with Diffusion Models
[Website] [Project] [Code]

FlowZero: Zero-Shot Text-to-Video Synthesis with LLM-Driven Dynamic Scene Syntax
[Website] [Project] [Code]

VideoBooth: Diffusion-based Video Generation with Image Prompts
[Website] [Project] [Code]

MagicAnimate: Temporally Consistent Human Image Animation using Diffusion Model
[Website] [Project] [Code]

LivePhoto: Real Image Animation with Text-guided Motion Control
[Website] [Project] [Code]

AnimateZero: Video Diffusion Models are Zero-Shot Image Animators
[Website] [Project] [Code]

DreamVideo: Composing Your Dream Videos with Customized Subject and Motion
[Website] [Project] [Code]

Hierarchical Spatio-temporal Decoupling for Text-to-Video Generation
[Website] [Project] [Code]

DreaMoving: A Human Dance Video Generation Framework based on Diffusion Models
[Website] [Project] [Code]

Upscale-A-Video: Temporal-Consistent Diffusion Model for Real-World Video Super-Resolution
[Website] [Project] [Code]

FreeInit: Bridging Initialization Gap in Video Diffusion Models
[Website] [Project] [Code]

Text2AC-Zero: Consistent Synthesis of Animated Characters using 2D Diffusion
[Website] [Project] [Code]

StyleCrafter: Enhancing Stylized Text-to-Video Generation with Style Adapter
[Website] [Project] [Code]

A Recipe for Scaling up Text-to-Video Generation with Text-free Videos
[Website] [Project] [Code]

FlowVid: Taming Imperfect Optical Flows for Consistent Video-to-Video Synthesis
[Website] [Project] [Code]

Moonshot: Towards Controllable Video Generation and Editing with Multimodal Conditions
[Website] [Project] [Code]

Latte: Latent Diffusion Transformer for Video Generation
[Website] [Project] [Code]

WorldDreamer: Towards General World Models for Video Generation via Predicting Masked Tokens
[Website] [Project] [Code]

SparseCtrl: Adding Sparse Controls to Text-to-Video Diffusion Models
[Website] [Project] [Code]

Towards A Better Metric for Text-to-Video Generation
[Website] [Project] [Code]

Generative Inbetweening through Frame-wise Conditions-Driven Video Generation
[Website] [Project] [Code]

HelloMeme: Integrating Spatial Knitting Attentions to Embed High-Level and Fidelity-Rich Conditions in Diffusion Models
[Website] [Project] [Code]

AnimateLCM: Accelerating the Animation of Personalized Diffusion Models and Adapters with Decoupled Consistency Learning
[Website] [Project] [Code]

Be-Your-Outpainter: Mastering Video Outpainting through Input-Specific Adaptation
[Website] [Project] [Code]

UniCtrl: Improving the Spatiotemporal Consistency of Text-to-Video Diffusion Models via Training-Free Unified Attention Control
[Website] [Project] [Code]

VideoElevator: Elevating Video Generation Quality with Versatile Text-to-Image Diffusion Models
[Website] [Project] [Code]

ID-Animator: Zero-Shot Identity-Preserving Human Video Generation
[Website] [Project] [Code]

Optical-Flow Guided Prompt Optimization for Coherent Video Generation
[Website] [Project] [Code]

Large Motion Video Autoencoding with Cross-modal Video VAE
[Website] [Project] [Code]

FlexiFilm: Long Video Generation with Flexible Conditions
[Website] [Project] [Code]

FIFO-Diffusion: Generating Infinite Videos from Text without Training
[Website] [Project] [Code]

TALC: Time-Aligned Captions for Multi-Scene Text-to-Video Generation
[Website] [Project] [Code]

CV-VAE: A Compatible Video VAE for Latent Generative Video Models
[Website] [Project] [Code]

MVOC: a training-free multiple video object composition method with diffusion models
[Website] [Project] [Code]

Identifying and Solving Conditional Image Leakage in Image-to-Video Diffusion Model
[Website] [Project] [Code]

VideoDPO: Omni-Preference Alignment for Video Diffusion Generation
[Website] [Project] [Code]

Generative Inbetweening: Adapting Image-to-Video Models for Keyframe Interpolation
[Website] [Project] [Code]

Efficient Long Video Tokenization via Coordinate-based Patch Reconstruction
[Website] [Project] [Code]

AMG: Avatar Motion Guided Video Generation
[Website] [Project] [Code]

DiVE: DiT-based Video Generation with Enhanced Control
[Website] [Project] [Code]

MegActor-Σ: Unlocking Flexible Mixed-Modal Control in Portrait Animation with Diffusion Transformer
[Website] [Project] [Code]

CogVideo: Large-scale Pretraining for Text-to-Video Generation via Transformers
[ICLR 2023] [Code]

UFO: Enhancing Diffusion-Based Video Generation with a Uniform Frame Organizer
[AAAI 2025] [Code]

InstanceCap: Improving Text-to-Video Generation via Instance-aware Structured Caption
[Website] [Code]

MoTrans: Customized Motion Transfer with Text-driven Video Diffusion Models
[Website] [Code]

CogVideoX: Text-to-Video Diffusion Models with An Expert Transformer
[Website] [Code]

Cross-Modal Contextualized Diffusion Models for Text-Guided Visual Generation and Editing
[ICLR 2024] [Code]

SSM Meets Video Diffusion Models: Efficient Video Generation with Structured State Spaces
[ICLR 2024] [Code]

Ca2-VDM: Efficient Autoregressive Video Diffusion Model with Causal Generation and Cache Sharing
[Website] [Code]

Redefining Temporal Modeling in Video Diffusion: The Vectorized Timestep Approach
[Website] [Code]

PhyT2V: LLM-Guided Iterative Self-Refinement for Physics-Grounded Text-to-Video Generation
[Website] [Code]

Real-Time Video Generation with Pyramid Attention Broadcast
[Website] [Code]

Co-Speech Gesture Video Generation via Motion-Decoupled Diffusion Model
[Website] [Code]

Diffusion Probabilistic Modeling for Video Generation
[Website] [Code]

DynamiCrafter: Animating Open-domain Images with Video Diffusion Priors
[Website] [Code]

VideoFusion: Decomposed Diffusion Models for High-Quality Video Generation
[Website] [Code]

Autoregressive Video Generation without Vector Quantization
[Website] [Code]

STDiff: Spatio-temporal Diffusion for Continuous Stochastic Video Prediction
[Website] [Code]

CustomTTT: Motion and Appearance Customized Video Generation via Test-Time Training
[Website] [Code]

Vlogger: Make Your Dream A Vlog
[Website] [Code]

Magic-Me: Identity-Specific Video Customized Diffusion
[Website] [Code]

VidProM: A Million-scale Real Prompt-Gallery Dataset for Text-to-Video Diffusion Models
[Website] [Code]

EchoReel: Enhancing Action Generation of Existing Video Diffusion Models
[Website] [Code]

StreamingT2V: Consistent, Dynamic, and Extendable Long Video Generation from Text
[Website] [Code]

TAVGBench: Benchmarking Text to Audible-Video Generation
[Website] [Code]

OD-VAE: An Omni-dimensional Video Compressor for Improving Latent Video Diffusion Model
[Website] [Code]

Divot: Diffusion Powers Video Tokenizer for Comprehension and Generation
[Website] [Code]

FlipSketch: Flipping Static Drawings to Text-Guided Sketch Animations
[Website] [Code]

IV-Mixed Sampler: Leveraging Image Diffusion Models for Enhanced Video Synthesis
[Website] [Code]

REDUCIO! Generating 1024×1024 Video within 16 Seconds using Extremely Compressed Motion Latents
[Website] [Code]

GRID: Visual Layout Generation
[Website] [Code]

MAVIN: Multi-Action Video Generation with Diffusion Models via Transition Video Infilling
[Website] [Code]

WF-VAE: Enhancing Video VAE by Wavelet-Driven Energy Flow for Latent Video Diffusion Model
[Website] [Code]

HARIVO: Harnessing Text-to-Image Models for Video Generation
[ECCV 2024] [Project]

Seeing and Hearing: Open-domain Visual-Audio Generation with Diffusion Latent Aligners
[CVPR 2024] [Project]

AtomoVideo: High Fidelity Image-to-Video Generation
[CVPR 2024] [Project]

Efficient Video Diffusion Models via Content-Frame Motion-Latent Decomposition
[ICLR 2024] [Project]

TRIP: Temporal Residual Learning with Image Noise Prior for Image-to-Video Diffusion Models
[CVPR 2024] [Project]

ZoLA: Zero-Shot Creative Long Animation Generation with Short Video Model
[ECCV 2024] [Project]

TCAN: Animating Human Images with Temporally Consistent Pose Guidance using Diffusion Models
[ECCV 2024] [Project]

Motion Prompting: Controlling Video Generation with Motion Trajectories
[Website] [Project]

Mojito: Motion Trajectory and Intensity Control for Video Generation
[Website] [Project]

OmniCreator: Self-Supervised Unified Generation with Universal Editing
[Website] [Project]

DiCoDe: Diffusion-Compressed Deep Tokens for Autoregressive Video Generation with Language Models
[Website] [Project]

VideoGen-of-Thought: A Collaborative Framework for Multi-Shot Video Generation
[Website] [Project]

Scene Co-pilot: Procedural Text to Video Generation with Human in the Loop
[Website] [Project]

DynamicScaler: Seamless and Scalable Video Generation for Panoramic Scenes
[Website] [Project]

LinGen: Towards High-Resolution Minute-Length Text-to-Video Generation with Linear Computational Complexity
[Website] [Project]

Training-free Long Video Generation with Chain of Diffusion Model Experts
[Website] [Project]

Free2Guide: Gradient-Free Path Integral Control for Enhancing Text-to-Video Generation with Large Vision-Language Model
[Website] [Project]

FreeLong: Training-Free Long Video Generation with SpectralBlend Temporal Attention
[Website] [Project]

CamCo: Camera-Controllable 3D-Consistent Image-to-Video Generation
[Website] [Project]

Hierarchical Patch Diffusion Models for High-Resolution Video Generation
[Website] [Project]

Mimir: Improving Video Diffusion Models for Precise Text Understanding
[Website] [Project]

From Slow Bidirectional to Fast Causal Video Generators
[Website] [Project]

I4VGen: Image as Stepping Stone for Text-to-Video Generation
[Website] [Project]

Improving Dynamic Object Interactions in Text-to-Video Generation with AI Feedback
[Website] [Project]

FrameBridge: Improving Image-to-Video Generation with Bridge Models
[Website] [Project]

MarDini: Masked Autoregressive Diffusion for Video Generation at Scale
[Website] [Project]

Boosting Camera Motion Control for Video Diffusion Transformers
[Website] [Project]

UniAnimate: Taming Unified Video Diffusion Models for Consistent Human Image Animation
[Website] [Project]

Collaborative Video Diffusion: Consistent Multi-video Generation with Camera Control
[Website] [Project]

Controllable Longer Image Animation with Diffusion Models
[Website] [Project]

AniClipart: Clipart Animation with Text-to-Video Priors
[Website] [Project]

Spectral Motion Alignment for Video Motion Transfer using Diffusion Models
[Website] [Project]

TimeRewind: Rewinding Time with Image-and-Events Video Diffusion
[Website] [Project]

VideoPoet: A Large Language Model for Zero-Shot Video Generation
[Website] [Project]

PEEKABOO: Interactive Video Generation via Masked-Diffusion
[Website] [Project]

Searching Priors Makes Text-to-Video Synthesis Better
[Website] [Project]

Reuse and Diffuse: Iterative Denoising for Text-to-Video Generation
[Website] [Project]

Emu Video: Factorizing Text-to-Video Generation by Explicit Image Conditioning
[Website] [Project]

BIVDiff: A Training-Free Framework for General-Purpose Video Synthesis via Bridging Image and Video Diffusion Models
[Website] [Project]

SlowFast-VGen: Slow-Fast Learning for Action-Driven Long Video Generation
[Website] [Project]

Imagen Video: High Definition Video Generation with Diffusion Models
[Website] [Project]

MoVideo: Motion-Aware Video Generation with Diffusion Models
[Website] [Project]

Latent-Reframe: Enabling Camera Control for Video Diffusion Model without Training
[Website] [Project]

Space-Time Diffusion Features for Zero-Shot Text-Driven Motion Transfer
[Website] [Project]

Smooth Video Synthesis with Noise Constraints on Diffusion Models for One-shot Video Tuning
[Website] [Project]

VideoAssembler: Identity-Consistent Video Generation with Reference Entities using Diffusion Model
[Website] [Project]

MicroCinema: A Divide-and-Conquer Approach for Text-to-Video Generation
[Website] [Project]

Generative Rendering: Controllable 4D-Guided Video Generation with 2D Diffusion Models
[Website] [Project]

GenTron: Delving Deep into Diffusion Transformers for Image and Video Generation
[Website] [Project]

Customizing Motion in Text-to-Video Diffusion Models
[Website] [Project]

Photorealistic Video Generation with Diffusion Models
[Website] [Project]

DreamVideo-2: Zero-Shot Subject-Driven Video Customization with Precise Motion Control
[Website] [Project]

VideoDrafter: Content-Consistent Multi-Scene Video Generation with LLM
[Website] [Project]

Preserve Your Own Correlation: A Noise Prior for Video Diffusion Models
[Website] [Project]

ActAnywhere: Subject-Aware Video Background Generation
[Website] [Project]

Lumiere: A Space-Time Diffusion Model for Video Generation
[Website] [Project]

InstructVideo: Instructing Video Diffusion Models with Human Feedback
[Website] [Project]

Boximator: Generating Rich and Controllable Motions for Video Synthesis
[Website] [Project]

Direct-a-Video: Customized Video Generation with User-Directed Camera Movement and Object Motion
[Website] [Project]

ConsistI2V: Enhancing Visual Consistency for Image-to-Video Generation
[Website] [Project]

Tuning-Free Noise Rectification for High Fidelity Image-to-Video Generation
[Website] [Project]

Audio-Synchronized Visual Animation
[Website] [Project]

I2VControl: Disentangled and Unified Video Motion Synthesis Control
[Website] [Project]

Mind the Time: Temporally-Controlled Multi-Event Video Generation
[Website] [Project]

VSTAR: Generative Temporal Nursing for Longer Dynamic Video Synthesis
[Website] [Project]

S2DM: Sector-Shaped Diffusion Models for Video Generation
[Website] [Project]

MotionFlow: Attention-Driven Motion Transfer in Video Diffusion Models
[Website] [Project]

AnimateZoo: Zero-shot Video Generation of Cross-Species Animation via Subject Alignment
[Website] [Project]

Disentangling Foreground and Background Motion for Enhanced Realism in Human Video Generation
[Website] [Project]

Dance Any Beat: Blending Beats with Visuals in Dance Video Generation
[Website] [Project]

PoseCrafter: One-Shot Personalized Video Synthesis Following Flexible Pose Control
[Website] [Project]

Human4DiT: Free-view Human Video Generation with 4D Diffusion Transformer
[Website] [Project]

Follow-Your-Emoji: Fine-Controllable and Expressive Freestyle Portrait Animation
[Website] [Project]

FancyVideo: Towards Dynamic and Consistent Video Generation via Cross-frame Textual Guidance
[Website] [Project]

CustomCrafter: Customized Video Generation with Preserving Motion and Concept Composition Abilities
[Website] [Project]

Cavia: Camera-controllable Multi-view Video Diffusion with View-Integrated Attention
[Website] [Project]

DOLLAR: Few-Step Video Generation via Distillation and Latent Reward Optimization
[Website] [Project]

VideoGuide: Improving Video Diffusion Models without Training Through a Teacher's Guide
[Website] [Project]

MovieCharacter: A Tuning-Free Framework for Controllable Character Video Synthesis
[Website] [Project]

ARLON: Boosting Diffusion Transformers with Autoregressive Models for Long Video Generation
[Website] [Project]

Improved Video VAE for Latent Video Diffusion Model
[Website] [Project]

Track4Gen: Teaching Video Diffusion Models to Track Points Improves Video Generation
[Website] [Project]

SnapGen-V: Generating a Five-Second Video within Five Seconds on a Mobile Device
[Website] [Project]

OnlineVPO: Align Video Diffusion Model with Online Video-Centric Preference Optimization
[Website] [Project]

DragEntity: Trajectory Guided Video Generation using Entity and Positional Relationships
[ACM MM 2024 Oral]

Four-Plane Factorized Video Autoencoders
[Website]

Grid Diffusion Models for Text-to-Video Generation
[Website]

SEINE: Short-to-Long Video Diffusion Model for Generative Transition and Prediction
[Website]

GenRec: Unifying Video Generation and Recognition with Diffusion Models
[Website]

Efficient Continuous Video Flow Model for Video Prediction
[Website]

Dual-Stream Diffusion Net for Text-to-Video Generation
[Website]

DisenStudio: Customized Multi-subject Text-to-Video Generation with Disentangled Spatial Control
[Website]

SimDA: Simple Diffusion Adapter for Efficient Video Generation
[Website]

VideoFactory: Swap Attention in Spatiotemporal Diffusions for Text-to-Video Generation
[Website]

Empowering Dynamics-aware Text-to-Video Diffusion with Large Language Models
[Website]

ConditionVideo: Training-Free Condition-Guided Text-to-Video Generation
[Website]

LatentWarp: Consistent Diffusion Latents for Zero-Shot Video-to-Video Translation
[Website]

Optimal Noise pursuit for Augmenting Text-to-Video Generation
[Website]

Make Pixels Dance: High-Dynamic Video Generation
[Website]

Video-Infinity: Distributed Long Video Generation
[Website]

GPT4Motion: Scripting Physical Motions in Text-to-Video Generation via Blender-Oriented GPT Planning
[Website]

Highly Detailed and Temporal Consistent Video Stylization via Synchronized Multi-Frame Diffusion
[Website]

Decouple Content and Motion for Conditional Image-to-Video Generation
[Website]

X-Portrait: Expressive Portrait Animation with Hierarchical Motion Attention
[Website]

F3-Pruning: A Training-Free and Generalized Pruning Strategy towards Faster and Finer Text-to-Video Synthesis
[Website]

MTVG : Multi-text Video Generation with Text-to-Video Models
[Website]

VideoLCM: Video Latent Consistency Model
[Website]

MotionAura: Generating High-Quality and Motion Consistent Videos using Discrete Diffusion
[Website]

MagicVideo-V2: Multi-Stage High-Aesthetic Video Generation
[Website]

I2V-Adapter: A General Image-to-Video Adapter for Video Diffusion Models
[Website]

360DVD: Controllable Panorama Video Generation with 360-Degree Video Diffusion Model
[Website]

CustomVideo: Customizing Text-to-Video Generation with Multiple Subjects
[Website]

Motion-Zero: Zero-Shot Moving Object Control Framework for Diffusion-Based Video Generation
[Website]

Training-Free Semantic Video Composition via Pre-trained Diffusion Model
[Website]

STIV: Scalable Text and Image Conditioned Video Generation
[Website]

Motion-I2V: Consistent and Controllable Image-to-Video Generation with Explicit Motion Modeling
[Website]

Diffutoon: High-Resolution Editable Toon Shading via Diffusion Models
[Website]

Human Video Translation via Query Warping
[Website]

Hybrid Video Diffusion Models with 2D Triplane and 3D Wavelet Representation
[Website]

Snap Video: Scaled Spatiotemporal Transformers for Text-to-Video Synthesis
[Website]

Context-aware Talking Face Video Generation
[Website]

Pix2Gif: Motion-Guided Diffusion for GIF Generation
[Website]

Intention-driven Ego-to-Exo Video Generation
[Website]

AnimateDiff-Lightning: Cross-Model Diffusion Distillation
[Website]

Frame by Familiar Frame: Understanding Replication in Video Diffusion Models
[Website]

Matten: Video Generation with Mamba-Attention
[Website]

Vidu: a Highly Consistent, Dynamic and Skilled Text-to-Video Generator with Diffusion Models
[Website]

ReVideo: Remake a Video with Motion and Content Control
[Website]

VividPose: Advancing Stable Video Diffusion for Realistic Human Image Animation
[Website]

SNED: Superposition Network Architecture Search for Efficient Video Diffusion Model
[Website]

GVDIFF: Grounded Text-to-Video Generation with Diffusion Models
[Website]

Mobius: An High Efficient Spatial-Temporal Parallel Training Paradigm for Text-to-Video Generation Task
[Website]

Contrastive Sequential-Diffusion Learning: An approach to Multi-Scene Instructional Video Synthesis
[Website]

Multi-sentence Video Grounding for Long Video Generation
[Website]

Fine-gained Zero-shot Video Sampling
[Website]

Factorized-Dreamer: Training A High-Quality Video Generator with Limited and Low-Quality Data
[Website]

xGen-VideoSyn-1: High-fidelity Text-to-Video Synthesis with Compressed Representations
[Website]

EasyControl: Transfer ControlNet to Video Diffusion for Controllable Generation and Interpolation
[Website]

Alignment is All You Need: A Training-free Augmentation Strategy for Pose-guided Video Generation
[Website]

One-Shot Learning Meets Depth Diffusion in Multi-Object Videos
[Website]

Denoising Reuse: Exploiting Inter-frame Motion Consistency for Efficient Video Latent Generation
[Website]

S2AG-Vid: Enhancing Multi-Motion Alignment in Video Diffusion Models via Spatial and Syntactic Attention-Based Guidance
[Website]

JVID: Joint Video-Image Diffusion for Visual-Quality and Temporal-Consistency in Video Generation
[Website]

ImmersePro: End-to-End Stereo Video Synthesis Via Implicit Disparity Learning
[Website]

COMUNI: Decomposing Common and Unique Video Signals for Diffusion-based Video Generation
[Website]

Noise Crystallization and Liquid Noise: Zero-shot Video Generation using Image Diffusion Models
[Website]

BroadWay: Boost Your Text-to-Video Generation Model in a Training-free Way
[Website]

LumiSculpt: A Consistency Lighting Control Network for Video Generation
[Website]

TPC: Test-time Procrustes Calibration for Diffusion-based Human Image Animation
[Website]

OnlyFlow: Optical Flow based Motion Conditioning for Video Diffusion Models
[Website]

Teaching Video Diffusion Model with Latent Physical Phenomenon Knowledge
[Website]

SpatialDreamer: Self-supervised Stereo Video Synthesis from Monocular Input
[Website]

StereoCrafter-Zero: Zero-Shot Stereo Video Generation with Noisy Restart
[Website]

VIRES: Video Instance Repainting with Sketch and Text Guidance
[Website]

MotionCharacter: Identity-Preserving and Motion Controllable Human Video Generation
[Website]

Enhancing Sketch Animation: Text-to-Video Diffusion Models with Temporal Consistency and Rigidity Constraints
[Website]

Fleximo: Towards Flexible Text-to-Human Motion Video Generation
[Website]

SPAgent: Adaptive Task Decomposition and Model Selection for General Video Generation and Editing
[Website]

Towards Chunk-Wise Generation for Long Videos
[Website]

Motion Dreamer: Realizing Physically Coherent Video Generation through Scene-Aware Motion Reasoning
[Website]

CPA: Camera-pose-awareness Diffusion Transformer for Video Generation
[Website]

Sketch-Guided Motion Diffusion for Stylized Cinemagraph Synthesis
[Website]

MotionStone: Decoupled Motion Intensity Modulation with Diffusion Transformer for Image-to-Video Generation
[Website]

Mobile Video Diffusion
[Website]

TIV-Diffusion: Towards Object-Centric Movement for Text-driven Image to Video Generation
[Website]

Can video generation replace cinematographers? Research on the cinematic language of generated video
[Website]

MotionBridge: Dynamic Video Inbetweening with Flexible Controls
[Website]

Video Editing

FateZero: Fusing Attentions for Zero-shot Text-based Video Editing
[ICCV 2023 Oral] [Website] [Project] [Code]

Text2LIVE: Text-Driven Layered Image and Video Editing
[ECCV 2022 Oral] [Project] [Code]

Diffusion Video Autoencoders: Toward Temporally Consistent Face Video Editing via Disentangled Video Encoding
[CVPR 2023] [Project] [Code]

Tune-A-Video: One-Shot Tuning of Image Diffusion Models for Text-to-Video Generation
[ICCV 2023] [Project] [Code]

StableVideo: Text-driven Consistency-aware Diffusion Video Editing
[ICCV 2023] [Website] [Code]

Noise Calibration: Plug-and-play Content-Preserving Video Enhancement using Pre-trained Video Diffusion Models
[ECCV 2024] [Project] [Code]

StableV2V: Stablizing Shape Consistency in Video-to-Video Editing
[Website] [Project] [Code]

Video-P2P: Video Editing with Cross-attention Control
[Website] [Project] [Code]

CoDeF: Content Deformation Fields for Temporally Consistent Video Processing
[Website] [Project] [Code]

MagicEdit: High-Fidelity and Temporally Coherent Video Editing
[Website] [Project] [Code]

TokenFlow: Consistent Diffusion Features for Consistent Video Editing
[Website] [Project] [Code]

ControlVideo: Adding Conditional Control for One Shot Text-to-Video Editing
[Website] [Project] [Code]

Make-A-Protagonist: Generic Video Editing with An Ensemble of Experts
[Website] [Project] [Code]

MotionDirector: Motion Customization of Text-to-Video Diffusion Models
[Website] [Project] [Code]

EVA: Zero-shot Accurate Attributes and Multi-Object Video Editing
[Website] [Project] [Code]

RAVE: Randomized Noise Shuffling for Fast and Consistent Video Editing with Diffusion Models
[Website] [Project] [Code]

Ground-A-Video: Zero-shot Grounded Video Editing using Text-to-image Diffusion Models
[Website] [Project] [Code]

MotionEditor: Editing Video Motion via Content-Aware Diffusion
[Website] [Project] [Code]

VMC: Video Motion Customization using Temporal Attention Adaption for Text-to-Video Diffusion Models
[Website] [Project] [Code]

MagicStick: Controllable Video Editing via Control Handle Transformations
[Website] [Project] [Code]

VidToMe: Video Token Merging for Zero-Shot Video Editing
[Website] [Project] [Code]

VASE: Object-Centric Appearance and Shape Manipulation of Real Videos
[Website] [Project] [Code]

Neural Video Fields Editing
[Website] [Project] [Code]

UniEdit: A Unified Tuning-Free Framework for Video Motion and Appearance Editing
[Website] [Project] [Code]

MotionFollower: Editing Video Motion via Lightweight Score-Guided Diffusion
[Website] [Project] [Code]

Vid2Vid-zero: Zero-Shot Video Editing Using Off-the-Shelf Image Diffusion Models
[Website] [Code]

Re-Attentional Controllable Video Diffusion Editing
[Website] [Code]

DiffSLVA: Harnessing Diffusion Models for Sign Language Video Anonymization
[Website] [Code]

LOVECon: Text-driven Training-Free Long Video Editing with ControlNet
[Website] [Code]

Pix2video: Video Editing Using Image Diffusion
[Website] [Code]

E-Bench: Subjective-Aligned Benchmark Suite for Text-Driven Video Editing Quality Assessment
[Website] [Code]

Style-A-Video: Agile Diffusion for Arbitrary Text-based Video Style Transfer
[Website] [Code]

Flow-Guided Diffusion for Video Inpainting
[Website] [Code]

Investigating the Effectiveness of Cross-Attention to Unlock Zero-Shot Editing of Text-to-Video Diffusion Models
[Website] [Code]

Edit-Your-Motion: Space-Time Diffusion Decoupling Learning for Video Motion Editing
[Website] [Code]

COVE: Unleashing the Diffusion Feature Correspondence for Consistent Video Editing
[Website] [Code]

Shape-Aware Text-Driven Layered Video Editing
[CVPR 2023] [Website] [Project]

VideoDirector: Precise Video Editing via Text-to-Video Models
[Website] [Project]

NaRCan: Natural Refined Canonical Image with Integration of Diffusion Prior for Video Editing
[Website] [Project]

Slicedit: Zero-Shot Video Editing With Text-to-Image Diffusion Models Using Spatio-Temporal Slices
[Website] [Project]

DynVideo-E: Harnessing Dynamic NeRF for Large-Scale Motion- and View-Change Human-Centric Video Editing
[Website] [Project]

I2VEdit: First-Frame-Guided Video Editing via Image-to-Video Diffusion Models
[Website] [Project]

FLATTEN: optical FLow-guided ATTENtion for consistent text-to-video editing
[Website] [Project]

VidEdit: Zero-Shot and Spatially Aware Text-Driven Video Editing
[Website] [Project]

DIVE: Taming DINO for Subject-Driven Video Editing
[Website] [Project]

VideoSwap: Customized Video Subject Swapping with Interactive Semantic Point Correspondence
[Website] [Project]

Rerender A Video: Zero-Shot Text-Guided Video-to-Video Translation
[Website] [Project]

ReCapture: Generative Video Camera Controls for User-Provided Videos using Masked Video Fine-Tuning
[Website] [Project]

WAVE: Warping DDIM Inversion Features for Zero-shot Text-to-Video Editing
[ECCV 2024] [Project]

MeDM: Mediating Image Diffusion Models for Video-to-Video Translation with Temporal Correspondence Guidance
[Website] [Project]

Customize-A-Video: One-Shot Motion Customization of Text-to-Video Diffusion Models
[Website] [Project]

DreamMotion: Space-Time Self-Similarity Score Distillation for Zero-Shot Video Editing
[Website] [Project]

MIVE: New Design and Benchmark for Multi-Instance Video Editing
[Website] [Project]

VIVID-10M: A Dataset and Baseline for Versatile and Interactive Video Local Editing
[Website] [Project]

DeCo: Decoupled Human-Centered Diffusion Video Editing with Motion Consistency
[ECCV 2024]

Edit Temporal-Consistent Videos with Image Diffusion Model
[Website]

Streaming Video Diffusion: Online Video Editing with Diffusion Models
[Website]

Cut-and-Paste: Subject-Driven Video Editing with Attention Control
[Website]

MagicProp: Diffusion-based Video Editing via Motion-aware Appearance Propagation
[Website]

Dreamix: Video Diffusion Models Are General Video Editors
[Website]

Towards Consistent Video Editing with Text-to-Image Diffusion Models
[Website]

EVE: Efficient zero-shot text-based Video Editing with Depth Map Guidance and Temporal Consistency Constraints
[Website]

CCEdit: Creative and Controllable Video Editing via Diffusion Models
[Website]

Fuse Your Latents: Video Editing with Multi-source Latent Diffusion Models
[Website]

FastBlend: a Powerful Model-Free Toolkit Making Video Stylization Easier
[Website]

VIDiff: Translating Videos via Multi-Modal Instructions with Diffusion Models
[Website]

RealCraft: Attention Control as A Solution for Zero-shot Long Video Editing
[Website]

Object-Centric Diffusion for Efficient Video Editing
[Website]

FastVideoEdit: Leveraging Consistency Models for Efficient Text-to-Video Editing
[Website]

Video Editing via Factorized Diffusion Distillation
[Website]

EffiVED: Efficient Video Editing via Text-instruction Diffusion Models
[Website]

Videoshop: Localized Semantic Video Editing with Noise-Extrapolated Diffusion Inversion
[Website]

GenVideo: One-shot Target-image and Shape Aware Video Editing using T2I Diffusion Models
[Website]

Temporally Consistent Object Editing in Videos using Extended Attention
[Website]

Enhancing Temporal Consistency in Video Editing by Reconstructing Videos with 3D Gaussian Splatting
[Website]

FRAG: Frequency Adapting Group for Diffusion Video Editing
[Website]

InVi: Object Insertion In Videos Using Off-the-Shelf Diffusion Models
[Website]

Text-based Talking Video Editing with Cascaded Conditional Diffusion
[Website]

Reenact Anything: Semantic Video Motion Transfer Using Motion-Textual Inversion
[Website]

Blended Latent Diffusion under Attention Control for Real-World Video Editing
[Website]

EditBoard: Towards A Comprehensive Evaluation Benchmark for Text-based Video Editing Models
[Website]

DNI: Dilutional Noise Initialization for Diffusion Video Editing
[Website]

FreeMask: Rethinking the Importance of Attention Masks for Zero-Shot Video Editing
[Website]

Replace Anyone in Videos
[Website]

Shaping a Stabilized Video by Mitigating Unintended Changes for Concept-Augmented Video Editing
[Website]

DreamColour: Controllable Video Colour Editing without Training
[Website]

MoViE: Mobile Diffusion for Video Editing
[Website]