
LLava Series (7B, 14B) freeze_vision_tower=false bug #6376

Open
xirui-li opened this issue Dec 18, 2024 · 0 comments
Labels
pending This problem is yet to be addressed

Comments

xirui-li commented Dec 18, 2024

Reminder

  • I have read the README and searched the existing issues.

System Info

  • llamafactory version: 0.9.2.dev0
  • Platform: Linux-5.15.0-58-generic-x86_64-with-glibc2.35
  • Python version: 3.10.16
  • PyTorch version: 2.5.1+cu124 (GPU)
  • Transformers version: 4.46.2
  • Datasets version: 3.1.0
  • Accelerate version: 1.0.1
  • PEFT version: 0.12.0
  • TRL version: 0.9.6
  • GPU type: NVIDIA RTX A6000
  • DeepSpeed version: 0.16.1

Reproduction

FORCE_TORCHRUN=1 CUDA_VISIBLE_DEVICES=0 llamafactory-cli train examples/train_lora/llava1_5_7b_lora_sft.yaml

Script Setting

model_name_or_path: ***

stage: sft
do_train: true
finetuning_type: lora
lora_target: all

dataset: ***
template: llava
cutoff_len: 4096
max_samples: 1000
overwrite_cache: true
preprocessing_num_workers: 16

output_dir: ***
logging_steps: 10
save_steps: 10000
plot_loss: true
overwrite_output_dir: true

per_device_train_batch_size: 1
gradient_accumulation_steps: 8
learning_rate: 5.0e-6
num_train_epochs: 5.0
lr_scheduler_type: cosine
warmup_ratio: 0.01
bf16: true
ddp_timeout: 180000000
lora_rank: 128
lora_alpha: 256
freeze_vision_tower: false

Expected behavior

Training should run smoothly with freeze_vision_tower=false for the LLaVA series, but instead it fails with the following error:

Error Message

RuntimeError: Expected to have finished reduction in the prior iteration before starting a new one. This error indicates that your module has parameters that were not used in producing loss. You can enable unused parameter detection by passing the keyword argument find_unused_parameters=True to torch.nn.parallel.DistributedDataParallel, and by making sure all forward function outputs participate in calculating loss.
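For context, this DDP error fires whenever a trainable parameter never contributes to the loss. A minimal standalone sketch (a hypothetical toy model, not LLaMA-Factory code) shows the same condition without needing a distributed setup: after one backward pass, any parameter whose `.grad` is still None did not participate in producing the loss, which is exactly what DDP complains about.

```python
# Toy reproduction of the "unused parameter" condition (assumption:
# this mirrors the failure mode, not the actual LLaVA forward pass).
import torch
import torch.nn as nn

class ToyVLM(nn.Module):
    def __init__(self):
        super().__init__()
        self.vision_tower = nn.Linear(4, 4)    # stand-in for the vision encoder
        self.language_model = nn.Linear(4, 2)  # stand-in for the LM head

    def forward(self, x):
        # Bug analogue: the vision tower is trainable, but its output
        # never reaches the loss path.
        _ = self.vision_tower(x)
        return self.language_model(x)

model = ToyVLM()
loss = model(torch.randn(3, 4)).sum()
loss.backward()

# Parameters that received no gradient are the ones DDP would flag.
unused = [n for n, p in model.named_parameters()
          if p.requires_grad and p.grad is None]
print(unused)  # → ['vision_tower.weight', 'vision_tower.bias']
```

Running a single non-distributed training step with a check like this against the real LLaVA model should reveal which vision-tower parameters are being left out of the graph.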

Others

I have checked the related qwen2_VL freeze_vision_tower=false bug reports and updated to the latest repo: https://github.com/hiyouga/LLaMA-Factory/issues/5680.

qwen2_VL works perfectly with freeze_vision_tower=false, while the same script fails when adapted to the LLaVA-1.5 series (7B, 13B).
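As the traceback itself suggests, one possible mitigation is to let DDP scan for unused parameters. Hugging Face `TrainingArguments` exposes `ddp_find_unused_parameters`, and LLaMA-Factory YAML keys are forwarded to it (assumption: the flag reaches the trainer unchanged in this version). Note this masks the symptom at some throughput cost; the underlying cause, vision-tower parameters being excluded from the loss path, may still need a real fix.

```yaml
# Hedged workaround sketch, added to the training YAML above:
ddp_find_unused_parameters: true
```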

End

Thank you in advance for your support of and contributions to this wonderful repo!

@github-actions github-actions bot added the pending This problem is yet to be addressed label Dec 18, 2024