
LLava Series (7B, 14B) freeze_vision_tower=false bug #6376

Open
xirui-li opened this issue Dec 18, 2024 · 0 comments
Labels
pending This problem is yet to be addressed

Comments

xirui-li commented Dec 18, 2024

Reminder

  • I have read the README and searched the existing issues.

System Info

  • llamafactory version: 0.9.2.dev0
  • Platform: Linux-5.15.0-58-generic-x86_64-with-glibc2.35
  • Python version: 3.10.16
  • PyTorch version: 2.5.1+cu124 (GPU)
  • Transformers version: 4.46.2
  • Datasets version: 3.1.0
  • Accelerate version: 1.0.1
  • PEFT version: 0.12.0
  • TRL version: 0.9.6
  • GPU type: NVIDIA RTX A6000
  • DeepSpeed version: 0.16.1

Reproduction

FORCE_TORCHRUN=1 CUDA_VISIBLE_DEVICES=0 llamafactory-cli train examples/train_lora/llava1_5_7b_lora_sft.yaml

Script Setting

model_name_or_path: ***

stage: sft
do_train: true
finetuning_type: lora
lora_target: all

dataset: ***
template: llava
cutoff_len: 4096
max_samples: 1000
overwrite_cache: true
preprocessing_num_workers: 16

output_dir: ***
logging_steps: 10
save_steps: 10000
plot_loss: true
overwrite_output_dir: true

per_device_train_batch_size: 1
gradient_accumulation_steps: 8
learning_rate: 5.0e-6
num_train_epochs: 5.0
lr_scheduler_type: cosine
warmup_ratio: 0.01
bf16: true
ddp_timeout: 180000000
lora_rank: 128
lora_alpha: 256
freeze_vision_tower: false

Expected behavior

Training should run smoothly with freeze_vision_tower=false for the LLaVA series, but instead it fails with the following error:

Error Message

RuntimeError: Expected to have finished reduction in the prior iteration before starting a new one. This error indicates that your module has parameters that were not used in producing loss. You can enable unused parameter detection by passing the keyword argument find_unused_parameters=True to torch.nn.parallel.DistributedDataParallel, and by making sure all forward function outputs participate in calculating loss.
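For context, this DDP error fires whenever a trainable parameter never contributes to the loss. A minimal standalone sketch (a hypothetical toy model, not LLaMA-Factory code) shows the same condition without needing a distributed setup: after one backward pass, any parameter whose `.grad` is still None did not participate in producing the loss, which is exactly what DDP complains about.

```python
# Toy reproduction of the "unused parameter" condition (assumption:
# this mirrors the failure mode, not the actual LLaVA forward pass).
import torch
import torch.nn as nn

class ToyVLM(nn.Module):
    def __init__(self):
        super().__init__()
        self.vision_tower = nn.Linear(4, 4)    # stand-in for the vision encoder
        self.language_model = nn.Linear(4, 2)  # stand-in for the LM head

    def forward(self, x):
        # Bug analogue: the vision tower is trainable, but its output
        # never reaches the loss path.
        _ = self.vision_tower(x)
        return self.language_model(x)

model = ToyVLM()
loss = model(torch.randn(3, 4)).sum()
loss.backward()

# Parameters that received no gradient are the ones DDP would flag.
unused = [n for n, p in model.named_parameters()
          if p.requires_grad and p.grad is None]
print(unused)  # → ['vision_tower.weight', 'vision_tower.bias']
```

Running a single non-distributed training step with a check like this against the real LLaVA model should reveal which vision-tower parameters are being left out of the graph.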

Others

I have checked the related qwen2_VL freeze_vision_tower=false bug reports and updated to the latest repo: https://github.com/hiyouga/LLaMA-Factory/issues/5680.

qwen2_VL works perfectly with freeze_vision_tower=false, while the same script fails when adapted to the LLaVA-1.5 series (7B, 13B).
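As the traceback itself suggests, one possible mitigation is to let DDP scan for unused parameters. Hugging Face `TrainingArguments` exposes `ddp_find_unused_parameters`, and LLaMA-Factory YAML keys are forwarded to it (assumption: the flag reaches the trainer unchanged in this version). Note this masks the symptom at some throughput cost; the underlying cause, vision-tower parameters being excluded from the loss path, may still need a real fix.

```yaml
# Hedged workaround sketch, added to the training YAML above:
ddp_find_unused_parameters: true
```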

End

Thank you in advance for your support of and contributions to this wonderful repo!

@github-actions github-actions bot added the pending This problem is yet to be addressed label Dec 18, 2024