Issue High Loss #4180

Katehuuh · 2024-06-10T01:51:55Z

I am currently on commit 3a023bc, running SFT with QLoRA on Llama-3-8B over a reasonable dataset of 100k samples, with these arguments:

--learning_rate 5e-05
--per_device_train_batch_size 1
--gradient_accumulation_steps 1
--lr_scheduler_type cosine
--max_grad_norm 1.0
--neftune_noise_alpha 5
--optim adamw_8bit
--upcast_layernorm True
--bf16 True
--lora_rank 32
--lora_alpha 64
--lora_dropout 0.15

However, I am seeing an average loss of ≈3.0 (for example 'loss': 3.0288, 'grad_norm': 3.2035417556762695, 'learning_rate': 3.639776304355244e-05), whereas the average loss is usually ≈1.0. Could this be a dataset effect or an issue with Llama-3-8B? At inference time, the fine-tuned model shows minor degradation.
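To put the two loss values side by side, one option is to convert them to perplexity. The sketch below assumes the logged loss is the mean token-level cross-entropy in nats, which is the usual convention for causal language model training but is not confirmed in this thread; the function name `perplexity` is only illustrative.

```python
import math


def perplexity(mean_cross_entropy: float) -> float:
    """Convert a mean cross-entropy loss (in nats) to perplexity."""
    return math.exp(mean_cross_entropy)


# Values taken from the post above: the observed loss and the "usual" loss.
observed_loss = 3.0288
usual_loss = 1.0

print(f"observed: loss={observed_loss} -> perplexity={perplexity(observed_loss):.2f}")
print(f"usual:    loss={usual_loss} -> perplexity={perplexity(usual_loss):.2f}")
```

Under that assumption, a loss near 3.0 corresponds to a perplexity of about 20, against about 2.7 for a loss near 1.0. The comparison only means something when both runs use the same tokenizer and a similar kind of data.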

hiyouga (Maintainer) · 2024-06-10T08:27:38Z

Which template did you use? Remember to use the default template for non-instruct models.
Besides, the dataset also affects the loss values.

1 reply

Katehuuh Jun 10, 2024
Author

> Which template did you use? Remember to use the default template for non-instruct models. Besides, the dataset also affects the loss values.

It is a QA instruct dataset in alpaca format, trained with --template alpaca, and it includes history:

    "formatting": "alpaca",
    "columns": {
      "prompt": "instruction",
      "query": "input",
      "response": "output",
      "system": "system",
      "history": "history"
    }

Maybe not a template issue.
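One way to rule out a data problem is to check a few records against the column mapping above. The sketch below is a hypothetical helper, not part of the training tool: it assumes each record is a dict with the keys named in the mapping, and that `history` is a list of [instruction, response] pairs, which is how the alpaca format is commonly laid out. The sample record is invented for illustration.

```python
from typing import Any

# Column names taken from the mapping above; which ones are mandatory is an assumption.
REQUIRED_KEYS = ("instruction", "output")
OPTIONAL_TEXT_KEYS = ("input", "system")


def check_record(record: dict[str, Any]) -> list[str]:
    """Return a list of problems found in one alpaca-style record."""
    problems = []
    for key in REQUIRED_KEYS:
        value = record.get(key)
        if not isinstance(value, str) or not value.strip():
            problems.append(f"missing or empty '{key}'")
    for key in OPTIONAL_TEXT_KEYS:
        if key in record and not isinstance(record[key], str):
            problems.append(f"'{key}' is not a string")
    # Assumed layout: history is a list of [instruction, response] pairs.
    for i, turn in enumerate(record.get("history", [])):
        is_pair = isinstance(turn, (list, tuple)) and len(turn) == 2
        if not is_pair or not all(isinstance(t, str) for t in turn):
            problems.append(f"history[{i}] is not a [instruction, response] pair")
    return problems


# Hypothetical sample record, for illustration only.
sample = {
    "instruction": "What is the capital of France?",
    "input": "",
    "output": "Paris.",
    "system": "You are a helpful assistant.",
    "history": [["Hello", "Hi, how can I help?"]],
}
print(check_record(sample))
```

Running this over a random sample of the 100k records would show whether empty outputs or malformed history turns are present, either of which could contribute to an unusual loss.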
