Why is speed per iteration slower when the dataset is large? #6410

Open
1 task done
coding2debug opened this issue Dec 20, 2024 · 0 comments
Labels: pending (This problem is yet to be addressed)

Comments

@coding2debug

Reminder

  • I have read the README and searched the existing issues.

System Info

Python version: 3.11.10

Reproduction

```yaml
### model
model_name_or_path: Qwen2.5-3B
flash_attn: auto

### method
stage: pt
do_train: true
finetuning_type: full
deepspeed: examples/deepspeed/ds_z0_config.json
enable_liger_kernel: true

### dataset
dataset: llm_train
eval_dataset: llm_valid
cutoff_len: 4096
overwrite_cache: false
preprocessing_num_workers: 16
preprocessing_batch_size: 1000
tokenized_path: tokenized_data_2048

### output
output_dir: qwen2_out
logging_steps: 1000
save_steps: 50000
save_total_limit: 5
plot_loss: true
overwrite_output_dir: false
report_to: wandb
run_name: official_qwen_pre_training

### train
per_device_train_batch_size: 3
gradient_accumulation_steps: 4
learning_rate: 5.0e-5
num_train_epochs: 1.0
lr_scheduler_type: cosine
warmup_ratio: 0.1
bf16: true
ddp_timeout: 180000000
disable_gradient_checkpointing: true

### eval
per_device_eval_batch_size: 4
eval_strategy: steps
eval_steps: 50000
```
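
As a sanity check on what each iteration actually processes, the cached tokenized dataset can be inspected directly. This is a minimal sketch, assuming the directory at `tokenized_path` was written with datasets' `save_to_disk` (how LLaMA-Factory caches preprocessed data) and that the cache lives at `tokenized_data_2048` as configured above; the `"train"` split name is an assumption:

```python
# Minimal sketch: check how many tokens each cached training example carries.
# Assumes the directory at tokenized_path was written with save_to_disk.
from datasets import DatasetDict, load_from_disk

ds = load_from_disk("tokenized_data_2048")  # tokenized_path from the config
if isinstance(ds, DatasetDict):
    ds = ds["train"]  # assumed split name

sample = ds.select(range(min(1000, len(ds))))
lengths = [len(ex["input_ids"]) for ex in sample]
print(f"examples inspected:  {len(lengths)}")
print(f"mean tokens/example: {sum(lengths) / len(lengths):.1f}")
print(f"max tokens/example:  {max(lengths)}")
```

With `stage: pt`, examples are typically packed into `cutoff_len`-sized blocks, so if the two runs cache to different average lengths, the tokens processed per iteration (and hence s/it) will differ even at the same batch size.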

Expected behavior

With a dataset of 1,000 examples, the training speed is 3.69 s/it:

```yaml
dataset: c4_demo
cutoff_len: 4096
max_samples: 1000
```

With a dataset of 21M examples, the training speed drops to 8.50 s/it:

```yaml
dataset: llm_train
eval_dataset: llm_valid
cutoff_len: 4096
overwrite_cache: false
preprocessing_num_workers: 16
preprocessing_batch_size: 1000
tokenized_path: tokenized_data_2048
```
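
If the per-example token counts match between the two runs, the next thing to rule out is the data pipeline itself: at 21M examples, each iteration can spend extra time in I/O and collation rather than in the forward/backward pass. A rough timing sketch, assuming a plain PyTorch DataLoader over the cached dataset (batch size 3 as in the config above; the collate function is a simplified stand-in for the real collator, and only works because packed pretraining rows share one length):

```python
# Rough sketch: time the data pipeline alone, without any model compute.
# If s/it here grows with dataset size, the slowdown is in loading/collation.
import time

import torch
from datasets import DatasetDict, load_from_disk
from torch.utils.data import DataLoader

ds = load_from_disk("tokenized_data_2048")
if isinstance(ds, DatasetDict):
    ds = ds["train"]  # assumed split name

def collate(batch):  # simplified stand-in; assumes equal-length packed rows
    return torch.tensor([ex["input_ids"] for ex in batch])

loader = DataLoader(ds, batch_size=3, num_workers=4, collate_fn=collate)

num_batches = 50
start = time.perf_counter()
for i, _ in enumerate(loader):
    if i + 1 >= num_batches:
        break
print(f"{(time.perf_counter() - start) / num_batches:.3f} s per batch (data only)")
```

If the data-only time is small and stable for both datasets, the remaining difference would have to come from the training step itself rather than from loading.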

Others

No response

github-actions bot added the pending label on Dec 20, 2024