System Info

Python version: 3.11.10
Reproduction

```yaml
### model
model_name_or_path: Qwen2.5-3B
flash_attn: auto

### method
stage: pt
do_train: true
finetuning_type: full
deepspeed: examples/deepspeed/ds_z0_config.json
enable_liger_kernel: true

### dataset
dataset: llm_train
eval_dataset: llm_valid
cutoff_len: 4096
overwrite_cache: false
preprocessing_num_workers: 16
preprocessing_batch_size: 1000
tokenized_path: tokenized_data_2048

### output
output_dir: qwen2_out
logging_steps: 1000
save_steps: 50000
save_total_limit: 5
plot_loss: true
overwrite_output_dir: false
report_to: wandb
run_name: official_qwen_pre_training

### train
per_device_train_batch_size: 3
gradient_accumulation_steps: 4
learning_rate: 5.0e-5
num_train_epochs: 1.0
lr_scheduler_type: cosine
warmup_ratio: 0.1
bf16: true
ddp_timeout: 180000000
disable_gradient_checkpointing: true

### eval
per_device_eval_batch_size: 4
eval_strategy: steps
eval_steps: 50000
```
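A config like this is normally launched with `llamafactory-cli train <config>.yaml`. To narrow down where the extra per-iteration time goes, a minimal diagnostic sketch (mine, not part of the original report) can time the dataloader in isolation. It assumes the cache at `tokenized_data_2048` can be re-opened with `datasets.load_from_disk`, and that `stage: pt` packs rows to a fixed `cutoff_len` so default collation can stack them:

```python
# Diagnostic sketch (assumptions: the tokenized_path cache was saved via
# datasets.save_to_disk, and rows are packed to a fixed length). Times the
# dataloader alone, to see how much of the 3.69 -> 8.50 s/it gap is data
# I/O rather than the model forward/backward pass.
import time

from datasets import load_from_disk
from torch.utils.data import DataLoader

dataset = load_from_disk("tokenized_data_2048")
if isinstance(dataset, dict):  # a DatasetDict with splits: pick train
    dataset = dataset["train"]
dataset.set_format(type="torch", columns=["input_ids"])

# Mirror the training batch size (per_device_train_batch_size: 3).
loader = DataLoader(dataset, batch_size=3, num_workers=4)

n_steps = 100
start = time.perf_counter()
for step, _batch in enumerate(loader):
    if step + 1 == n_steps:
        break
print(f"{(time.perf_counter() - start) / n_steps:.4f} s/it (dataloading only)")
```

If this number stays small on the 21M-example cache, the slowdown is in the training step itself rather than in reading the data.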
Expected behavior

Per-iteration training speed should not depend on dataset size, since every optimizer step processes the same batch shape. With a 1,000-example dataset, training runs at 3.69 s/it:

```yaml
dataset: c4_demo
cutoff_len: 4096
max_samples: 1000
```

With a 21M-example dataset, the same setup slows to 8.50 s/it, roughly 2.3x slower per step:

```yaml
dataset: llm_train
eval_dataset: llm_valid
cutoff_len: 4096
overwrite_cache: false
preprocessing_num_workers: 16
preprocessing_batch_size: 1000
tokenized_path: tokenized_data_2048
```
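A quick A/B check on the cache itself (again hypothetical, under the same `load_from_disk` assumption as above) is to compare per-example access latency on a small slice versus the full 21M rows. Since HF datasets are memory-mapped Arrow tables, the two should be nearly identical; a large gap would implicate dataloading rather than compute:

```python
# Hypothetical A/B check: random-access latency should be independent of
# row count for a memory-mapped Arrow dataset. A big gap between the two
# printed numbers would point at dataloading as the bottleneck.
import random
import time

from datasets import load_from_disk

full = load_from_disk("tokenized_data_2048")
if isinstance(full, dict):  # a DatasetDict with splits: pick train
    full = full["train"]
small = full.select(range(1000))  # stand-in for the 1,000-example run

def mean_access_ms(ds, n=1000):
    indices = [random.randrange(len(ds)) for _ in range(n)]
    start = time.perf_counter()
    for i in indices:
        _ = ds[i]["input_ids"]
    return (time.perf_counter() - start) / n * 1e3

print(f"small (1k rows) : {mean_access_ms(small):.3f} ms/example")
print(f"full  (21M rows): {mean_access_ms(full):.3f} ms/example")
```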
Others

No response