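Thank you to the authors for the excellent work. I have a few questions and would be very grateful for any help!
1. How much accuracy improvement do SFT and RL each contribute? The paper seems to report only the final accuracy improvement.
2. The paper uses an SFT batch size of 512. Should I change the per_device_train_batch_size parameter to 512/8 (with 8 GPUs)?
3. If I uncomment the # ### eval section below, will evaluation run automatically and be logged to wandb?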
### train
run_name: dirl_sink_8b_math_glm_openr1math
include_effective_tokens_per_second: true
per_device_train_batch_size: 1
gradient_accumulation_steps: 1 #4
learning_rate: 1.0e-5
num_train_epochs: 10
lr_scheduler_type: constant_with_warmup
warmup_ratio: 0.03
bf16: true
ddp_timeout: 180000000
# ### eval
# val_size: 0.05
# per_device_eval_batch_size: 1
# eval_strategy: steps
# eval_steps: 10
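
For question 2, here is a minimal sketch of the batch-size arithmetic. It assumes the trainer follows the usual Hugging Face Trainer convention, where the global batch size is per_device_train_batch_size × number of GPUs × gradient_accumulation_steps; whether this repository's training script follows that convention is an assumption, not something confirmed by the authors. The helper names `effective_batch_size` and `grad_accum_for_target` are hypothetical and exist only for illustration.

```python
def effective_batch_size(per_device: int, num_gpus: int, grad_accum: int) -> int:
    """Global batch size per optimizer step (assumed HF Trainer convention)."""
    return per_device * num_gpus * grad_accum


def grad_accum_for_target(target: int, per_device: int, num_gpus: int) -> int:
    """Gradient accumulation steps needed to reach a target global batch size."""
    per_step = per_device * num_gpus  # samples processed in one forward/backward pass
    if target % per_step != 0:
        raise ValueError(
            f"target {target} is not divisible by per_device * num_gpus = {per_step}"
        )
    return target // per_step


if __name__ == "__main__":
    # Config as posted: per_device=1, 8 GPUs, gradient_accumulation_steps=1
    print(effective_batch_size(1, 8, 1))       # 8
    # Option A: keep per_device=1 and raise gradient accumulation instead
    print(grad_accum_for_target(512, 1, 8))    # 64
    # Option B: per_device=512/8=64 with no accumulation (needs enough GPU memory)
    print(grad_accum_for_target(512, 64, 8))   # 1
```

Under that assumption, the posted config trains with a global batch size of 8. Reaching 512 on 8 GPUs needs either per_device_train_batch_size: 64, or per_device_train_batch_size: 1 with gradient_accumulation_steps: 64, which is the lower-memory option.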