Questions about experimental details #11

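Thanks to the authors for the excellent work. I have a few questions and would greatly appreciate your help!
1. How much does each of SFT and RL contribute to the accuracy improvement? The paper seems to report only the final improvement.
2. The paper uses an SFT batch size of 512. Should I change the `per_device_train_batch_size` parameter to 512/8 (with 8 GPUs)?
3. If I uncomment the `# ### eval` section, will evaluation run automatically, with the results saved to wandb?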
```
### train
run_name: dirl_sink_8b_math_glm_openr1math
include_effective_tokens_per_second: true
per_device_train_batch_size: 1
gradient_accumulation_steps: 1 #4
learning_rate: 1.0e-5
num_train_epochs: 10
lr_scheduler_type: constant_with_warmup
warmup_ratio: 0.03
bf16: true
ddp_timeout: 180000000
# ### eval
# val_size: 0.05
# per_device_eval_batch_size: 1
# eval_strategy: steps
# eval_steps: 10
```
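
To make question 2 concrete, here is the arithmetic I have in mind, as a minimal sketch. It assumes the common data-parallel convention that the global batch size equals `per_device_train_batch_size` × `gradient_accumulation_steps` × number of GPUs; I have not confirmed that this repo's trainer follows it. The helper names `effective_batch_size` and `per_device_options` are hypothetical and only for illustration.

```python
def effective_batch_size(per_device: int, grad_accum: int, num_devices: int) -> int:
    """Global batch size under the assumed data-parallel convention."""
    return per_device * grad_accum * num_devices


def per_device_options(target: int, num_devices: int) -> list[tuple[int, int]]:
    """List (per_device_train_batch_size, gradient_accumulation_steps)
    pairs whose product across all devices equals `target` exactly."""
    if target % num_devices != 0:
        raise ValueError("target batch size must be divisible by num_devices")
    per_rank = target // num_devices  # samples each GPU must process per optimizer step
    return [(b, per_rank // b) for b in range(1, per_rank + 1) if per_rank % b == 0]


if __name__ == "__main__":
    # The config as posted (1 per device, 1 accumulation step) on 8 GPUs.
    print(effective_batch_size(per_device=1, grad_accum=1, num_devices=8))
    # Combinations that would reach a global batch size of 512 on 8 GPUs.
    for per_device, grad_accum in per_device_options(512, 8):
        print(per_device, grad_accum)
```

Under that assumption, 512/8 = 64 per device with no accumulation is one option, and keeping `per_device_train_batch_size: 1` with `gradient_accumulation_steps: 64` is another that needs less GPU memory.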

