Fix wandb logging on rank>0 and add eval debug instrumentation #1391

Open
ilyes319 wants to merge 1 commit into ACEsuit:main from ilyes319:wandb-fix

ilyes319 commented Mar 2, 2026

Summary

  • Guard wandb logging to rank==0 only in distributed training to prevent duplicate or failing log calls on non-primary ranks
  • Clean up process group on dry_run exit
  • Add MACE_EVAL_DEBUG_INTERVAL env var for evaluation progress logging during long validation runs
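
For readers unfamiliar with the pattern, the first two items can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: `log_on_primary_rank` and `cleanup_process_group` are hypothetical helper names, and `log_fn` stands in for a call such as `wandb.log`. The cleanup helper assumes PyTorch is installed and imports it lazily.

```python
from typing import Any, Callable, Dict


def log_on_primary_rank(
    metrics: Dict[str, Any],
    rank: int,
    log_fn: Callable[[Dict[str, Any]], None],
) -> bool:
    """Forward metrics to log_fn only on rank 0; return True if logged.

    Hypothetical helper: in a real run, log_fn would be something like wandb.log.
    """
    if rank != 0:
        # Non-primary ranks skip logging entirely.
        return False
    log_fn(metrics)
    return True


def cleanup_process_group() -> None:
    """Destroy the default process group if one was initialised.

    Hypothetical helper for an early exit such as a dry run.
    """
    import torch.distributed as dist  # lazy import; requires PyTorch

    if dist.is_available() and dist.is_initialized():
        dist.destroy_process_group()


# Simulate four ranks reporting the same metrics; only rank 0 is recorded.
logged = []
for rank in range(4):
    log_on_primary_rank({"epoch": 0, "loss": 0.5}, rank, logged.append)
print(len(logged))
```

Passing the logging callable in as an argument keeps the guard easy to test without a wandb session.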

Files changed

  • mace/cli/run_train.py -- wandb rank guard and dry_run cleanup
  • mace/tools/train.py -- eval debug interval logging
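
The description does not spell out the semantics of `MACE_EVAL_DEBUG_INTERVAL`, so the following is only a sketch under stated assumptions: the variable is read as an integer number of batches between progress messages, and an unset, invalid, or non-positive value disables the messages. The function names `read_debug_interval` and `evaluate_with_progress` are hypothetical and do not come from `mace/tools/train.py`.

```python
import logging
import os
from typing import Iterable, List


def read_debug_interval(name: str = "MACE_EVAL_DEBUG_INTERVAL") -> int:
    """Return the interval in batches, or 0 when logging is disabled.

    Assumption: the variable holds an integer batch count.
    """
    raw = os.environ.get(name, "")
    try:
        interval = int(raw)
    except ValueError:
        # Unset or malformed values disable progress logging.
        return 0
    return max(interval, 0)


def evaluate_with_progress(batches: Iterable, interval: int) -> List[int]:
    """Iterate over batches and log progress every `interval` batches.

    Returns the batch indices at which a progress message was emitted.
    """
    reported: List[int] = []
    for index, _batch in enumerate(batches, start=1):
        # The actual model evaluation for this batch would run here.
        if interval > 0 and index % interval == 0:
            logging.info("eval progress: %d batches processed", index)
            reported.append(index)
    return reported
```

With the variable set to `3`, a ten-batch validation loop would emit a message after batches 3, 6, and 9.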

Test plan

  • Pre-commit (black, isort) clean
  • CI tests (triggered on PR)

Made with Cursor

- Guard wandb logging to rank==0 only in distributed training
- Clean up process group on dry_run exit
- Add MACE_EVAL_DEBUG_INTERVAL env var for evaluation progress logging

Made-with: Cursor
