Hi—thanks for the fantastic library!
I’m using MCTX (Gumbel MuZero search) for multi-agent path finding on grids. Each agent has 5 actions (UP/DOWN/LEFT/RIGHT/STAY), so the joint action space grows as $5^N$:
- 2 agents → 25 actions
- 3 agents → 125 actions
- 4 agents → 625 actions
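For concreteness, this is how I flatten per-agent moves into one joint action index (a sketch with illustrative names, not MCTX API):

```python
# Sketch: encode a joint action for N agents as a single index in [0, 5**N).
# Per-agent actions are indices into MOVES; names here are illustrative only.
MOVES = ["UP", "DOWN", "LEFT", "RIGHT", "STAY"]

def encode_joint(actions):
    """Map a tuple of per-agent action indices (each 0..4) to one joint index."""
    idx = 0
    for a in actions:
        idx = idx * 5 + a
    return idx

def decode_joint(idx, num_agents):
    """Invert encode_joint: recover the tuple of per-agent action indices."""
    actions = []
    for _ in range(num_agents):
        idx, a = divmod(idx, 5)
        actions.append(a)
    return tuple(reversed(actions))

# Round-trip check for 3 agents (125 joint actions, matching the table above).
assert decode_joint(encode_joint((1, 0, 4)), 3) == (1, 0, 4)
```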
I don’t have a policy-value network yet; I’m using GMZ as a planner with uniform priors and either value=0 or a light heuristic. Horizons can be long on large maps.
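By "light heuristic" I mean something on the order of a negated sum of per-agent Manhattan distances to goals, fed in as the root/leaf value in place of a learned network (a sketch; the function name and signature are mine, not MCTX's):

```python
# Sketch of the "light heuristic" value used instead of a learned value net:
# the negative sum of Manhattan distances from each agent to its goal.
# Closer to all goals -> value nearer 0; illustrative, not MCTX API.
def heuristic_value(positions, goals):
    """positions, goals: lists of (x, y) grid cells, one pair per agent."""
    return -sum(abs(px - gx) + abs(py - gy)
                for (px, py), (gx, gy) in zip(positions, goals))

# Two agents, each 2 steps from its goal -> value -4.
assert heuristic_value([(0, 0), (3, 3)], [(1, 1), (4, 4)]) == -4
```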
**Current settings**
- `num_simulations`: 10k–20k
- `max_depth`: 15–30
- `max_num_considered_actions`: 125
**Observation**
Despite the large simulation budget, plans are often suboptimal compared to a human baseline.
**Questions**
- Any recommended rules of thumb for choosing `num_simulations` vs. `max_depth` as the branching factor explodes?
- For joint action spaces, any guidance on `max_num_considered_actions` (consider all actions vs. subsample)?
- Suggested `qtransform` settings (e.g., `value_scale`, `maxvisit_init`, `use_mixed_value`, `rescale_values`) when values are zero/heuristic rather than learned?
- With uniform priors, should I keep a nonzero `gumbel_scale` to break ties, or is a deterministic setting preferable here?