feat+refactor: SimAI 1.6 GPU Memory Module & Code Quality #243

Open
tianhao909 wants to merge 3 commits into aliyun:master from tianhao909:pr1-feat-refactor


@tianhao909
Collaborator

Summary

This PR introduces the GPU Memory Inference Module with PD-separation (Prefill-Decode disaggregation) support for SimAI 1.6, enabling accurate memory simulation for large-scale models including DeepSeek-671B, Qwen3-MoE-235B, and Qwen3-Next-80B. It also includes code quality improvements across the vidur-alibabacloud modules.
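To make the memory-simulation claim concrete, the sketch below estimates the per-token KV-cache size for standard attention (MHA/GQA) and for MLA, and how many tokens fit in the HBM left after weights. It is a minimal illustration, not the PR's memory planner: `AttnShape`, `kv_bytes_per_token`, and `max_cached_tokens` are hypothetical names, and the sketch ignores activations, allocator fragmentation, and how KV is sharded across tensor-parallel ranks. The MLA branch assumes the DeepSeek-V3 layout, where each layer caches a compressed latent of width `kv_lora_rank` plus a decoupled RoPE key of width `qk_rope_head_dim`.

```python
from dataclasses import dataclass

GIB = 1024 ** 3


@dataclass
class AttnShape:
    """Hypothetical attention shape used only for KV-cache sizing."""
    num_layers: int
    use_mla: bool = False
    num_kv_heads: int = 0       # MHA/GQA: number of K/V heads
    head_dim: int = 0           # MHA/GQA: per-head dimension
    kv_lora_rank: int = 0       # MLA: width of the compressed KV latent
    qk_rope_head_dim: int = 0   # MLA: width of the decoupled RoPE key


def kv_bytes_per_token(shape: AttnShape, dtype_bytes: int) -> int:
    """Bytes of KV cache that one token occupies across all layers."""
    if shape.use_mla:
        # MLA caches one latent vector plus one shared RoPE key per layer.
        per_layer = shape.kv_lora_rank + shape.qk_rope_head_dim
    else:
        # MHA/GQA caches a K and a V vector for every KV head per layer.
        per_layer = 2 * shape.num_kv_heads * shape.head_dim
    return shape.num_layers * per_layer * dtype_bytes


def max_cached_tokens(hbm_gib: float, weights_gib: float,
                      reserved_gib: float, bytes_per_token: int) -> int:
    """Tokens whose KV cache fits in the HBM left after weights and a reserve."""
    free_bytes = int((hbm_gib - weights_gib - reserved_gib) * GIB)
    return max(free_bytes, 0) // bytes_per_token
```

With DeepSeek-V3's public config values (61 layers, `kv_lora_rank=512`, `qk_rope_head_dim=64`) and a 2-byte dtype, the MLA branch gives 61 × 576 × 2 = 70,272 bytes per cached token, which is why MLA models fit far more tokens per GPU than a comparable MHA layout.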

Changes

New Features

  • Parameter Counter: Supports MoE, dense, and MLA architectures for DeepSeek, Qwen3-MoE, and Qwen3-Next models
  • Memory Planner: Per-request memory allocation under PD separation (prefill/decode disaggregation)
  • AICB Workload Data: Pre-generated workload profiles and HuggingFace model configs
  • MFU Calculator: Improved MFU (model FLOPs utilization) calculation across architectures
  • Batch Simulation: run_scenarios.sh for automated multi-model simulation
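A parameter counter for MoE models has to separate total parameters (what must be stored in GPU memory) from activated parameters (what each token actually uses). The sketch below shows that split for the FFN part only. It assumes SwiGLU experts (gate, up, and down projections), shared experts of the same width as routed ones, a bias-free linear router, and a few leading dense layers in the style of DeepSeek-V3's `first_k_dense_replace`. `MoEFfnShape`, `swiglu_params`, and `ffn_params` are hypothetical names rather than the PR's counter, and attention (including MLA projections), embeddings, and norms are deliberately left out.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass
class MoEFfnShape:
    """Hypothetical FFN shape; field names are illustrative, not the PR's."""
    hidden_size: int
    num_layers: int
    dense_intermediate_size: int  # FFN width of the leading dense layers
    num_dense_layers: int         # like DeepSeek-V3's first_k_dense_replace
    moe_intermediate_size: int    # FFN width of each expert
    num_routed_experts: int
    num_shared_experts: int       # always-on experts (0 if none)
    top_k: int                    # routed experts activated per token


def swiglu_params(hidden: int, intermediate: int) -> int:
    # gate_proj and up_proj (hidden -> intermediate) plus down_proj back.
    return 3 * hidden * intermediate


def ffn_params(s: MoEFfnShape) -> Tuple[int, int]:
    """Return (total, activated) FFN parameters, ignoring biases and norms."""
    dense = s.num_dense_layers * swiglu_params(s.hidden_size, s.dense_intermediate_size)
    moe_layers = s.num_layers - s.num_dense_layers
    expert = swiglu_params(s.hidden_size, s.moe_intermediate_size)
    router = s.hidden_size * s.num_routed_experts  # gating weight matrix
    # Every expert is stored, but only top_k routed + shared experts run per token.
    total = dense + moe_layers * ((s.num_routed_experts + s.num_shared_experts) * expert + router)
    active = dense + moe_layers * ((s.top_k + s.num_shared_experts) * expert + router)
    return total, active
```

Plugging in DeepSeek-V3's public config (hidden size 7168, 61 layers, 3 dense layers of width 18432, 256 routed experts of width 2048, 1 shared expert, top-8 routing) shows why the activated FFN parameters are only a small fraction of the stored total.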

Code Quality

  • Replace all print() calls with the standard logging module
  • Remove ~390 lines of dead or commented-out code
  • Add bilingual (EN/ZH) docstrings to core modules
  • Clean up imports and unused variables
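The print-to-logging migration usually follows the pattern sketched below: a module-level logger from `logging.getLogger(__name__)`, with handlers and levels configured once at the entry point instead of in library code. The function name `log_replica_summary` is hypothetical; the message format mirrors the `[ReplicaConfig]` summary line in the reviewed code. Passing values as separate `%d` arguments, rather than pre-building an f-string, means the string is only interpolated when a handler actually emits the record.

```python
import logging

# Module-level logger; handlers and levels are configured by the entry point.
logger = logging.getLogger(__name__)


def log_replica_summary(tp: int, pp: int, world_size: int) -> None:
    # Before the refactor this might have been:
    #   print(f"tp={tp}, pp={pp}, per_replica_ws={world_size}")
    # %-style arguments are only interpolated if the record is emitted.
    logger.info("[ReplicaConfig] tp=%d, pp=%d, per_replica_ws=%d", tp, pp, world_size)


if __name__ == "__main__":
    logging.basicConfig(level=logging.INFO,
                        format="%(asctime)s %(name)s %(levelname)s %(message)s")
    log_replica_summary(tp=8, pp=1, world_size=8)
```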

Files Changed

  • vidur-alibabacloud/vidur/: Python code (89 files)
  • vidur-alibabacloud/data/: AICB workload data and HF model configs
  • vidur-alibabacloud/examples/: run_scenarios.sh
  • vidur-alibabacloud/.gitignore: Updated ignore rules

Testing

  • Verified parameter counting accuracy for DeepSeek-671B, Qwen3-MoE-235B, Qwen3-Next-80B
  • Validated PD-separation memory allocation across multiple world_size configurations
  • Tested batch simulation scenarios with run_scenarios.sh

Checklist

  • Code compiles without errors
  • Bilingual commit messages
  • No IDE config files included
  • .gitignore cleaned of personal dev rules

Co-authored-by: tianhao909 <843101550@qq.com>
Co-authored-by: MXtremist <44829997+MXtremist@users.noreply.github.com>

tianhao909 and others added 2 commits March 18, 2026 05:16
…n3 inference simulation

Add GPU memory inference and PD-separation (Prefill-Decode disaggregation) support
for large-scale model simulation including DeepSeek-671B, Qwen3-MoE-235B, and
Qwen3-Next-80B. Key changes:

- Add parameter counter for MoE/Dense/MLA architectures
- Add memory planner with PD-separation request allocation
- Integrate AICB workload data and HuggingFace model configs
- Add MFU calculator improvements
- Add execution time entity enhancements
- Add run_scenarios.sh for batch simulation

Co-authored-by: tianhao909 <843101550@qq.com>
Co-authored-by: MXtremist <44829997+MXtremist@users.noreply.github.com>
…d bilingual docstrings

Code quality improvements across vidur-alibabacloud modules:

- Replace all print() calls with proper logging module usage
- Remove ~390 lines of dead/commented-out code
- Add bilingual (EN/ZH) docstrings to core modules
- Clean up imports and unused variables
- Improve execution time predictor and scheduler code

Co-authored-by: tianhao909 <843101550@qq.com>
Co-authored-by: MXtremist <44829997+MXtremist@users.noreply.github.com>
Restore development rules in vidur-alibabacloud/.gitignore

Co-authored-by: tianhao909 <843101550@qq.com>
Co-authored-by: MXtremist <44829997+MXtremist@users.noreply.github.com>
tianhao909 requested a review from Copilot on March 18, 2026 13:04

Copilot AI left a comment


Pull request overview

This PR adds SimAI 1.6 PD-separation (prefill/decode disaggregation) support for GPU memory simulation, expands model/device/node SKU configuration to cover additional large models and hardware, and checks in pre-generated AICB workload/HF config data.

Changes:

  • Implement PD-aware cluster initialization and replica/world-size bookkeeping.
  • Add new model configs (Qwen3-Next-80B, Qwen3-MoE-235B) and new device/node SKU configs (H20/H200/GB200).
  • Add HF model config JSONs and AICB workload CSV/JSON artifacts; update .gitignore to keep required workload data tracked.
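The PD-aware cluster initialization splits one GPU pool into a prefill pool and a decode pool and then sizes the replicas in each. The sketch below shows one way to do that under an explicit assumption: that `pd_node_ratio` is the fraction of GPUs given to prefill, with a value of 1 meaning no separation (every replica runs both phases, as in the reviewed `cluster.py` excerpt). `PhasePlan` and `split_pd` are hypothetical names; the PR may round or validate differently, but the final check mirrors its assertion that both per-phase world sizes must be positive.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass
class PhasePlan:
    """Hypothetical per-phase plan: GPUs in the pool and per-replica parallelism."""
    world_size: int
    tp: int
    pp: int

    @property
    def num_replicas(self) -> int:
        # Each replica occupies tp * pp GPUs.
        return self.world_size // (self.tp * self.pp)


def split_pd(total_gpus: int, pd_node_ratio: float,
             prefill: Tuple[int, int], decode: Tuple[int, int]) -> Tuple[PhasePlan, PhasePlan]:
    """Split total_gpus into prefill/decode pools.

    Assumption: pd_node_ratio is the fraction of GPUs assigned to prefill;
    prefill and decode are (tp, pp) pairs for each phase.
    """
    if not 0 < pd_node_ratio < 1:
        raise ValueError("pd_node_ratio must be in (0, 1) for PD separation")
    p_tp, p_pp = prefill
    d_tp, d_pp = decode
    p_unit, d_unit = p_tp * p_pp, d_tp * d_pp
    # Round each pool down to a whole number of replicas; leftover GPUs stay idle.
    prefill_ws = int(total_gpus * pd_node_ratio) // p_unit * p_unit
    decode_ws = (total_gpus - prefill_ws) // d_unit * d_unit
    if prefill_ws <= 0 or decode_ws <= 0:
        raise ValueError(
            f"prefill_ws={prefill_ws} and decode_ws={decode_ws} must both be > 0")
    return PhasePlan(prefill_ws, p_tp, p_pp), PhasePlan(decode_ws, d_tp, d_pp)
```

For example, 64 GPUs at a ratio of 0.25 with TP=8 in both phases yields a 16-GPU prefill pool (2 replicas) and a 48-GPU decode pool (6 replicas).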

Reviewed changes

Copilot reviewed 63 out of 89 changed files in this pull request and generated 12 comments.

File Description
vidur-alibabacloud/vidur/entities/cluster.py Adds PD separation logic to compute per-phase world sizes/EP and initialize replicas accordingly.
vidur-alibabacloud/vidur/entities/batch.py Removes a leftover debug assertion comment.
vidur-alibabacloud/vidur/config/node_sku_config.py Introduces an H20 DGX node SKU config.
vidur-alibabacloud/vidur/config/model_config.py Adds Qwen3-Next-80B and Qwen3-MoE-235B model config dataclasses.
vidur-alibabacloud/vidur/config/device_sku_config.py Adds H20/H200/GB200 device SKU configs and updates H800 throughput fields.
vidur-alibabacloud/vidur/config/config.py Extends ReplicaConfig with PD-specific knobs and EP auto-computation/logging.
vidur-alibabacloud/data/hf_configs/qwen3-next-80B-A3B_config.json Adds HF-style config for Qwen3-Next-80B-A3B.
vidur-alibabacloud/data/hf_configs/qwen3-next-80B-A3B_Instruct_FP8_config.json Adds HF-style config for Qwen3-Next-80B-A3B Instruct FP8.
vidur-alibabacloud/data/hf_configs/qwen3-8B_config.json Adds HF-style config for Qwen3-8B.
vidur-alibabacloud/data/hf_configs/qwen3-30B-A3B_config.json Adds HF-style config for Qwen3-MoE 30B-A3B.
vidur-alibabacloud/data/hf_configs/qwen3-235B-A22B_config.json Adds HF-style config for Qwen3-MoE 235B-A22B.
vidur-alibabacloud/data/hf_configs/qwen3-235B-A22B_FP8_config.json Adds FP8 HF-style config variant for Qwen3-MoE 235B-A22B.
vidur-alibabacloud/data/hf_configs/deepseek_v3_config.json Adds HF-style config for DeepSeek V3.
vidur-alibabacloud/data/hf_configs/deepseek_R1_0528_config.json Adds HF-style config for DeepSeek R1 0528.
vidur-alibabacloud/data/aicb_workload/vidur-Qwen3-Next-80B-world_size32-tp1-pp1-ep32-bs4-seq4096-prefill.csv Adds prefill workload profile for Qwen3-Next-80B.
vidur-alibabacloud/data/aicb_workload/vidur-Qwen3-Next-80B-world_size32-tp1-pp1-ep32-bs4-seq4096-decode.csv Adds decode workload profile for Qwen3-Next-80B.
vidur-alibabacloud/data/aicb_workload/vidur-Qwen3-Moe-235B-world_size32-tp1-pp1-ep32-bs4-seq4096-prefill.csv Adds prefill workload profile for Qwen3-MoE-235B.
vidur-alibabacloud/data/aicb_workload/vidur-Qwen3-Moe-235B-world_size32-tp1-pp1-ep32-bs4-seq4096-decode.csv Adds decode workload profile for Qwen3-MoE-235B.
vidur-alibabacloud/data/aicb_workload/vidur-DeepSeek-671B-world_size32-tp1-pp1-ep32-bs4-seq4096-prefill.csv Adds prefill workload profile for DeepSeek-671B.
vidur-alibabacloud/data/aicb_workload/vidur-DeepSeek-671B-world_size32-tp1-pp1-ep32-bs4-seq4096-decode.csv Adds decode workload profile for DeepSeek-671B.
vidur-alibabacloud/data/aicb_workload/vidur-DeepSeek-671B-world_size32-tp1-pp1-ep32-bs3-seq4096-decode.csv Adds a smaller-batch (bs3) decode workload variant for DeepSeek-671B.
vidur-alibabacloud/data/aicb_workload/cache/vidur-Qwen3-Next-80B-world_size8-tp1-pp1-ep8-bs1-seq106-decode.csv Adds cached decode workload profile for Qwen3-Next-80B (ws8).
vidur-alibabacloud/data/aicb_workload/cache/vidur-Qwen3-Next-80B-world_size8-tp1-pp1-ep8-bs1-seq100-prefill.csv Adds cached prefill workload profile for Qwen3-Next-80B (ws8).
vidur-alibabacloud/data/aicb_workload/cache/vidur-Qwen3-Next-80B-world_size8-tp1-pp1-ep8-bs1-seq100-decode.csv Adds cached decode workload profile for Qwen3-Next-80B (ws8).
vidur-alibabacloud/data/aicb_workload/cache/vidur-Qwen3-Next-80B-world_size6-tp1-pp1-ep6-bs1-seq106-decode.csv Adds cached decode workload profile for Qwen3-Next-80B (ws6).
vidur-alibabacloud/data/aicb_workload/cache/vidur-Qwen3-Next-80B-world_size6-tp1-pp1-ep6-bs1-seq100-decode.csv Adds cached decode workload profile for Qwen3-Next-80B (ws6).
vidur-alibabacloud/data/aicb_workload/cache/vidur-Qwen3-Next-80B-world_size32-tp1-pp1-ep32-bs1-seq106-decode.csv Adds cached decode workload profile for Qwen3-Next-80B (ws32).
vidur-alibabacloud/data/aicb_workload/cache/vidur-Qwen3-Next-80B-world_size32-tp1-pp1-ep32-bs1-seq100-decode.csv Adds cached decode workload profile for Qwen3-Next-80B (ws32).
vidur-alibabacloud/data/aicb_workload/cache/vidur-Qwen3-Next-80B-world_size2-tp1-pp1-ep2-bs1-seq100-prefill.csv Adds cached prefill workload profile for Qwen3-Next-80B (ws2).
vidur-alibabacloud/data/aicb_workload/cache/vidur-Qwen3-Moe-235B-world_size8-tp4-pp1-ep8-bs1-seq100-prefill.csv Adds cached prefill workload profile for Qwen3-MoE-235B (ws8,tp4).
vidur-alibabacloud/data/aicb_workload/cache/vidur-Qwen3-Moe-235B-world_size32-tp4-pp1-ep4-bs1-seq106-decode.csv Adds cached decode workload profile for Qwen3-MoE-235B (ws32,tp4).
vidur-alibabacloud/data/aicb_workload/cache/vidur-Qwen3-Moe-235B-world_size32-tp4-pp1-ep4-bs1-seq100-prefill.csv Adds cached prefill workload profile for Qwen3-MoE-235B (ws32,tp4).
vidur-alibabacloud/data/aicb_workload/cache/vidur-Qwen3-Moe-235B-world_size32-tp4-pp1-ep4-bs1-seq100-decode.csv Adds cached decode workload profile for Qwen3-MoE-235B (ws32,tp4).
vidur-alibabacloud/data/aicb_workload/cache/vidur-Qwen3-Moe-235B-world_size24-tp4-pp1-ep24-bs1-seq106-decode.csv Adds cached decode workload profile for Qwen3-MoE-235B (ws24,tp4).
vidur-alibabacloud/data/aicb_workload/cache/vidur-Qwen3-Moe-235B-world_size24-tp4-pp1-ep24-bs1-seq100-decode.csv Adds cached decode workload profile for Qwen3-MoE-235B (ws24,tp4).
vidur-alibabacloud/data/aicb_workload/cache/vidur-DeepSeek-671B-world_size64-tp8-pp1-ep8-bs1-seq106-decode.csv Adds cached decode workload profile for DeepSeek-671B (ws64,tp8).
vidur-alibabacloud/data/aicb_workload/cache/vidur-DeepSeek-671B-world_size64-tp8-pp1-ep8-bs1-seq100-decode.csv Adds cached decode workload profile for DeepSeek-671B (ws64,tp8).
vidur-alibabacloud/data/aicb_workload/cache/vidur-DeepSeek-671B-world_size48-tp8-pp1-ep48-bs1-seq106-decode.csv Adds cached decode workload profile for DeepSeek-671B (ws48,tp8).
vidur-alibabacloud/data/aicb_workload/cache/vidur-DeepSeek-671B-world_size48-tp8-pp1-ep48-bs1-seq100-decode.csv Adds cached decode workload profile for DeepSeek-671B (ws48,tp8).
vidur-alibabacloud/data/aicb_workload/cache/aicb-Qwen3-Next-80B-ws8-tp1-pp1-ep8-bs1-seq106-decode.json Adds cached AICB JSON for Qwen3-Next-80B (ws8).
vidur-alibabacloud/data/aicb_workload/cache/aicb-Qwen3-Next-80B-ws8-tp1-pp1-ep8-bs1-seq100-decode.json Adds cached AICB JSON for Qwen3-Next-80B (ws8).
vidur-alibabacloud/data/aicb_workload/cache/aicb-Qwen3-Next-80B-ws6-tp1-pp1-ep6-bs1-seq106-decode.json Adds cached AICB JSON for Qwen3-Next-80B (ws6).
vidur-alibabacloud/data/aicb_workload/cache/aicb-Qwen3-Next-80B-ws6-tp1-pp1-ep6-bs1-seq100-decode.json Adds cached AICB JSON for Qwen3-Next-80B (ws6).
vidur-alibabacloud/data/aicb_workload/cache/aicb-Qwen3-Next-80B-ws32-tp1-pp1-ep32-bs1-seq106-decode.json Adds cached AICB JSON for Qwen3-Next-80B (ws32).
vidur-alibabacloud/data/aicb_workload/cache/aicb-Qwen3-Next-80B-ws32-tp1-pp1-ep32-bs1-seq100-decode.json Adds cached AICB JSON for Qwen3-Next-80B (ws32).
vidur-alibabacloud/.gitignore Keeps AICB workload artifacts tracked while ignoring other generated CSV/log outputs.
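The workload files listed above share a consistent naming scheme: model, world size (`world_size` in the vidur CSVs, `ws` in the AICB JSONs), tp, pp, ep, batch size, sequence length, and phase. A small parser like the hypothetical one below can recover those fields, for example to check that a cached profile matches the parallelism being simulated. The regex is inferred from the file names in this PR rather than taken from its code, and `data_parallel` assumes world_size = tp × pp × dp, as stated in the `cluster.py` comment.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Inferred from the file names in this PR (assumption, not the PR's code).
_WORKLOAD_RE = re.compile(
    r"^(?P<source>vidur|aicb)-(?P<model>.+?)-(?:world_size|ws)(?P<ws>\d+)"
    r"-tp(?P<tp>\d+)-pp(?P<pp>\d+)-ep(?P<ep>\d+)"
    r"-bs(?P<bs>\d+)-seq(?P<seq>\d+)-(?P<phase>prefill|decode)\.(?:csv|json)$"
)


@dataclass(frozen=True)
class WorkloadKey:
    source: str
    model: str
    world_size: int
    tp: int
    pp: int
    ep: int
    batch_size: int
    seq_len: int
    phase: str

    @property
    def data_parallel(self) -> int:
        # Assumes world_size = tp * pp * dp.
        return self.world_size // (self.tp * self.pp)


def parse_workload_name(name: str) -> Optional[WorkloadKey]:
    """Return the parsed fields, or None if the name does not match."""
    m = _WORKLOAD_RE.match(name)
    if m is None:
        return None
    g = m.groupdict()
    return WorkloadKey(
        source=g["source"], model=g["model"], world_size=int(g["ws"]),
        tp=int(g["tp"]), pp=int(g["pp"]), ep=int(g["ep"]),
        batch_size=int(g["bs"]), seq_len=int(g["seq"]), phase=g["phase"],
    )
```

The non-greedy `model` group stops at the first `-world_size` or `-ws` token, so hyphenated model names such as Qwen3-Next-80B are kept intact.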


# Each replica handles both prefill and decode
# EP = ws = tp * pp * dp (full cluster world_size)
# ============================================================
if rc.pd_node_ratio == 1:
# 2. pd_node_ratio (calculated by ratio)
# ============================================================
elif rc.pd_node_ratio > 0 and rc.pd_node_ratio < 1:
Comment on lines 160 to 161
if metrics_config.write_json_trace:
    self._write_cluster_info_to_file()
Comment on lines +112 to +115
assert num_p > 0 and num_d > 0, (
    f"[Cluster] _num_prefill_replicas={num_p} and "
    f"_num_decode_replicas={num_d} must both be > 0, "
    f"source: {replica_source}")
Comment on lines +140 to +142
assert rc.prefill_world_size > 0 and rc.decode_world_size > 0, (
    f"[Cluster] prefill_ws={rc.prefill_world_size} and "
    f"decode_ws={rc.decode_world_size} must both be > 0")
Comment on lines +78 to +82
# GB200 NVL72
class GB200DeviceSKUConfig(BaseDeviceSKUConfig):
    fp16_tflops: int = 2500
    fp8_tflops: int = 5000
    total_memory_gb: int = 192
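Peak-throughput fields such as `fp16_tflops` and `fp8_tflops` are the denominator of an MFU (model FLOPs utilization) calculation. The sketch below uses the common approximation of about 2 FLOPs per activated parameter per token for a forward pass, so it ignores attention score and context FLOPs, which become significant at long sequence lengths. The function `mfu` and its signature are hypothetical and do not describe the PR's MFU calculator.

```python
def mfu(active_params: float, tokens: int, step_time_s: float,
        num_gpus: int, peak_tflops_per_gpu: float) -> float:
    """Forward-pass MFU: achieved FLOP/s divided by aggregate peak FLOP/s.

    Assumes ~2 FLOPs per activated parameter per token (matmuls only);
    attention score/context FLOPs are not counted.
    """
    if step_time_s <= 0 or num_gpus <= 0 or peak_tflops_per_gpu <= 0:
        raise ValueError("step time, GPU count, and peak TFLOPS must be positive")
    achieved_flops = 2.0 * active_params * tokens / step_time_s
    peak_flops = peak_tflops_per_gpu * 1e12 * num_gpus
    return achieved_flops / peak_flops
```

Passing the GB200 `fp16_tflops = 2500` above as `peak_tflops_per_gpu` gives a dense-FP16 denominator; when the simulated kernels run in FP8, the FP8 peak is the matching choice.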
Comment on lines +486 to +489
metadata={"help": "> add: pd_p2p_comm_dtype for pd disaggregation. "
          "choices=['fp8', 'float16', 'float32', 'float64', 'bfloat16', 'int8', 'int16', 'int32', 'int64']"
},

Comment on lines +574 to +586
# Print ReplicaConfig summary
logger.info(f"[ReplicaConfig] tp={self.tensor_parallel_size}, pp={self.num_pipeline_stages}, "
            f"per_replica_ws={self.world_size}, ep(temp)={self.expert_model_parallel_size}, "
            f"pd_ratio={self.pd_node_ratio}")
if self.pd_node_ratio < 1:
    p_tp = self.prefill_tensor_parallel_size or self.tensor_parallel_size
    p_pp = self.prefill_num_pipeline_stages or self.num_pipeline_stages
    d_tp = self.decode_tensor_parallel_size or self.tensor_parallel_size
    d_pp = self.decode_num_pipeline_stages or self.num_pipeline_stages
    logger.info(f"[ReplicaConfig] PD separation enabled: "
                f"prefill(tp={p_tp}, pp={p_pp}), decode(tp={d_tp}, pp={d_pp})")
if self.num_prefill_replicas is not None:
    logger.info(f"[ReplicaConfig] User specified num_prefill_replicas={self.num_prefill_replicas}")


@dataclass
class Qwen3235BA22BModelConfig(BaseModelConfig):
Comment on lines +3 to +8
"Qwen2MoeForCausalLM"
],
"attention_dropout": 0.0,
"bos_token_id": 151643,
"decoder_sparse_step": 1,
"eos_token_id": 151643,


2 participants