feat+refactor: SimAI 1.6 GPU Memory Module & Code Quality #243
Open
tianhao909 wants to merge 3 commits into aliyun:master
…n3 inference simulation

Add GPU memory inference and PD-separation (Prefill-Decode disaggregation) support for large-scale model simulation including DeepSeek-671B, Qwen3-MoE-235B, and Qwen3-Next-80B.

Key changes:
- Add parameter counter for MoE/Dense/MLA architectures
- Add memory planner with PD-separation request allocation
- Integrate AICB workload data and HuggingFace model configs
- Add MFU calculator improvements
- Add execution time entity enhancements
- Add run_scenarios.sh for batch simulation

Co-authored-by: tianhao909 <843101550@qq.com>
Co-authored-by: MXtremist <44829997+MXtremist@users.noreply.github.com>
…d bilingual docstrings

Code quality improvements across vidur-alibabacloud modules:
- Replace all print() calls with proper logging module usage
- Remove ~390 lines of dead/commented-out code
- Add bilingual (EN/ZH) docstrings to core modules
- Clean up imports and unused variables
- Improve execution time predictor and scheduler code

Co-authored-by: tianhao909 <843101550@qq.com>
Co-authored-by: MXtremist <44829997+MXtremist@users.noreply.github.com>
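The print()-to-logging replacement described in this commit can be illustrated with a minimal sketch. The logger name, function name, and message format below are assumptions made for illustration and are not taken from the PR's code.

```python
import logging

# Hypothetical logger name; the PR's modules may use different names.
logger = logging.getLogger("vidur.example")


def report_replicas(num_prefill: int, num_decode: int) -> str:
    """Log a replica summary instead of printing it, and return the message."""
    message = f"prefill_replicas={num_prefill}, decode_replicas={num_decode}"
    # %-style arguments defer formatting until the record is actually emitted.
    logger.info("[Cluster] %s", message)
    return message


if __name__ == "__main__":
    # Configure output once at the entry point, not inside library modules.
    logging.basicConfig(level=logging.INFO,
                        format="%(levelname)s %(name)s: %(message)s")
    report_replicas(2, 6)
```

Unlike print(), a module-level logger lets callers filter by level or redirect output without editing the module.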
5 tasks
Restore the development rules in vidur-alibabacloud/.gitignore

Co-authored-by: tianhao909 <843101550@qq.com>
Co-authored-by: MXtremist <44829997+MXtremist@users.noreply.github.com>
Pull request overview
This PR adds SimAI 1.6 PD-separation (prefill/decode disaggregation) support for GPU memory simulation, expands model/device/node SKU configuration to cover additional large models and hardware, and checks in pre-generated AICB workload/HF config data.
Changes:
- Implement PD-aware cluster initialization and replica/world-size bookkeeping.
- Add new model configs (Qwen3-Next-80B, Qwen3-MoE-235B) and new device/node SKU configs (H20/H200/GB200).
- Add HF model config JSONs and AICB workload CSV/JSON artifacts; update `.gitignore` to keep required workload data tracked.
Reviewed changes
Copilot reviewed 63 out of 89 changed files in this pull request and generated 12 comments.
| File | Description |
|---|---|
| vidur-alibabacloud/vidur/entities/cluster.py | Adds PD separation logic to compute per-phase world sizes/EP and initialize replicas accordingly. |
| vidur-alibabacloud/vidur/entities/batch.py | Removes a leftover debug assertion comment. |
| vidur-alibabacloud/vidur/config/node_sku_config.py | Introduces an H20 DGX node SKU config. |
| vidur-alibabacloud/vidur/config/model_config.py | Adds Qwen3-Next-80B and Qwen3-MoE-235B model config dataclasses. |
| vidur-alibabacloud/vidur/config/device_sku_config.py | Adds H20/H200/GB200 device SKU configs and updates H800 throughput fields. |
| vidur-alibabacloud/vidur/config/config.py | Extends ReplicaConfig with PD-specific knobs and EP auto-computation/logging. |
| vidur-alibabacloud/data/hf_configs/qwen3-next-80B-A3B_config.json | Adds HF-style config for Qwen3-Next-80B-A3B. |
| vidur-alibabacloud/data/hf_configs/qwen3-next-80B-A3B_Instruct_FP8_config.json | Adds HF-style config for Qwen3-Next-80B-A3B Instruct FP8. |
| vidur-alibabacloud/data/hf_configs/qwen3-8B_config.json | Adds HF-style config for Qwen3-8B. |
| vidur-alibabacloud/data/hf_configs/qwen3-30B-A3B_config.json | Adds HF-style config for Qwen3-MoE 30B-A3B. |
| vidur-alibabacloud/data/hf_configs/qwen3-235B-A22B_config.json | Adds HF-style config for Qwen3-MoE 235B-A22B. |
| vidur-alibabacloud/data/hf_configs/qwen3-235B-A22B_FP8_config.json | Adds FP8 HF-style config variant for Qwen3-MoE 235B-A22B. |
| vidur-alibabacloud/data/hf_configs/deepseek_v3_config.json | Adds HF-style config for DeepSeek V3. |
| vidur-alibabacloud/data/hf_configs/deepseek_R1_0528_config.json | Adds HF-style config for DeepSeek R1 0528. |
| vidur-alibabacloud/data/aicb_workload/vidur-Qwen3-Next-80B-world_size32-tp1-pp1-ep32-bs4-seq4096-prefill.csv | Adds prefill workload profile for Qwen3-Next-80B. |
| vidur-alibabacloud/data/aicb_workload/vidur-Qwen3-Next-80B-world_size32-tp1-pp1-ep32-bs4-seq4096-decode.csv | Adds decode workload profile for Qwen3-Next-80B. |
| vidur-alibabacloud/data/aicb_workload/vidur-Qwen3-Moe-235B-world_size32-tp1-pp1-ep32-bs4-seq4096-prefill.csv | Adds prefill workload profile for Qwen3-MoE-235B. |
| vidur-alibabacloud/data/aicb_workload/vidur-Qwen3-Moe-235B-world_size32-tp1-pp1-ep32-bs4-seq4096-decode.csv | Adds decode workload profile for Qwen3-MoE-235B. |
| vidur-alibabacloud/data/aicb_workload/vidur-DeepSeek-671B-world_size32-tp1-pp1-ep32-bs4-seq4096-prefill.csv | Adds prefill workload profile for DeepSeek-671B. |
| vidur-alibabacloud/data/aicb_workload/vidur-DeepSeek-671B-world_size32-tp1-pp1-ep32-bs4-seq4096-decode.csv | Adds decode workload profile for DeepSeek-671B. |
| vidur-alibabacloud/data/aicb_workload/vidur-DeepSeek-671B-world_size32-tp1-pp1-ep32-bs3-seq4096-decode.csv | Adds a smaller CSV decode workload variant for DeepSeek-671B. |
| vidur-alibabacloud/data/aicb_workload/cache/vidur-Qwen3-Next-80B-world_size8-tp1-pp1-ep8-bs1-seq106-decode.csv | Adds cached decode workload profile for Qwen3-Next-80B (ws8). |
| vidur-alibabacloud/data/aicb_workload/cache/vidur-Qwen3-Next-80B-world_size8-tp1-pp1-ep8-bs1-seq100-prefill.csv | Adds cached prefill workload profile for Qwen3-Next-80B (ws8). |
| vidur-alibabacloud/data/aicb_workload/cache/vidur-Qwen3-Next-80B-world_size8-tp1-pp1-ep8-bs1-seq100-decode.csv | Adds cached decode workload profile for Qwen3-Next-80B (ws8). |
| vidur-alibabacloud/data/aicb_workload/cache/vidur-Qwen3-Next-80B-world_size6-tp1-pp1-ep6-bs1-seq106-decode.csv | Adds cached decode workload profile for Qwen3-Next-80B (ws6). |
| vidur-alibabacloud/data/aicb_workload/cache/vidur-Qwen3-Next-80B-world_size6-tp1-pp1-ep6-bs1-seq100-decode.csv | Adds cached decode workload profile for Qwen3-Next-80B (ws6). |
| vidur-alibabacloud/data/aicb_workload/cache/vidur-Qwen3-Next-80B-world_size32-tp1-pp1-ep32-bs1-seq106-decode.csv | Adds cached decode workload profile for Qwen3-Next-80B (ws32). |
| vidur-alibabacloud/data/aicb_workload/cache/vidur-Qwen3-Next-80B-world_size32-tp1-pp1-ep32-bs1-seq100-decode.csv | Adds cached decode workload profile for Qwen3-Next-80B (ws32). |
| vidur-alibabacloud/data/aicb_workload/cache/vidur-Qwen3-Next-80B-world_size2-tp1-pp1-ep2-bs1-seq100-prefill.csv | Adds cached prefill workload profile for Qwen3-Next-80B (ws2). |
| vidur-alibabacloud/data/aicb_workload/cache/vidur-Qwen3-Moe-235B-world_size8-tp4-pp1-ep8-bs1-seq100-prefill.csv | Adds cached prefill workload profile for Qwen3-MoE-235B (ws8,tp4). |
| vidur-alibabacloud/data/aicb_workload/cache/vidur-Qwen3-Moe-235B-world_size32-tp4-pp1-ep4-bs1-seq106-decode.csv | Adds cached decode workload profile for Qwen3-MoE-235B (ws32,tp4). |
| vidur-alibabacloud/data/aicb_workload/cache/vidur-Qwen3-Moe-235B-world_size32-tp4-pp1-ep4-bs1-seq100-prefill.csv | Adds cached prefill workload profile for Qwen3-MoE-235B (ws32,tp4). |
| vidur-alibabacloud/data/aicb_workload/cache/vidur-Qwen3-Moe-235B-world_size32-tp4-pp1-ep4-bs1-seq100-decode.csv | Adds cached decode workload profile for Qwen3-MoE-235B (ws32,tp4). |
| vidur-alibabacloud/data/aicb_workload/cache/vidur-Qwen3-Moe-235B-world_size24-tp4-pp1-ep24-bs1-seq106-decode.csv | Adds cached decode workload profile for Qwen3-MoE-235B (ws24,tp4). |
| vidur-alibabacloud/data/aicb_workload/cache/vidur-Qwen3-Moe-235B-world_size24-tp4-pp1-ep24-bs1-seq100-decode.csv | Adds cached decode workload profile for Qwen3-MoE-235B (ws24,tp4). |
| vidur-alibabacloud/data/aicb_workload/cache/vidur-DeepSeek-671B-world_size64-tp8-pp1-ep8-bs1-seq106-decode.csv | Adds cached decode workload profile for DeepSeek-671B (ws64,tp8). |
| vidur-alibabacloud/data/aicb_workload/cache/vidur-DeepSeek-671B-world_size64-tp8-pp1-ep8-bs1-seq100-decode.csv | Adds cached decode workload profile for DeepSeek-671B (ws64,tp8). |
| vidur-alibabacloud/data/aicb_workload/cache/vidur-DeepSeek-671B-world_size48-tp8-pp1-ep48-bs1-seq106-decode.csv | Adds cached decode workload profile for DeepSeek-671B (ws48,tp8). |
| vidur-alibabacloud/data/aicb_workload/cache/vidur-DeepSeek-671B-world_size48-tp8-pp1-ep48-bs1-seq100-decode.csv | Adds cached decode workload profile for DeepSeek-671B (ws48,tp8). |
| vidur-alibabacloud/data/aicb_workload/cache/aicb-Qwen3-Next-80B-ws8-tp1-pp1-ep8-bs1-seq106-decode.json | Adds cached AICB JSON for Qwen3-Next-80B (ws8). |
| vidur-alibabacloud/data/aicb_workload/cache/aicb-Qwen3-Next-80B-ws8-tp1-pp1-ep8-bs1-seq100-decode.json | Adds cached AICB JSON for Qwen3-Next-80B (ws8). |
| vidur-alibabacloud/data/aicb_workload/cache/aicb-Qwen3-Next-80B-ws6-tp1-pp1-ep6-bs1-seq106-decode.json | Adds cached AICB JSON for Qwen3-Next-80B (ws6). |
| vidur-alibabacloud/data/aicb_workload/cache/aicb-Qwen3-Next-80B-ws6-tp1-pp1-ep6-bs1-seq100-decode.json | Adds cached AICB JSON for Qwen3-Next-80B (ws6). |
| vidur-alibabacloud/data/aicb_workload/cache/aicb-Qwen3-Next-80B-ws32-tp1-pp1-ep32-bs1-seq106-decode.json | Adds cached AICB JSON for Qwen3-Next-80B (ws32). |
| vidur-alibabacloud/data/aicb_workload/cache/aicb-Qwen3-Next-80B-ws32-tp1-pp1-ep32-bs1-seq100-decode.json | Adds cached AICB JSON for Qwen3-Next-80B (ws32). |
| vidur-alibabacloud/.gitignore | Keeps AICB workload artifacts tracked while ignoring other generated CSV/log outputs. |
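
The workload CSV names in the table follow a visible pattern (model, world size, tp, pp, ep, batch size, sequence length, phase). As a sketch under that assumption, the fields could be recovered from a file name as follows. The regular expression and the function name `parse_workload_name` are inferred from the listed names and are hypothetical, not part of the PR.

```python
import re
from typing import Dict, Optional, Union

# Pattern inferred from the CSV names in the table above (an assumption).
_WORKLOAD_RE = re.compile(
    r"^vidur-(?P<model>.+)-world_size(?P<world_size>\d+)"
    r"-tp(?P<tp>\d+)-pp(?P<pp>\d+)-ep(?P<ep>\d+)"
    r"-bs(?P<bs>\d+)-seq(?P<seq>\d+)-(?P<phase>prefill|decode)\.csv$"
)


def parse_workload_name(filename: str) -> Optional[Dict[str, Union[str, int]]]:
    """Split a workload CSV file name into its fields, or return None."""
    match = _WORKLOAD_RE.match(filename)
    if match is None:
        return None
    # Model and phase stay strings; every other field is an integer.
    return {
        key: (value if key in ("model", "phase") else int(value))
        for key, value in match.groupdict().items()
    }


if __name__ == "__main__":
    name = "vidur-DeepSeek-671B-world_size32-tp1-pp1-ep32-bs4-seq4096-prefill.csv"
    print(parse_workload_name(name))
```

The cached AICB JSON files use a different prefix and a `ws` abbreviation, so this pattern deliberately returns None for them.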
    # Each replica handles both prefill and decode
    # EP = ws = tp * pp * dp (full cluster world_size)
    # ============================================================
    if rc.pd_node_ratio == 1:

    # 2. pd_node_ratio (calculated by ratio)
    # ============================================================
    elif rc.pd_node_ratio > 0 and rc.pd_node_ratio < 1:
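
The two branches quoted above distinguish a ratio of exactly 1 (every replica handles both phases) from a ratio strictly between 0 and 1 (replicas split by ratio). The sketch below shows one way such a split could work. It assumes that `pd_node_ratio` is the prefill share of replicas and that rounding and clamping are acceptable; neither is confirmed by the PR, and `ReplicaSplit` and `split_replicas` are hypothetical names.

```python
from typing import NamedTuple


class ReplicaSplit(NamedTuple):
    mixed: int    # replicas serving both prefill and decode
    prefill: int  # prefill-only replicas
    decode: int   # decode-only replicas


def split_replicas(num_replicas: int, pd_node_ratio: float) -> ReplicaSplit:
    """Split replicas by ratio (assumed meaning: prefill share of replicas)."""
    if num_replicas <= 0:
        raise ValueError(f"num_replicas must be > 0, got {num_replicas}")
    if pd_node_ratio == 1:
        # No disaggregation: each replica handles both phases.
        return ReplicaSplit(mixed=num_replicas, prefill=0, decode=0)
    if 0 < pd_node_ratio < 1:  # chained comparison, same as `> 0 and < 1`
        if num_replicas < 2:
            raise ValueError("PD separation needs at least 2 replicas")
        num_prefill = round(num_replicas * pd_node_ratio)
        # Clamp so that both phases keep at least one replica.
        num_prefill = min(max(num_prefill, 1), num_replicas - 1)
        return ReplicaSplit(mixed=0, prefill=num_prefill,
                            decode=num_replicas - num_prefill)
    raise ValueError(f"pd_node_ratio must be in (0, 1], got {pd_node_ratio}")
```

Under these assumptions, `split_replicas(8, 0.25)` yields 2 prefill and 6 decode replicas, and any ratio outside (0, 1] is rejected explicitly.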
Comment on lines 160 to 161

    if metrics_config.write_json_trace:
        self._write_cluster_info_to_file()
Comment on lines +112 to +115

    assert num_p > 0 and num_d > 0, (
        f"[Cluster] _num_prefill_replicas={num_p} and "
        f"_num_decode_replicas={num_d} must both be > 0, "
        f"source: {replica_source}")
Comment on lines +140 to +142

    assert rc.prefill_world_size > 0 and rc.decode_world_size > 0, (
        f"[Cluster] prefill_ws={rc.prefill_world_size} and "
        f"decode_ws={rc.decode_world_size} must both be > 0")
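
Both checks quoted above validate configuration with `assert`, which Python removes when run with the `-O` flag. A minimal sketch of an explicit alternative is shown below. The helper name `require_positive` and the message format are hypothetical, and this is not presented as the change the PR or its reviewers settled on.

```python
def require_positive(**values: int) -> None:
    """Raise ValueError naming every keyword argument that is not > 0."""
    bad = {name: value for name, value in values.items() if value <= 0}
    if bad:
        details = ", ".join(f"{name}={value}" for name, value in bad.items())
        raise ValueError(f"[Cluster] values must all be > 0, got: {details}")


if __name__ == "__main__":
    # Passes silently when every value is positive.
    require_positive(prefill_world_size=8, decode_world_size=24)
    try:
        require_positive(prefill_world_size=0, decode_world_size=24)
    except ValueError as exc:
        print(exc)
```

Raising an exception keeps the check active in optimized runs and gives callers a specific error type to catch.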
Comment on lines +78 to +82

    # GB200 NVL72
    class GB200DeviceSKUConfig(BaseDeviceSKUConfig):
        fp16_tflops: int = 2500
        fp8_tflops: int = 5000
        total_memory_gb: int = 192
Comment on lines +486 to +489

    metadata={"help": "> add: pd_p2p_comm_dtype for pd disaggregation."
              "choices=['fp8', 'float16', 'float32', 'float64', 'bfloat16', 'int8', 'int16', 'int32', 'int64'],"
    },
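
In the quoted help text, the allowed values appear only inside the help string, and the two adjacent string literals are joined without a separator. A sketch of an alternative keeps the choices as data and validates them at construction time. The class name `P2PCommConfig`, the `"choices"` metadata key, and the `"float16"` default are assumptions made for illustration.

```python
from dataclasses import dataclass, field, fields

# Choices copied from the quoted help text.
P2P_DTYPE_CHOICES = (
    "fp8", "float16", "float32", "float64", "bfloat16",
    "int8", "int16", "int32", "int64",
)


@dataclass
class P2PCommConfig:  # hypothetical class, not the PR's ReplicaConfig
    pd_p2p_comm_dtype: str = field(
        default="float16",  # assumed default
        metadata={
            "help": "Data type for prefill-to-decode P2P communication.",
            "choices": P2P_DTYPE_CHOICES,
        },
    )

    def __post_init__(self) -> None:
        # Reject unsupported values early instead of relying on the help text.
        if self.pd_p2p_comm_dtype not in P2P_DTYPE_CHOICES:
            raise ValueError(
                f"pd_p2p_comm_dtype must be one of {P2P_DTYPE_CHOICES}, "
                f"got {self.pd_p2p_comm_dtype!r}"
            )


if __name__ == "__main__":
    print(fields(P2PCommConfig)[0].metadata["choices"])
```

Storing the choices in metadata lets a CLI layer read them programmatically, although whether the project's argument parser honors such a key is not confirmed here.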
Comment on lines +574 to +586

    # Print ReplicaConfig summary
    logger.info(f"[ReplicaConfig] tp={self.tensor_parallel_size}, pp={self.num_pipeline_stages}, "
                f"per_replica_ws={self.world_size}, ep(temp)={self.expert_model_parallel_size}, "
                f"pd_ratio={self.pd_node_ratio}")
    if self.pd_node_ratio < 1:
        p_tp = self.prefill_tensor_parallel_size or self.tensor_parallel_size
        p_pp = self.prefill_num_pipeline_stages or self.num_pipeline_stages
        d_tp = self.decode_tensor_parallel_size or self.tensor_parallel_size
        d_pp = self.decode_num_pipeline_stages or self.num_pipeline_stages
        logger.info(f"[ReplicaConfig] PD separation enabled: "
                    f"prefill(tp={p_tp}, pp={p_pp}), decode(tp={d_tp}, pp={d_pp})")
        if self.num_prefill_replicas is not None:
            logger.info(f"[ReplicaConfig] User specified num_prefill_replicas={self.num_prefill_replicas}")
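
The quoted summary resolves per-phase overrides with `override or default`, which falls back to the default for any falsy value, including 0. The sketch below contrasts that with an explicit `None` check. The helper name `resolve_override` is hypothetical, and whether 0 can actually reach these fields in the PR is not known from the excerpt.

```python
from typing import Optional


def resolve_override(override: Optional[int], default: int) -> int:
    """Use the phase-specific value when it is set, else the shared default."""
    return default if override is None else override


if __name__ == "__main__":
    shared_tp = 8
    # Unset override falls back to the shared value.
    print(resolve_override(None, shared_tp))
    # An explicit override is kept.
    print(resolve_override(4, shared_tp))
    # With `or`, a falsy override such as 0 is silently replaced.
    print(0 or shared_tp, resolve_override(0, shared_tp))
```

Keeping an invalid 0 visible lets a later validation step report it, where the `or` form would silently substitute the shared value.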
    @dataclass
    class Qwen3235BA22BModelConfig(BaseModelConfig):
Comment on lines +3 to +8

        "Qwen2MoeForCausalLM"
      ],
      "attention_dropout": 0.0,
      "bos_token_id": 151643,
      "decoder_sparse_step": 1,
      "eos_token_id": 151643,
This was referenced Mar 31, 2026
Open
Summary
This PR introduces the GPU Memory Inference Module with PD-separation (Prefill-Decode disaggregation) support for SimAI 1.6, enabling accurate memory simulation for large-scale models including DeepSeek-671B, Qwen3-MoE-235B, and Qwen3-Next-80B. It also includes code quality improvements across the vidur-alibabacloud modules.
Changes
New Features
Code Quality
- Replace print() with the proper logging module
Files Changed
- vidur-alibabacloud/vidur/: Python code (89 files)
- vidur-alibabacloud/data/: AICB workload data + HF model configs
- vidur-alibabacloud/examples/: run_scenarios.sh
- vidur-alibabacloud/.gitignore: Updated ignore rules
Testing
Checklist
Co-authored-by: tianhao909 <843101550@qq.com>
Co-authored-by: MXtremist <44829997+MXtremist@users.noreply.github.com>