feat(deployment): centerpoint deployment integration #181
vividf wants to merge 39 commits into tier4:feat/new_deployment_and_evaluation_pipeline
Conversation
```diff
 verification = dict(
     enabled=False,
-    tolerance=1e-1,
+    tolerance=1,
```
Can you explain what `tolerance` means here, and why it was updated from 0.1 to 1?
The value was originally set for calibration classification and later copied to CenterPoint, but it does not work correctly for CenterPoint.
INFO:deployment.core.evaluation.verification_mixin: tensorrt (cuda:0) latency: 205.08 ms
INFO:deployment.core.evaluation.verification_mixin: output[heatmap]: shape=(1, 5, 510, 510), max_diff=0.070197, mean_diff=0.007674
INFO:deployment.core.evaluation.verification_mixin: output[reg]: shape=(1, 2, 510, 510), max_diff=0.007944, mean_diff=0.001120
INFO:deployment.core.evaluation.verification_mixin: output[height]: shape=(1, 1, 510, 510), max_diff=0.025401, mean_diff=0.002122
INFO:deployment.core.evaluation.verification_mixin: output[dim]: shape=(1, 3, 510, 510), max_diff=0.031920, mean_diff=0.001143
INFO:deployment.core.evaluation.verification_mixin: output[rot]: shape=(1, 2, 510, 510), max_diff=0.075215, mean_diff=0.004582
INFO:deployment.core.evaluation.verification_mixin: output[vel]: shape=(1, 2, 510, 510), max_diff=0.221999, mean_diff=0.004940
INFO:deployment.core.evaluation.verification_mixin: Overall Max difference: 0.221999
INFO:deployment.core.evaluation.verification_mixin: Overall Mean difference: 0.004347
WARNING:deployment.core.evaluation.verification_mixin: tensorrt (cuda:0) verification FAILED ✗ (max diff: 0.221999 > tolerance: 0.100000)
INFO:deployment.core.evaluation.verification_mixin:
Do you know the reason why it fails? Since this is a verification, it's always better to check the reason rather than update the tolerance.
It doesn't necessarily indicate a failure.
When converting from PyTorch to TensorRT, some numerical differences are expected due to different kernels, precision handling, and TensorRT optimizations.
The verification is mainly used as a safeguard to detect major issues (e.g., incorrect conversion settings) rather than to enforce exact numerical equivalence.
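A safeguard like this typically reduces to comparing per-output max/mean absolute differences against a tolerance. A minimal sketch of that idea (function name and structure are assumptions, not the actual `verification_mixin` internals):

```python
import numpy as np

def verify_outputs(reference, candidate, tolerance):
    # Hypothetical sketch: compare each named output tensor from a
    # reference backend (e.g. ONNX) against a candidate backend
    # (e.g. TensorRT) and fail if the worst-case deviation exceeds
    # the configured tolerance.
    per_output_means = []
    overall_max = 0.0
    for name, ref in reference.items():
        abs_diff = np.abs(ref - candidate[name])
        overall_max = max(overall_max, float(abs_diff.max()))
        per_output_means.append(float(abs_diff.mean()))
    overall_mean = float(np.mean(per_output_means))
    return overall_max <= tolerance, overall_max, overall_mean

# A constant offset of 0.2 fails a 0.1 tolerance but would pass 0.5.
ref = {"heatmap": np.zeros((1, 5, 4, 4), dtype=np.float32)}
trt = {"heatmap": np.full((1, 5, 4, 4), 0.2, dtype=np.float32)}
passed, max_diff, mean_diff = verify_outputs(ref, trt, tolerance=0.1)
```

The key design point is that the gate is on the *max* difference, which is why a few outlier elements (e.g. in `vel`) can fail a run whose mean difference is tiny.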
Since 1e-1 is what we set for ResNet18 in calibration classification, the cases are different.
Anyway, 5e-1 could be a better value.
Running onnx (cuda:0) reference...
2026-03-10 15:20:07.511273431 [V:onnxruntime:, execution_steps.cc:103 Execute] stream 0 activate notification with index 0
2026-03-10 15:20:07.567219724 [V:onnxruntime:, execution_steps.cc:47 Execute] stream 0 wait on Notification with id: 0
INFO:deployment.core.evaluation.verification_mixin: onnx (cuda:0) latency: 1423.80 ms
INFO:deployment.core.evaluation.verification_mixin:
Running tensorrt (cuda:0) test...
INFO:deployment.core.evaluation.verification_mixin: tensorrt (cuda:0) latency: 1141.26 ms
INFO:deployment.core.evaluation.verification_mixin: output[heatmap]: shape=(1, 5, 510, 510), max_diff=0.464849, mean_diff=0.056135
INFO:deployment.core.evaluation.verification_mixin: output[reg]: shape=(1, 2, 510, 510), max_diff=0.056639, mean_diff=0.006198
INFO:deployment.core.evaluation.verification_mixin: output[height]: shape=(1, 1, 510, 510), max_diff=0.227012, mean_diff=0.065522
INFO:deployment.core.evaluation.verification_mixin: output[dim]: shape=(1, 3, 510, 510), max_diff=0.336713, mean_diff=0.028087
INFO:deployment.core.evaluation.verification_mixin: output[rot]: shape=(1, 2, 510, 510), max_diff=0.515039, mean_diff=0.023962
INFO:deployment.core.evaluation.verification_mixin: output[vel]: shape=(1, 2, 510, 510), max_diff=0.932002, mean_diff=0.034206
INFO:deployment.core.evaluation.verification_mixin: Overall Max difference: 0.932002
INFO:deployment.core.evaluation.verification_mixin: Overall Mean difference: 0.037279
WARNING:deployment.core.evaluation.verification_mixin: tensorrt (cuda:0) verification FAILED ✗ (max diff: 0.932002 > tolerance: 0.500000)
On a different computer, the values can differ. I will leave it at 1 for now.
Did you set a random seed for this validation? Randomness (for example, shuffling point clouds) significantly affects the results; otherwise, I believe the difference between computers is too large.
Note that the reported difference corresponds to the maximum deviation; the mean difference is actually quite small.
Additionally, the magnitude of the difference depends heavily on the hardware. For example, on Blackwell GPUs (ONNX CUDA vs. TensorRT), the discrepancy is minimal. In contrast, on my laptop, the difference between ONNX CUDA and TensorRT is around 1. Even when forcing ONNX Runtime to use CUDA only, it still initializes a default CPU executor and executes some operations on the CPU, which can introduce discrepancies.
Interestingly, when comparing ONNX CPU with TensorRT on my laptop, the difference becomes very small. However, on Blackwell, the ONNX CPU vs. TensorRT comparison shows a larger gap.
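To rule out data-side randomness such as point-cloud shuffling, the comparison inputs can be pinned with a seed helper. A dependency-light sketch (the helper name is an assumption; in the real pipeline one would also seed torch):

```python
import random
import numpy as np

def seed_everything(seed: int) -> None:
    # Hypothetical helper: pin data-side randomness so that any
    # remaining ONNX-vs-TensorRT difference comes from the backends
    # themselves, not from shuffling. In a torch pipeline, also call
    # torch.manual_seed(seed) and torch.cuda.manual_seed_all(seed).
    random.seed(seed)
    np.random.seed(seed)

seed_everything(42)
first = np.random.permutation(100)   # e.g. a point-shuffle order
seed_everything(42)
second = np.random.permutation(100)
# first and second are identical, so both backends see the same input
```

Note that seeding only removes input variance; kernel- and precision-level differences between backends remain, which is what the tolerance is meant to bound.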
Let's put a TODO here to investigate the issue in another PR.
Some of the modules, for example,
```python
model_cfg = Config.fromfile(args.model_cfg)
config = BaseDeploymentConfig(deploy_cfg)

_validate_required_components(config.components_cfg)
```
move _validate_required_components to BaseDeploymentConfig
This only validates the names needed for CenterPoint.
Why does it only validate the names needed for CenterPoint? It should be a function that validates the needed names for different models, right?
@KSeangTan
Regarding this, I would like to change those names so they can be reused for BEVFusion in another PR.
Signed-off-by: vividf <yihsiang.fang@tier4.jp>
KSeangTan left a comment:

Thanks for the work, and please address the comments accordingly.
```diff
 verification = dict(
     enabled=False,
-    tolerance=1e-1,
+    tolerance=1,
```
Let's put a TODO here to investigate the issue in another PR.
```python
    model_cfg: Config,
    metrics_config: Detection3DMetricsConfig,
    components_cfg: ComponentsConfig,
):

    self._components_cfg = components_cfg

    task_profile = TaskProfile(
```
Try to avoid magic strings.
```python
input_features, voxel_dict = model._extract_features(data_loader, sample_idx)

if not isinstance(input_features, torch.Tensor):
```
Is it possible that `input_features` is not a `torch.Tensor`? Otherwise, please consider using an assert.
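The two styles under discussion can be contrasted in a small dependency-free sketch (a plain `list` stands in for the real `torch.Tensor` type; the function name is illustrative):

```python
def check_type(input_features):
    # Style used in the PR: explicit runtime check with a clear error,
    # appropriate when the wrong type can plausibly reach this point.
    if not isinstance(input_features, list):  # stand-in for torch.Tensor
        raise TypeError(
            f"expected list, got {type(input_features).__name__}"
        )
    # Reviewer's alternative for internal invariants that should never
    # fail in correct code (note: asserts are stripped under `python -O`):
    assert isinstance(input_features, list)
    return input_features

features = check_type([1.0, 2.0, 3.0])
```

The trade-off: `raise TypeError` survives optimized mode and gives a user-facing message, while `assert` documents an invariant the code itself guarantees.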
```python
    Raises:
        ValueError: If class_names not found in pytorch_model.cfg.
    """
    cfg = getattr(pytorch_model, "cfg", None)
```
Why do we use getattr? We can simply call pytorch_model.cfg right?
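The difference between the two access patterns, sketched with a hypothetical stand-in class (names are placeholders, not the PR's actual model):

```python
class ModelStub:
    # Hypothetical stand-in for pytorch_model; only `cfg` matters here.
    cfg = {"class_names": ["class_a", "class_b"]}

model = ModelStub()

# Pattern in the PR: getattr with a default lets the caller raise its
# own, more descriptive error when the attribute is absent.
cfg = getattr(model, "cfg", None)
if cfg is None:
    raise ValueError("class_names not found in pytorch_model.cfg")

# Reviewer's suggestion: direct access; Python raises AttributeError
# by itself if `cfg` does not exist, with no custom message.
cfg_direct = model.cfg
```

If the attribute is guaranteed to exist on every model passed in, direct access is simpler; `getattr` only earns its keep when the missing-attribute case needs a tailored error.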
```python
    device=device,
)

self.num_classes: int = len(class_names)
```
You can remove `int` if the type hint is clear.
```python
# Select execution providers based on device
providers = self.device.to_ort_provider()
if self.device.is_cuda:
```
```python
device_message = "CUDA" if self.device.is_cuda else "CPU"
logger.info(f"Using {device_message} execution provider for ONNX")
```
This is cleaner.
```python
model_cfg = Config.fromfile(args.model_cfg)
config = BaseDeploymentConfig(deploy_cfg)

_validate_required_components(config.components_cfg)
```
Why does it only validate the names needed for CenterPoint? It should be a function that validates the needed names for different models, right?
Summary
Integrates CenterPoint into the unified deployment framework, enabling deployment and evaluation of ONNX and TensorRT models.
Note: this PR includes the changes from #180.
Changes
- Moved `projects/CenterPoint` to `deployment/projects/centerpoint`
- Replaced the `deploy.py` script with the new unified CLI (`deployment.cli.main`)

Migration Notes
- The old deployment script (`projects/CenterPoint/scripts/deploy.py`) is removed
- Use `python -m deployment.cli.main centerpoint <deploy_config> <model_config>` instead
- ONNX model definitions are under `deployment.projects.centerpoint.onnx_models`
Exported ONNX (Same)
Voxel Encoder

Backbone Head
