fix: get transform timeout precision issue, missing resource destructors#775
fix: get transform timeout precision issue, missing resource destructors#775maciejmajek wants to merge 4 commits intomainfrom
Conversation
fix: get transform timeout precision
|
@coderabbitai full review |
✅ Actions performedFull review triggered. |
WalkthroughThis PR updates RAI Core to version 2.10.1 and implements explicit lifecycle cleanup across ROS2 communication APIs. Changes include adding a new shutdown method to the service API, making internal collections private in the topic API, and improving context ownership tracking and shutdown sequence ordering in the base connector. Changes
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~20 minutes 🚥 Pre-merge checks | ✅ 2 | ❌ 1❌ Failed checks (1 inconclusive)
✅ Passed checks (2 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches📝 Generate docstrings
🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment Tip CodeRabbit can use OpenGrep to find security vulnerabilities and bugs across 17+ programming languages.OpenGrep is compatible with Semgrep configurations. Add an |
There was a problem hiding this comment.
Actionable comments posted: 2
🧹 Nitpick comments (1)
src/rai_core/rai/communication/ros2/api/topic.py (1)
73-74: Duplicate initialization of_publishers.
_publishersis already initialized at line 63. The re-initialization at line 74 is redundant.♻️ Suggested cleanup
- self._subscriptions: Dict[str, Subscription] = {} - self._publishers: Dict[str, Publisher] = {} + self._subscriptions: Dict[str, Subscription] = {}And move line 63's
_publishersinitialization to this location for consistency with_subscriptions.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/rai_core/rai/communication/ros2/api/topic.py` around lines 73 - 74, The constructor currently initializes _publishers twice; remove the redundant initialization and ensure both member dicts are declared together by moving the single _publishers initialization next to _subscriptions in the __init__ (so only _subscriptions: Dict[str, Subscription] = {} and _publishers: Dict[str, Publisher] = {} exist once each); update any references to those attributes if needed but do not change their types or names.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@src/rai_core/rai/communication/ros2/api/service.py`:
- Around line 131-135: shutdown() currently iterates _persistent_clients without
holding _persistent_clients_lock, risking concurrent modification; modify
shutdown() to acquire _persistent_clients_lock before iterating and destroying
clients (and similarly ensure _services iteration is safe or protected if a lock
exists), destroy each client and service, then clear the _persistent_clients and
_services dictionaries to avoid double-destroy on repeated shutdown() calls;
reference symbols: shutdown(), _persistent_clients, _persistent_clients_lock,
_services, call_service(), release_client().
In `@src/rai_core/rai/communication/ros2/api/topic.py`:
- Around line 249-251: The file stores a boolean _destroy_subscribers but never
uses it; update shutdown() to respect that flag by only iterating over
self._subscriptions.values() and calling subscription.destroy() when
self._destroy_subscribers is True, i.e., wrap the current loop in an if
self._destroy_subscribers guard (or if you prefer removing the flag, delete the
assignment of _destroy_subscribers in the class constructor and any related
TODO/comment so shutdown() always destroys subscriptions). Ensure you reference
the same attribute name (_destroy_subscribers), the shutdown() method, and the
_subscriptions collection when making the change.
---
Nitpick comments:
In `@src/rai_core/rai/communication/ros2/api/topic.py`:
- Around line 73-74: The constructor currently initializes _publishers twice;
remove the redundant initialization and ensure both member dicts are declared
together by moving the single _publishers initialization next to _subscriptions
in the __init__ (so only _subscriptions: Dict[str, Subscription] = {} and
_publishers: Dict[str, Publisher] = {} exist once each); update any references
to those attributes if needed but do not change their types or names.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: 8b4877c3-8fff-4e41-a74c-2edd5be18451
📒 Files selected for processing (4)
src/rai_core/pyproject.tomlsrc/rai_core/rai/communication/ros2/api/service.pysrc/rai_core/rai/communication/ros2/api/topic.pysrc/rai_core/rai/communication/ros2/connectors/base.py
|
|
||
| for subscription in self._subscriptions.values(): | ||
| subscription.destroy() |
There was a problem hiding this comment.
🧩 Analysis chain
🌐 Web query:
rclpy issue 1142 subscriber destruction crash
💡 Result:
rclpy GitHub issue #1142 reports a crash with the error "cannot use Destroyable because destruction was requested" during publishing in launch_pytest setups, related to destroying entities like subscribers or nodes while the executor is spinning. The issue occurs due to a race condition in the rclpy executor where an entity (e.g., subscription) is collected for execution but destroyed before its handler can access it, leading to an InvalidHandle exception from Destroyable-derived classes. This was addressed in PR #1150 ("Fix an inherent race in execution vs. destruction"), merged into the rolling branch on 2023-08-28. The fix catches the InvalidHandle error in executor handlers for Destroyable entities (subscriptions, timers, services, clients), allowing the machinery to continue without executing the destroyed entity. It resolves related issues like #1147 (lost subscriptions after create/destroy cycles). Issue #1142 remains open as of 2025-05-20 (last update), likely for tracking backports or further work. Backports exist for Iron (PR #1256) and requests for Humble (issue #1610, dated 2026-02-14). Similar crashes persist in MultiThreadedExecutor scenarios or non-default callback groups (issue #1206, open). Workarounds mentioned: using unreliable QoS (not always viable), ensuring destruction outside spinning threads, or letting Python GC handle destruction instead of explicit destroy_subscription. The fix is present in ROS 2 Rolling and later stable releases post-2023-08.
Citations:
- 1: [launch_pytest] publishing error: cannot use Destroyable because destruction was requested ros2/rclpy#1142
- 2: Fix an inherent race in execution vs. destruction. ros2/rclpy#1150
- 3: lost subscription after repeated create/destroy cycles ros2/rclpy#1147
- 4: MultiThreadExecutor may throw exception after destroying subscription with non-default callback group ros2/rclpy#1206
🏁 Script executed:
cd src/rai_core/rai/communication/ros2/api && cat -n topic.py | sed -n '60,75p'Repository: RobotecAI/rai
Length of output: 921
🏁 Script executed:
cd src/rai_core/rai/communication/ros2/api && cat -n topic.py | sed -n '240,260p'Repository: RobotecAI/rai
Length of output: 617
🏁 Script executed:
cd src/rai_core/rai/communication/ros2/api && rg "_destroy_subscribers" -B 2 -A 2Repository: RobotecAI/rai
Length of output: 339
🏁 Script executed:
cd src/rai_core/rai/communication/ros2/api && rg "destroy_subscribers" -B 3 -A 3Repository: RobotecAI/rai
Length of output: 710
🏁 Script executed:
cd src/rai_core/rai/communication/ros2/api && rg "_destroy_subscribers" -B 1 -A 1Repository: RobotecAI/rai
Length of output: 228
🏁 Script executed:
cd src/rai_core/rai/communication/ros2/api && wc -l topic.pyRepository: RobotecAI/rai
Length of output: 69
🏁 Script executed:
cd src/rai_core/rai/communication/ros2/api && cat topic.py | grep -n "_destroy_subscribers"Repository: RobotecAI/rai
Length of output: 121
Remove the unused _destroy_subscribers flag or apply it in shutdown().
The _destroy_subscribers parameter is stored at line 71 but never used anywhere in the file. The shutdown() method (line 250–251) unconditionally destroys subscriptions, ignoring this flag entirely. The TODO comment (lines 65–69) explains the flag exists to prevent subscriber destruction that "often fails" and crashes nodes (rclpy issue #1142). Either apply the flag in shutdown() to guard the destruction, or remove the unused parameter and flag if final cleanup should always destroy subscriptions.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@src/rai_core/rai/communication/ros2/api/topic.py` around lines 249 - 251, The
file stores a boolean _destroy_subscribers but never uses it; update shutdown()
to respect that flag by only iterating over self._subscriptions.values() and
calling subscription.destroy() when self._destroy_subscribers is True, i.e.,
wrap the current loop in an if self._destroy_subscribers guard (or if you prefer
removing the flag, delete the assignment of _destroy_subscribers in the class
constructor and any related TODO/comment so shutdown() always destroys
subscriptions). Ensure you reference the same attribute name
(_destroy_subscribers), the shutdown() method, and the _subscriptions collection
when making the change.
There was a problem hiding this comment.
Uh oh, currently the logic of receive_message in ROS2BaseConnector is weird, seems like we should reduce it's logic and delegate more responsibility to the ROS2TopicAPI class. That way we could implement the _destroy_subscribers flag (it's idea is to remove the subscriber after a message has been received to minimize throughput, so nothing about the .shutdown() logic).
Please create an issue for that.
There was a problem hiding this comment.
🧩 Analysis chain
Let me first inspect the current receive_message implementation before creating the issue.
🏁 Script executed:
#!/bin/bash
# Find the connector base file
fd "base.py" --full-path | grep connectors
# Search for receive_message in the connectors
rg "receive_message" -n -A 20 -B 2 --type py | head -120
Length of output: 10124
🏁 Script executed:
#!/bin/bash
# Get the full receive_message implementation in the connector base
rg "receive_message" -n -A 50 -B 2 src/rai_core/rai/communication/ros2/connectors/base.py
Length of output: 4647
🏁 Script executed:
#!/bin/bash
# Get the rest of receive_message and related callbacks
rg "general_callback|_last_message_callback|last_msg|destroy_subscribers" -n -A 10 -B 2 src/rai_core/rai/communication/ros2/connectors/base.py | head -150
Length of output: 6145
Now I have enough context to create a well-detailed issue.
🏁 Script executed:
#!/bin/bash
gh issue create \
--repo RobotecAI/rai \
--title "refactor: delegate receive_message logic to ROS2TopicAPI and implement _destroy_subscribers flag" \
--body "## Summary
The current implementation of \`receive_message\` in \`ROS2BaseConnector\` ([\`src/rai_core/rai/communication/ros2/connectors/base.py\`](https://github.com/RobotecAI/rai/blob/main/src/rai_core/rai/communication/ros2/connectors/base.py)) contains logic that belongs in \`ROS2TopicAPI\`. Additionally, the \`_destroy_subscribers\` flag stored in \`ROS2TopicAPI.__init__\` is never used.
Backlink: https://github.com/RobotecAI/rai/pull/775#discussion_r2967155436
Requested by: `@maciejmajek`
## Current Issues
### 1. Mixed responsibilities in \`receive_message\`
\`ROS2BaseConnector.receive_message\` currently:
- Maintains its own \`last_msg: Dict[str, T]\` cache at the connector level
- Directly calls \`_topic_api.create_subscriber()\` with internal connector callbacks (\`partial(self.general_callback, source)\`)
- Registers an additional \`_last_message_callback\` via \`self.register_callback()\` to track last messages
- Polls \`self.last_msg\` in a \`while\` loop to wait for a message within the timeout
This logic should live inside \`ROS2TopicAPI\`, keeping the connector as a thin delegation layer.
### 2. Unused \`_destroy_subscribers\` flag
\`ROS2TopicAPI\` accepts and stores a \`destroy_subscribers: bool\` parameter (currently stored as \`self._destroy_subscribers\`) but never references it anywhere in the class. The flag's intended behavior (per the existing TODO comment and class docstring) is to **destroy a subscriber immediately after a message is received**, in order to reduce throughput overhead from keeping idle subscribers alive. This is unrelated to the \`shutdown()\` cleanup path.
## Proposed Changes
- Move subscriber creation, message caching, timeout polling, and optional post-receive subscriber destruction into \`ROS2TopicAPI\` as a \`receive_message\` method.
- Use \`self._destroy_subscribers\` inside that new method to conditionally destroy the subscription after the first message is received.
- Simplify \`ROS2BaseConnector.receive_message\` to delegate directly to \`_topic_api.receive_message(...)\`.
- Ensure compatibility with rclpy issue [`#1142`](https://github.com/ros2/rclpy/issues/1142) (known crash on subscriber destruction while executor is spinning) — the flag should remain \`False\` by default.
" \
--assignee maciejmajek
Length of output: 43
@maciejmajek, done! I've created the issue at #776.
Here's what it covers based on my inspection of the code:
-
Mixed responsibilities in
receive_message— the connector currently manages its ownlast_msgcache, directly wires internal callbacks (general_callback,_last_message_callback) into the topic API, and runs its own timeout polling loop. All of this should be encapsulated inROS2TopicAPI.receive_message(). -
Unused
_destroy_subscribersflag — it's passed intoROS2TopicAPIand stored, but never referenced. Oncereceive_messagemoves intoROS2TopicAPI, the flag can be applied there: after a message is received, conditionally callsubscription.destroy()to free the subscriber and reduce throughput overhead (while keepingFalseas the safe default given rclpy#1142).
✏️ Learnings added
Learnt from: maciejmajek
Repo: RobotecAI/rai PR: 775
File: src/rai_core/rai/communication/ros2/api/topic.py:249-251
Timestamp: 2026-03-20T18:09:20.830Z
Learning: In RobotecAI/rai, the `_destroy_subscribers` flag in `ROS2TopicAPI` (`src/rai_core/rai/communication/ros2/api/topic.py`) is intended to destroy a subscriber immediately AFTER a message is received (to reduce throughput overhead), NOT during `shutdown()`. The flag is currently unused and tracked in issue `#776` for a future refactor that will move `receive_message` logic from `ROS2BaseConnector` into `ROS2TopicAPI`.
Note: Learnings are effective only in the context of similar code segments. To apply general coding guidelines, please configure review instructions. You can manage existing learnings in the app.
🧠 Learnings used
Learnt from: maciejmajek
Repo: RobotecAI/rai PR: 379
File: tests/communication/ros2/test_connectors.py:50-50
Timestamp: 2025-01-21T11:24:27.687Z
Learning: In ROS2 connector implementation, topics are typically static and well-discovered before running any method of the API. The discovery mechanism in tests is an edge case, as it's not representative of the typical usage pattern.
Learnt from: maciejmajek
Repo: RobotecAI/rai PR: 383
File: tests/tools/ros2/test_topic_tools.py:84-105
Timestamp: 2025-01-24T13:42:58.814Z
Learning: The ROS2 node discovery in the test suite is asynchronous, and the current implementation doesn't wait for topic discovery. This is a known limitation that causes tests like `test_receive_message_tool` and `test_receive_image_tool` to be marked as `xfail`. A proper discovery mechanism with timeout will be implemented in the future to ensure reliable topic communication.
Learnt from: boczekbartek
Repo: RobotecAI/rai PR: 335
File: src/rai/rai/tools/ros/native_actions.py:0-0
Timestamp: 2025-01-08T13:36:27.469Z
Learning: In the RAI codebase, ROS2 messages are converted to string representation to make them more LLM-friendly, as seen in the `GetMsgFromTopic._run` method. This design decision helps in better processing and understanding of the messages by the language model.
Learnt from: maciejmajek
Repo: RobotecAI/rai PR: 367
File: src/rai/rai/communication/ros2/api.py:370-373
Timestamp: 2025-01-16T16:40:58.528Z
Learning: The ROS2 high-level API in `src/rai/rai/communication/ros2/api.py` is intentionally designed to be synchronous for simpler usage, without requiring callback handling.
Codecov Report❌ Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #775 +/- ##
==========================================
+ Coverage 73.11% 73.16% +0.04%
==========================================
Files 82 82
Lines 3571 3585 +14
==========================================
+ Hits 2611 2623 +12
- Misses 960 962 +2 ☔ View full report in Codecov by Sentry. |
Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>
Purpose
Minor bug fixes.
Proposed Changes
Service api now has shutdown method.
ROS2Connector shutdowns topic, service, action apis before shutting down the node.
Subscribers and publishers are now properly tracked privately.
Issues
Testing
CI
Summary by CodeRabbit
Chores
Bug Fixes