Skip to content

fix: get transform timeout precision issue, missing resource destructors#775

Open
maciejmajek wants to merge 4 commits intomainfrom
fix/get-transform
Open

fix: get transform timeout precision issue, missing resource destructors#775
maciejmajek wants to merge 4 commits intomainfrom
fix/get-transform

Conversation

@maciejmajek
Copy link
Member

@maciejmajek maciejmajek commented Mar 20, 2026

Purpose

Minor bug fixes.

Proposed Changes

Service api now has shutdown method.
ROS2Connector shutdowns topic, service, action apis before shutting down the node.
Subscribers and publishers are now properly tracked privately.

Issues

  • Links to relevant issues

Testing

CI

Summary by CodeRabbit

  • Chores

    • Version bumped to 2.10.1
  • Bug Fixes

    • Enhanced resource cleanup and proper lifecycle management for ROS2 services, subscribers, and publishers with comprehensive shutdown sequences
    • Improved timeout precision in ROS2 transform operations for more accurate and reliable timing control
    • Better ROS2 context ownership tracking to prevent resource leaks during cleanup

@maciejmajek
Copy link
Member Author

@coderabbitai full review

@coderabbitai
Copy link
Contributor

coderabbitai bot commented Mar 20, 2026

✅ Actions performed

Full review triggered.

@coderabbitai
Copy link
Contributor

coderabbitai bot commented Mar 20, 2026

Walkthrough

This PR updates RAI Core to version 2.10.1 and implements explicit lifecycle cleanup across ROS2 communication APIs. Changes include adding a new shutdown method to the service API, making internal collections private in the topic API, and improving context ownership tracking and shutdown sequence ordering in the base connector.

Changes

Cohort / File(s) Summary
Version Update
src/rai_core/pyproject.toml
Bumps package version from 2.10.0 to 2.10.1.
Service API Cleanup
src/rai_core/rai/communication/ros2/api/service.py
Adds new shutdown() method that destroys all cached ROS2 Service and Client instances.
Topic API Refactoring
src/rai_core/rai/communication/ros2/api/topic.py
Converts public subscriptions and publishers dictionaries to private _subscriptions and _publishers; updates all access patterns; extends shutdown() to destroy subscriptions in addition to publishers.
Base Connector Improvements
src/rai_core/rai/communication/ros2/connectors/base.py
Adds ROS2 context ownership tracking via _owns_rclpy_context flag; refactors get_transform() timeout conversion to use divmod() for proper nanosecond precision; restructures shutdown() sequence to teardown APIs before node destruction and conditionally shutdown ROS2 context based on ownership.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 inconclusive)

Check name Status Explanation Resolution
Description check ❓ Inconclusive The description covers the main changes and structure, but lacks specific implementation details and test coverage explanation required by the template. Expand Testing section with specific test cases or CI results. Clarify whether issues are linked and add more technical details about the timeout precision fix.
✅ Passed checks (2 passed)
Check name Status Explanation
Title check ✅ Passed The title accurately captures the two main changes: fixing get_transform timeout precision and adding missing resource destructors for proper cleanup.
Docstring Coverage ✅ Passed No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
📝 Generate docstrings
  • Create stacked PR
  • Commit on current branch
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch fix/get-transform

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Tip

CodeRabbit can use OpenGrep to find security vulnerabilities and bugs across 17+ programming languages.

OpenGrep is compatible with Semgrep configurations. Add an opengrep.yml or semgrep.yml configuration file to your project to enable OpenGrep analysis.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🧹 Nitpick comments (1)
src/rai_core/rai/communication/ros2/api/topic.py (1)

73-74: Duplicate initialization of _publishers.

_publishers is already initialized at line 63. The re-initialization at line 74 is redundant.

♻️ Suggested cleanup
-        self._subscriptions: Dict[str, Subscription] = {}
-        self._publishers: Dict[str, Publisher] = {}
+        self._subscriptions: Dict[str, Subscription] = {}

And move line 63's _publishers initialization to this location for consistency with _subscriptions.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/rai_core/rai/communication/ros2/api/topic.py` around lines 73 - 74, The
constructor currently initializes _publishers twice; remove the redundant
initialization and ensure both member dicts are declared together by moving the
single _publishers initialization next to _subscriptions in the __init__ (so
only _subscriptions: Dict[str, Subscription] = {} and _publishers: Dict[str,
Publisher] = {} exist once each); update any references to those attributes if
needed but do not change their types or names.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@src/rai_core/rai/communication/ros2/api/service.py`:
- Around line 131-135: shutdown() currently iterates _persistent_clients without
holding _persistent_clients_lock, risking concurrent modification; modify
shutdown() to acquire _persistent_clients_lock before iterating and destroying
clients (and similarly ensure _services iteration is safe or protected if a lock
exists), destroy each client and service, then clear the _persistent_clients and
_services dictionaries to avoid double-destroy on repeated shutdown() calls;
reference symbols: shutdown(), _persistent_clients, _persistent_clients_lock,
_services, call_service(), release_client().

In `@src/rai_core/rai/communication/ros2/api/topic.py`:
- Around line 249-251: The file stores a boolean _destroy_subscribers but never
uses it; update shutdown() to respect that flag by only iterating over
self._subscriptions.values() and calling subscription.destroy() when
self._destroy_subscribers is True, i.e., wrap the current loop in an if
self._destroy_subscribers guard (or if you prefer removing the flag, delete the
assignment of _destroy_subscribers in the class constructor and any related
TODO/comment so shutdown() always destroys subscriptions). Ensure you reference
the same attribute name (_destroy_subscribers), the shutdown() method, and the
_subscriptions collection when making the change.

---

Nitpick comments:
In `@src/rai_core/rai/communication/ros2/api/topic.py`:
- Around line 73-74: The constructor currently initializes _publishers twice;
remove the redundant initialization and ensure both member dicts are declared
together by moving the single _publishers initialization next to _subscriptions
in the __init__ (so only _subscriptions: Dict[str, Subscription] = {} and
_publishers: Dict[str, Publisher] = {} exist once each); update any references
to those attributes if needed but do not change their types or names.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 8b4877c3-8fff-4e41-a74c-2edd5be18451

📥 Commits

Reviewing files that changed from the base of the PR and between 3c66900 and 457c935.

📒 Files selected for processing (4)
  • src/rai_core/pyproject.toml
  • src/rai_core/rai/communication/ros2/api/service.py
  • src/rai_core/rai/communication/ros2/api/topic.py
  • src/rai_core/rai/communication/ros2/connectors/base.py

Comment on lines +249 to +251

for subscription in self._subscriptions.values():
subscription.destroy()
Copy link
Contributor

@coderabbitai coderabbitai bot Mar 20, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🌐 Web query:

rclpy issue 1142 subscriber destruction crash

💡 Result:

rclpy GitHub issue #1142 reports a crash with the error "cannot use Destroyable because destruction was requested" during publishing in launch_pytest setups, related to destroying entities like subscribers or nodes while the executor is spinning. The issue occurs due to a race condition in the rclpy executor where an entity (e.g., subscription) is collected for execution but destroyed before its handler can access it, leading to an InvalidHandle exception from Destroyable-derived classes. This was addressed in PR #1150 ("Fix an inherent race in execution vs. destruction"), merged into the rolling branch on 2023-08-28. The fix catches the InvalidHandle error in executor handlers for Destroyable entities (subscriptions, timers, services, clients), allowing the machinery to continue without executing the destroyed entity. It resolves related issues like #1147 (lost subscriptions after create/destroy cycles). Issue #1142 remains open as of 2025-05-20 (last update), likely for tracking backports or further work. Backports exist for Iron (PR #1256) and requests for Humble (issue #1610, dated 2026-02-14). Similar crashes persist in MultiThreadedExecutor scenarios or non-default callback groups (issue #1206, open). Workarounds mentioned: using unreliable QoS (not always viable), ensuring destruction outside spinning threads, or letting Python GC handle destruction instead of explicit destroy_subscription. The fix is present in ROS 2 Rolling and later stable releases post-2023-08.

Citations:


🏁 Script executed:

cd src/rai_core/rai/communication/ros2/api && cat -n topic.py | sed -n '60,75p'

Repository: RobotecAI/rai

Length of output: 921


🏁 Script executed:

cd src/rai_core/rai/communication/ros2/api && cat -n topic.py | sed -n '240,260p'

Repository: RobotecAI/rai

Length of output: 617


🏁 Script executed:

cd src/rai_core/rai/communication/ros2/api && rg "_destroy_subscribers" -B 2 -A 2

Repository: RobotecAI/rai

Length of output: 339


🏁 Script executed:

cd src/rai_core/rai/communication/ros2/api && rg "destroy_subscribers" -B 3 -A 3

Repository: RobotecAI/rai

Length of output: 710


🏁 Script executed:

cd src/rai_core/rai/communication/ros2/api && rg "_destroy_subscribers" -B 1 -A 1

Repository: RobotecAI/rai

Length of output: 228


🏁 Script executed:

cd src/rai_core/rai/communication/ros2/api && wc -l topic.py

Repository: RobotecAI/rai

Length of output: 69


🏁 Script executed:

cd src/rai_core/rai/communication/ros2/api && cat topic.py | grep -n "_destroy_subscribers"

Repository: RobotecAI/rai

Length of output: 121


Remove the unused _destroy_subscribers flag or apply it in shutdown().

The _destroy_subscribers parameter is stored at line 71 but never used anywhere in the file. The shutdown() method (line 250–251) unconditionally destroys subscriptions, ignoring this flag entirely. The TODO comment (lines 65–69) explains the flag exists to prevent subscriber destruction that "often fails" and crashes nodes (rclpy issue #1142). Either apply the flag in shutdown() to guard the destruction, or remove the unused parameter and flag if final cleanup should always destroy subscriptions.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/rai_core/rai/communication/ros2/api/topic.py` around lines 249 - 251, The
file stores a boolean _destroy_subscribers but never uses it; update shutdown()
to respect that flag by only iterating over self._subscriptions.values() and
calling subscription.destroy() when self._destroy_subscribers is True, i.e.,
wrap the current loop in an if self._destroy_subscribers guard (or if you prefer
removing the flag, delete the assignment of _destroy_subscribers in the class
constructor and any related TODO/comment so shutdown() always destroys
subscriptions). Ensure you reference the same attribute name
(_destroy_subscribers), the shutdown() method, and the _subscriptions collection
when making the change.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Uh oh, currently the logic of receive_message in ROS2BaseConnector is weird, seems like we should reduce it's logic and delegate more responsibility to the ROS2TopicAPI class. That way we could implement the _destroy_subscribers flag (it's idea is to remove the subscriber after a message has been received to minimize throughput, so nothing about the .shutdown() logic).

Please create an issue for that.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧩 Analysis chain

Let me first inspect the current receive_message implementation before creating the issue.


🏁 Script executed:

#!/bin/bash
# Find the connector base file
fd "base.py" --full-path | grep connectors

# Search for receive_message in the connectors
rg "receive_message" -n -A 20 -B 2 --type py | head -120

Length of output: 10124


🏁 Script executed:

#!/bin/bash
# Get the full receive_message implementation in the connector base
rg "receive_message" -n -A 50 -B 2 src/rai_core/rai/communication/ros2/connectors/base.py

Length of output: 4647


🏁 Script executed:

#!/bin/bash
# Get the rest of receive_message and related callbacks
rg "general_callback|_last_message_callback|last_msg|destroy_subscribers" -n -A 10 -B 2 src/rai_core/rai/communication/ros2/connectors/base.py | head -150

Length of output: 6145


Now I have enough context to create a well-detailed issue.


🏁 Script executed:

#!/bin/bash
gh issue create \
  --repo RobotecAI/rai \
  --title "refactor: delegate receive_message logic to ROS2TopicAPI and implement _destroy_subscribers flag" \
  --body "## Summary

The current implementation of \`receive_message\` in \`ROS2BaseConnector\` ([\`src/rai_core/rai/communication/ros2/connectors/base.py\`](https://github.com/RobotecAI/rai/blob/main/src/rai_core/rai/communication/ros2/connectors/base.py)) contains logic that belongs in \`ROS2TopicAPI\`. Additionally, the \`_destroy_subscribers\` flag stored in \`ROS2TopicAPI.__init__\` is never used.

Backlink: https://github.com/RobotecAI/rai/pull/775#discussion_r2967155436
Requested by: `@maciejmajek`

## Current Issues

### 1. Mixed responsibilities in \`receive_message\`

\`ROS2BaseConnector.receive_message\` currently:
- Maintains its own \`last_msg: Dict[str, T]\` cache at the connector level
- Directly calls \`_topic_api.create_subscriber()\` with internal connector callbacks (\`partial(self.general_callback, source)\`)
- Registers an additional \`_last_message_callback\` via \`self.register_callback()\` to track last messages
- Polls \`self.last_msg\` in a \`while\` loop to wait for a message within the timeout

This logic should live inside \`ROS2TopicAPI\`, keeping the connector as a thin delegation layer.

### 2. Unused \`_destroy_subscribers\` flag

\`ROS2TopicAPI\` accepts and stores a \`destroy_subscribers: bool\` parameter (currently stored as \`self._destroy_subscribers\`) but never references it anywhere in the class. The flag's intended behavior (per the existing TODO comment and class docstring) is to **destroy a subscriber immediately after a message is received**, in order to reduce throughput overhead from keeping idle subscribers alive. This is unrelated to the \`shutdown()\` cleanup path.

## Proposed Changes

- Move subscriber creation, message caching, timeout polling, and optional post-receive subscriber destruction into \`ROS2TopicAPI\` as a \`receive_message\` method.
- Use \`self._destroy_subscribers\` inside that new method to conditionally destroy the subscription after the first message is received.
- Simplify \`ROS2BaseConnector.receive_message\` to delegate directly to \`_topic_api.receive_message(...)\`.
- Ensure compatibility with rclpy issue [`#1142`](https://github.com/ros2/rclpy/issues/1142) (known crash on subscriber destruction while executor is spinning) — the flag should remain \`False\` by default.
" \
  --assignee maciejmajek

Length of output: 43


@maciejmajek, done! I've created the issue at #776.

Here's what it covers based on my inspection of the code:

  1. Mixed responsibilities in receive_message — the connector currently manages its own last_msg cache, directly wires internal callbacks (general_callback, _last_message_callback) into the topic API, and runs its own timeout polling loop. All of this should be encapsulated in ROS2TopicAPI.receive_message().

  2. Unused _destroy_subscribers flag — it's passed into ROS2TopicAPI and stored, but never referenced. Once receive_message moves into ROS2TopicAPI, the flag can be applied there: after a message is received, conditionally call subscription.destroy() to free the subscriber and reduce throughput overhead (while keeping False as the safe default given rclpy #1142).


✏️ Learnings added
Learnt from: maciejmajek
Repo: RobotecAI/rai PR: 775
File: src/rai_core/rai/communication/ros2/api/topic.py:249-251
Timestamp: 2026-03-20T18:09:20.830Z
Learning: In RobotecAI/rai, the `_destroy_subscribers` flag in `ROS2TopicAPI` (`src/rai_core/rai/communication/ros2/api/topic.py`) is intended to destroy a subscriber immediately AFTER a message is received (to reduce throughput overhead), NOT during `shutdown()`. The flag is currently unused and tracked in issue `#776` for a future refactor that will move `receive_message` logic from `ROS2BaseConnector` into `ROS2TopicAPI`.

Note: Learnings are effective only in the context of similar code segments. To apply general coding guidelines, please configure review instructions. You can manage existing learnings in the app.


🧠 Learnings used
Learnt from: maciejmajek
Repo: RobotecAI/rai PR: 379
File: tests/communication/ros2/test_connectors.py:50-50
Timestamp: 2025-01-21T11:24:27.687Z
Learning: In ROS2 connector implementation, topics are typically static and well-discovered before running any method of the API. The discovery mechanism in tests is an edge case, as it's not representative of the typical usage pattern.

Learnt from: maciejmajek
Repo: RobotecAI/rai PR: 383
File: tests/tools/ros2/test_topic_tools.py:84-105
Timestamp: 2025-01-24T13:42:58.814Z
Learning: The ROS2 node discovery in the test suite is asynchronous, and the current implementation doesn't wait for topic discovery. This is a known limitation that causes tests like `test_receive_message_tool` and `test_receive_image_tool` to be marked as `xfail`. A proper discovery mechanism with timeout will be implemented in the future to ensure reliable topic communication.

Learnt from: boczekbartek
Repo: RobotecAI/rai PR: 335
File: src/rai/rai/tools/ros/native_actions.py:0-0
Timestamp: 2025-01-08T13:36:27.469Z
Learning: In the RAI codebase, ROS2 messages are converted to string representation to make them more LLM-friendly, as seen in the `GetMsgFromTopic._run` method. This design decision helps in better processing and understanding of the messages by the language model.

Learnt from: maciejmajek
Repo: RobotecAI/rai PR: 367
File: src/rai/rai/communication/ros2/api.py:370-373
Timestamp: 2025-01-16T16:40:58.528Z
Learning: The ROS2 high-level API in `src/rai/rai/communication/ros2/api.py` is intentionally designed to be synchronous for simpler usage, without requiring callback handling.

@codecov
Copy link

codecov bot commented Mar 20, 2026

Codecov Report

❌ Patch coverage is 83.33333% with 4 lines in your changes missing coverage. Please review.
✅ Project coverage is 73.16%. Comparing base (3c66900) to head (75b2fdf).
✅ All tests successful. No failed tests found.

Files with missing lines Patch % Lines
src/rai_core/rai/communication/ros2/api/topic.py 75.00% 2 Missing ⚠️
...rai_core/rai/communication/ros2/connectors/base.py 75.00% 2 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main     #775      +/-   ##
==========================================
+ Coverage   73.11%   73.16%   +0.04%     
==========================================
  Files          82       82              
  Lines        3571     3585      +14     
==========================================
+ Hits         2611     2623      +12     
- Misses        960      962       +2     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

maciejmajek and others added 2 commits March 20, 2026 19:04
Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant