Fix #700: Remove public Clone from spsc_queue to prevent memory corruption #703

Open
dahankzter wants to merge 7 commits into DataDog:master from dahankzter:fix/issue-700-remove-spsc-clone

Conversation

@dahankzter

Summary

Fixes #700 - Removes the public Clone implementation from Producer<T> and Consumer<T> to prevent memory corruption in safe code.

Problem

The spsc_queue is designed for Single-Producer-Single-Consumer access with Ordering::Relaxed for performance. The public Clone trait implementation allowed external code to create multiple producers or consumers, which violates SPSC guarantees and causes:

  • Double-free when multiple consumers read the same value from the queue
  • Heap corruption (malloc(): chunk size mismatch in fastbin)
  • Memory leaks when multiple producers overwrite values without dropping
  • Undefined behavior in safe Rust code
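The double read behind the first bullet can be illustrated with a deterministic, single-threaded replay of one bad interleaving. This is only a sketch, not glommio's code: it assumes a pop that loads the head index, reads the slot, and then stores the incremented index. The names `replay_two_consumers`, `slots`, and `head` are hypothetical, and plain integers stand in for owned values so the example stays safe.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};

/// Replays one interleaving of two consumers popping from the same queue.
/// Returns the value each consumer observed and the final head index.
pub fn replay_two_consumers() -> (u32, u32, usize) {
    // Hypothetical queue state: two filled slots and a shared head index.
    let slots: [u32; 2] = [10, 20];
    let head = AtomicUsize::new(0);

    // Both consumers load the head before either one publishes an update.
    let a_idx = head.load(Ordering::Relaxed);
    let b_idx = head.load(Ordering::Relaxed);

    // Both therefore read the same slot.
    let a_val = slots[a_idx];
    let b_val = slots[b_idx];

    // Both store "their" next index; the second store repeats the first.
    head.store(a_idx + 1, Ordering::Relaxed);
    head.store(b_idx + 1, Ordering::Relaxed);

    (a_val, b_val, head.load(Ordering::Relaxed))
}
```

In this replay both consumers receive the value from slot 0 and the head advances only once. If the queue moves a heap-owning `T` out of the slot with a raw read (an assumption about the implementation), each consumer would later drop the same allocation, which is the double-free listed above.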

Solution

  1. Removed Clone trait implementation from Producer<T> and Consumer<T>
  2. Added crate-private clone_internal() methods for internal use within glommio
  3. Updated shared_channel to use clone_internal() for connection handoff
  4. Added documentation explaining SPSC guarantees and safety requirements
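The API shape described in steps 1 to 3 can be sketched roughly as follows. This is a simplified illustration, not the actual glommio source: a `Mutex<VecDeque<T>>` replaces the lock-free buffer, and `Shared`, `make`, `push`, `pop`, and `handoff_demo` are hypothetical names. Only `Producer`, `Consumer`, and `clone_internal()` come from the PR description, and the signature shown for `clone_internal()` is assumed.

```rust
use std::collections::VecDeque;
use std::sync::{Arc, Mutex};

/// Hypothetical shared state. A Mutex<VecDeque<T>> replaces the real
/// lock-free ring buffer so that this sketch contains no unsafe code.
struct Shared<T> {
    items: Mutex<VecDeque<T>>,
}

/// No `#[derive(Clone)]` and no `impl Clone`: users cannot duplicate it.
pub struct Producer<T> {
    shared: Arc<Shared<T>>,
}

/// The consuming endpoint is not `Clone` either.
pub struct Consumer<T> {
    shared: Arc<Shared<T>>,
}

/// Creates one connected producer/consumer pair.
pub fn make<T>() -> (Producer<T>, Consumer<T>) {
    let shared = Arc::new(Shared { items: Mutex::new(VecDeque::new()) });
    (Producer { shared: Arc::clone(&shared) }, Consumer { shared })
}

impl<T> Producer<T> {
    pub fn push(&self, value: T) {
        self.shared.items.lock().unwrap().push_back(value);
    }

    /// Crate-private duplicate for handoff (assumed signature). The caller
    /// must stop using the original once the copy becomes active.
    pub(crate) fn clone_internal(&self) -> Self {
        Producer { shared: Arc::clone(&self.shared) }
    }
}

impl<T> Consumer<T> {
    pub fn pop(&self) -> Option<T> {
        self.shared.items.lock().unwrap().pop_front()
    }

    /// Crate-private duplicate, mirroring the producer side.
    pub(crate) fn clone_internal(&self) -> Self {
        Consumer { shared: Arc::clone(&self.shared) }
    }
}

/// Handoff as crate-internal code might do it: duplicate, then drop the
/// original so exactly one producer stays active.
pub fn handoff_demo() -> Vec<u32> {
    let (producer, consumer) = make::<u32>();
    producer.push(1);
    let moved = producer.clone_internal();
    drop(producer);
    moved.push(2);

    let mut out = Vec::new();
    while let Some(v) = consumer.pop() {
        out.push(v);
    }
    out
}
```

Because `clone_internal()` is `pub(crate)`, code outside the defining crate cannot call it, and without a `Clone` impl a call such as `producer.clone()` from downstream code fails to compile.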

Breaking Change

This is a breaking API change. External code that clones Producer or Consumer will no longer compile. However, such code was already unsound and could cause heap corruption.

Internal uses within glommio (e.g., shared_channel) are safe because they ensure only one producer/consumer is active at any given time.

Testing

liburing submodule updated to 2.13 and format errors fixed
- CI/README: MSRV -> 1.92
- sysfs: track queue io_poll support and use it to gate poll ring usage
- spsc_queue: remove raw buffer ptr, use Box<[Slot<T>]> + wrapping indices
Remove more unsafe code from spsc queue.
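
The "Box<[Slot<T>]> + wrapping indices" idea from the commit list above can be sketched in a single-threaded form. This is an illustration under stated assumptions rather than the code in this PR: `Ring` and its methods are hypothetical, `Option<T>` stands in for `Slot<T>` (whose definition is not shown here), and the atomics the real queue needs are left out.

```rust
/// Hypothetical single-threaded ring buffer backed by a boxed slice.
pub struct Ring<T> {
    slots: Box<[Option<T>]>,
    head: usize, // total pops so far, allowed to wrap past usize::MAX
    tail: usize, // total pushes so far, allowed to wrap past usize::MAX
}

impl<T> Ring<T> {
    /// `capacity` must be a power of two so that `counter & mask` stays
    /// consistent even after the counters wrap around.
    pub fn with_capacity(capacity: usize) -> Self {
        assert!(capacity.is_power_of_two());
        let slots = (0..capacity)
            .map(|_| None)
            .collect::<Vec<Option<T>>>()
            .into_boxed_slice();
        Ring { slots, head: 0, tail: 0 }
    }

    fn mask(&self) -> usize {
        self.slots.len() - 1
    }

    /// Number of stored items, computed with wrapping arithmetic.
    pub fn len(&self) -> usize {
        self.tail.wrapping_sub(self.head)
    }

    /// Stores `value`, or hands it back if the buffer is full.
    pub fn push(&mut self, value: T) -> Result<(), T> {
        if self.len() == self.slots.len() {
            return Err(value);
        }
        let i = self.tail & self.mask();
        self.slots[i] = Some(value);
        self.tail = self.tail.wrapping_add(1);
        Ok(())
    }

    /// Removes the oldest item, if any.
    pub fn pop(&mut self) -> Option<T> {
        if self.len() == 0 {
            return None;
        }
        let i = self.head & self.mask();
        let value = self.slots[i].take();
        self.head = self.head.wrapping_add(1);
        value
    }
}
```

Indexing a boxed slice is bounds-checked and `Option::take` moves the value out exactly once, so this layout needs no raw pointer arithmetic or manual drops.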
@dahankzter
Author

All failures show the same error:
Os { code: 95, kind: Unsupported, message: "Operation not supported" }
Operation: "Reading" from DMA files

This occurs because:

  • DMA files require O_DIRECT support (direct I/O)
  • GitHub Actions CI runs tests on /tmp (likely tmpfs/overlayfs)
  • These filesystems don't support O_DIRECT operations
  • The test output even warns: "Glommio currently only supports NVMe-backed volumes formatted with XFS or EXT4"
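
One way a test harness could tell this environment limitation apart from a real failure is to check the raw OS error code. The helper below is a hypothetical sketch, not part of glommio or of this PR. It assumes Linux errno numbering, where `EOPNOTSUPP` is 95 on x86_64 and aarch64, matching the `code: 95` in the failure above.

```rust
use std::io;

/// Linux errno for "Operation not supported" (assumed x86_64/aarch64 value).
const EOPNOTSUPP: i32 = 95;

/// Returns true when an I/O error looks like "this filesystem cannot do the
/// requested direct I/O" rather than a genuine test failure.
pub fn is_direct_io_unsupported(err: &io::Error) -> bool {
    err.raw_os_error() == Some(EOPNOTSUPP)
}
```

A DMA-file test could call this helper on the error from a failed read and skip, instead of failing, when it returns `true`.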

Fixes issue DataDog#700 - memory corruption through Clone abuse.

The spsc_queue is designed for Single-Producer-Single-Consumer access
with Relaxed memory ordering for performance. The public Clone
implementation allowed external code to create multiple producers or
consumers, which violates SPSC guarantees and causes:
- Double-free when multiple consumers read the same value
- Memory corruption (malloc chunk size mismatch)
- Memory leaks when multiple producers overwrite values
- Undefined behavior in safe code

Changes:
- Remove Clone trait implementation from Producer<T> and Consumer<T>
- Add crate-private clone_internal() methods for internal use
- Update shared_channel to use clone_internal() for connection handoff
- Add documentation explaining SPSC guarantees and safety requirements

This is a breaking API change but necessary to prevent UB in safe code.
Internal uses within glommio (shared_channel) are safe because they
ensure only one producer/consumer is active at any given time.

Signed-off-by: Henrik Ma Johansson <dahankzter@gmail.com>
The rand 0.10 API changed, requiring RngExt trait to be in scope
to use the random_range() method. This fixes CI build failures.

Signed-off-by: Henrik Ma Johansson <dahankzter@gmail.com>
@dahankzter dahankzter force-pushed the fix/issue-700-remove-spsc-clone branch from 3a0eddd to 3dd5033 Compare February 11, 2026 20:55
After rebasing on PR DataDog#702, which changed rand from 0.10 to 0.9,
the benchmarks needed API adjustments:
- Use rand::Rng instead of rand::RngExt
- Use gen_range() instead of random_range()

This ensures benchmarks compile with the updated dependency.

Signed-off-by: Henrik Ma Johansson <dahankzter@gmail.com>
@dahankzter dahankzter force-pushed the fix/issue-700-remove-spsc-clone branch from b3cec27 to 786caa0 Compare February 11, 2026 21:18
@henrik-leovegas

Update: Rebased on PR #702

This PR has been rebased on top of #702 (dependency updates) to inherit its passing CI infrastructure.

What changed from the rebase:

From PR #702 (most of the file changes):

  • Dependency changes, including rand going from 0.10 to 0.9
  • Various other dependency and infrastructure updates
  • This explains why the PR shows 617 additions / 537 deletions across 43 files

Our actual fix (the core changes for issue #700):

  • glommio/src/channels/spsc_queue.rs - Removed public Clone, added clone_internal()
  • glommio/src/channels/shared_channel.rs - Updated to use clone_internal()
  • glommio/benches/shared_channel.rs & glommio/benches/competing_io.rs - Fixed for rand 0.9 API

All CI checks now pass ✅

Development

Successfully merging this pull request may close these issues.

Memory corruption in glommio::channels::spsc_queue under concurrent use
