Skip to content

Comments

[Bugfix] NeighborPriorityQueue panics, when it is at capacity and NaN distance is inserted#796

Open
arrayka wants to merge 3 commits intomainfrom
u/alrazu/priority_queue_nan
Open

[Bugfix] NeighborPriorityQueue panics, when it is at capacity and NaN distance is inserted#796
arrayka wants to merge 3 commits intomainfrom
u/alrazu/priority_queue_nan

Conversation

@arrayka
Copy link
Contributor

@arrayka arrayka commented Feb 23, 2026

Why

When the inserted distance is not comparable with existing distances (neither larger nor smaller), the insert_idx is incorrectly calculated (it is set to self.size). This typically happens when inserting a neighbor with NaN distance.

This causes NeighborPriorityQueue to panic with this error, when the queue is already full:

thread 'main' panicked at src\index\neighbor\neighbor_priority_queue.rs:125:26:
insertion index (is 65) should be <= len (is 64)

End-to-end repro: see vectors_with_large_values_should_be_inserted_and_searched_without_panic() test in diskann_async.rs.

What

This PR changes the behavior of NeighborPriorityQueue to ignore an incomparable value (NaN or similar) when the queue is full and we can't determine where it belongs in the sorted order.

Note: the alternative approach would be to insert a NaN value at the end of the queue, but it would overwrite legit non-NaN value, which is not desirable.

The bug was discovered when we encountered large, non‑normalized vectors in a random dataset used during one of our experiments.

Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR fixes a critical bug where NeighborPriorityQueue panics when attempting to insert a neighbor with a NaN (Not-a-Number) distance into a full fixed-size queue. The panic occurred because NaN values are incomparable, causing the insertion index calculation to incorrectly return a value equal to the queue's capacity, leading to an out-of-bounds insertion attempt.

Changes:

  • Added NaN handling logic in NeighborPriorityQueue::insert to detect and ignore incomparable values when the queue is full
  • Added comprehensive unit tests covering NaN, infinity, and mixed edge cases
  • Added an integration test to validate end-to-end behavior with vectors containing infinity values

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated no comments.

File Description
diskann/src/neighbor/queue.rs Added check to detect when insert_idx == capacity (indicating NaN/incomparable value) and ignore the insertion; added 6 unit tests covering NaN, infinity, and mixed scenarios
diskann-providers/src/index/diskann_async.rs Added integration test inserting and searching vectors with infinity values to ensure no panics occur in realistic scenarios

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

@arrayka
Copy link
Contributor Author

arrayka commented Feb 23, 2026

Pending benchmark validation...

0
};

if self.size == self.capacity {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What do you think about inserting

if nbr.distance.is_nan() {
    return;
}

as the very first instruction to prevent Nans from showing up at all? (this is a pretty cheap operation on x86).

Or more directly - if NANs get in before the queue is at capacity, does it correctly recover if non-NAN distances are added? If not, we should bail unconditionally on NANs.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A priority queue is incompatible fundamentally with NaNs, unless we are abusing them for some other purpose. I think @hildebrandmw's suggestion is the most reasonable solution.

@metajack
Copy link

This PR changes the behavior of NeighborPriorityQueue to ignore an incomparable value (NaN or similar) when the queue is full and we can't determine where it belongs in the sorted order.

You can never sort a NaN. All NaNs should be ignored as if the queue was full and it sorted to the end. Probably there should be a debug_assert! to detect this case and error if we see it in testing.

Also, didn't you fix this exact problem like 3-4 months ago? I remember reviewing this.

@codecov-commenter
Copy link

codecov-commenter commented Feb 23, 2026

Codecov Report

❌ Patch coverage is 98.26087% with 2 lines in your changes missing coverage. Please review.
✅ Project coverage is 89.49%. Comparing base (d1e5675) to head (9d8bbcb).
⚠️ Report is 1 commits behind head on main.

Files with missing lines Patch % Lines
diskann-providers/src/index/diskann_async.rs 97.10% 2 Missing ⚠️
Additional details and impacted files

Impacted file tree graph

@@            Coverage Diff             @@
##             main     #796      +/-   ##
==========================================
+ Coverage   89.27%   89.49%   +0.21%     
==========================================
  Files         431      432       +1     
  Lines       78942    79482     +540     
==========================================
+ Hits        70479    71133     +654     
+ Misses       8463     8349     -114     
Flag Coverage Δ
miri 89.49% <98.26%> (+0.21%) ⬆️
unittests 89.34% <98.26%> (+0.21%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

Files with missing lines Coverage Δ
diskann/src/neighbor/queue.rs 98.38% <100.00%> (+0.10%) ⬆️
diskann-providers/src/index/diskann_async.rs 96.37% <97.10%> (+0.02%) ⬆️

... and 18 files with indirect coverage changes

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants