[Bugfix] NeighborPriorityQueue panics, when it is at capacity and NaN distance is inserted#796
[Bugfix] NeighborPriorityQueue panics, when it is at capacity and NaN distance is inserted#796
Conversation
… distance is inserted
There was a problem hiding this comment.
Pull request overview
This PR fixes a critical bug where NeighborPriorityQueue panics when attempting to insert a neighbor with a NaN (Not-a-Number) distance into a full fixed-size queue. The panic occurred because NaN values are incomparable, causing the insertion index calculation to incorrectly return a value equal to the queue's capacity, leading to an out-of-bounds insertion attempt.
Changes:
- Added NaN handling logic in
NeighborPriorityQueue::insertto detect and ignore incomparable values when the queue is full - Added comprehensive unit tests covering NaN, infinity, and mixed edge cases
- Added an integration test to validate end-to-end behavior with vectors containing infinity values
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated no comments.
| File | Description |
|---|---|
| diskann/src/neighbor/queue.rs | Added check to detect when insert_idx == capacity (indicating NaN/incomparable value) and ignore the insertion; added 6 unit tests covering NaN, infinity, and mixed scenarios |
| diskann-providers/src/index/diskann_async.rs | Added integration test inserting and searching vectors with infinity values to ensure no panics occur in realistic scenarios |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
|
Pending benchmark validation... |
| 0 | ||
| }; | ||
|
|
||
| if self.size == self.capacity { |
There was a problem hiding this comment.
What do you think about inserting
if nbr.distance.is_nan() {
return;
}as the very first instruction to prevent Nans from showing up at all? (this is a pretty cheap operation on x86).
Or more directly - if NANs get in before the queue is at capacity, does it correctly recover if non-NAN distances are added? If not, we should bail unconditionally on NANs.
There was a problem hiding this comment.
A priority queue is incompatible fundamentally with NaNs, unless we are abusing them for some other purpose. I think @hildebrandmw's suggestion is the most reasonable solution.
You can never sort a NaN. All NaNs should be ignored as if the queue was full and it sorted to the end. Probably there should be a Also, didn't you fix this exact problem like 3-4 months ago? I remember reviewing this. |
Codecov Report❌ Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #796 +/- ##
==========================================
+ Coverage 89.27% 89.49% +0.21%
==========================================
Files 431 432 +1
Lines 78942 79482 +540
==========================================
+ Hits 70479 71133 +654
+ Misses 8463 8349 -114
Flags with carried forward coverage won't be shown. Click here to find out more.
🚀 New features to boost your workflow:
|
Why
When the inserted distance is not comparable with existing distances (neither larger nor smaller), the
insert_idxis incorrectly calculated (it is set to self.size). This typically happens when inserting a neighbor with NaN distance.This causes NeighborPriorityQueue to panic with this error, when the queue is already full:
End-to-end repro: see
vectors_with_large_values_should_be_inserted_and_searched_without_panic()test indiskann_async.rs.What
This PR changes the behavior of NeighborPriorityQueue to ignore an incomparable value (NaN or similar) when the queue is full and we can't determine where it belongs in the sorted order.
Note: the alternative approach would be to insert a NaN value at the end of the queue, but it would overwrite legit non-NaN value, which is not desirable.
The bug was discovered when we encountered large, non‑normalized vectors in a random dataset used during one of our experiments.