-
Notifications
You must be signed in to change notification settings - Fork 143
Description
There seems to be a bug in the code for remapping ordinals in GraphIndexBuilder.buildAndMergeNewNodes. Upon an initial inspection, it looks like the ordinals in the OnHeapGraph loaded in that function are never subject to any type of remapping. Of course, more digging is needed to fully characterize the error. Note that the function GraphIndexBuilder.buildAndMergeNewNodes is experimental.
The test OnHeapGraphIndexTest.testIncrementalInsertionFromOnDiskIndex_withNonIdentityOrdinalMapping if failing.
When running OnHeapGraphIndexTest.testIncrementalInsertionFromOnDiskIndex_withNonIdentityOrdinalMapping, with different numbers of base and new vectors, we the error seems to become apparent.
When setting:
private static final int NUM_BASE_VECTORS = 1000;
private static final int NUM_NEW_VECTORS = 1000;
We get
- recall of baseline graph: 0.68500006
- recall of merged graph: 0.54200006
Here, there is a 14% gap that can be wider depending on the random order of insertion.
When setting:
private static final int NUM_BASE_VECTORS = 1000;
private static final int NUM_NEW_VECTORS = 10_000;
We get
- recall of baseline graph: 0.59099996
- recall of merged graph: 0.56100005
Here, the gap reduces to 3% because the remapped nodes are the majority.
When setting:
private static final int NUM_BASE_VECTORS = 10_000;
private static final int NUM_NEW_VECTORS = 1000;
We get
- recall of baseline graph: 0.5409999
- recall of merged graph: 0.08299999
Here, the recall of the merged graph plummets to virtually zero because the non-remapped nodes are the majority. Note that this is the intended use (add a few new nodes to a large graph).
In all of these cases, the recall between the baseline and the merged graphs should be roughly equivalent (plus/minus differences due to the random insertion order).
Actions
- Should we exclude this test until the underlying problem is resolved?