Skip to content

Problems with GraphIndexBuilder.buildAndMergeNewNodes #568

@marianotepper

Description

@marianotepper

There seems to be a bug in the code for remapping ordinals in GraphIndexBuilder.buildAndMergeNewNodes. Upon an initial inspection, it looks like the ordinals in the OnHeapGraph loaded in that function are never subject to any type of remapping. Of course, more digging is needed to fully characterize the error. Note that the function GraphIndexBuilder.buildAndMergeNewNodes is experimental.

The test OnHeapGraphIndexTest.testIncrementalInsertionFromOnDiskIndex_withNonIdentityOrdinalMapping if failing.
When running OnHeapGraphIndexTest.testIncrementalInsertionFromOnDiskIndex_withNonIdentityOrdinalMapping, with different numbers of base and new vectors, we the error seems to become apparent.

When setting:

private static final int NUM_BASE_VECTORS = 1000;
private static final int NUM_NEW_VECTORS = 1000;

We get

  • recall of baseline graph: 0.68500006
  • recall of merged graph: 0.54200006

Here, there is a 14% gap that can be wider depending on the random order of insertion.

When setting:

private static final int NUM_BASE_VECTORS = 1000;
private static final int NUM_NEW_VECTORS = 10_000;

We get

  • recall of baseline graph: 0.59099996
  • recall of merged graph: 0.56100005

Here, the gap reduces to 3% because the remapped nodes are the majority.

When setting:

private static final int NUM_BASE_VECTORS = 10_000;
private static final int NUM_NEW_VECTORS = 1000;

We get

  • recall of baseline graph: 0.5409999
  • recall of merged graph: 0.08299999

Here, the recall of the merged graph plummets to virtually zero because the non-remapped nodes are the majority. Note that this is the intended use (add a few new nodes to a large graph).

In all of these cases, the recall between the baseline and the merged graphs should be roughly equivalent (plus/minus differences due to the random insertion order).

Actions

  • Should we exclude this test until the underlying problem is resolved?

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions