
Support for Supplying and Persisting Row IDs During Index Build (Instead of JVector-Generated Node IDs) #575

@NivinCS

Description

We would like JVector to accept row IDs from the source table during index creation and to store those row IDs directly in the index.
This would remove the need for JVector to generate its own node IDs and would eliminate the external RowID ↔ NodeID mapping file.

Problem / Motivation

Currently:

  • JVector automatically assigns node IDs internally when adding vectors.
  • Presto/Iceberg uses row IDs as the authoritative identifier for retrieving data.
  • Because JVector does not accept row IDs, we must maintain an external mapping file linking NodeID ↔ RowID.
  • This adds extra complexity in:
    • index creation
    • storing and syncing mapping files
    • converting node IDs back to row IDs during ANN search
    • maintaining consistency when rebuilding or updating indexes
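For context, the external mapping that the current integration has to maintain can be sketched roughly as follows. This is a hypothetical illustration, not JVector code: the class name `NodeToRowIdMapping`, its file layout (a node count followed by one `long` row ID per node), and the assumption that node IDs are dense ordinals `0..n-1` assigned in insertion order are all assumptions made for the example.

```java
import java.io.*;
import java.nio.file.*;

/**
 * Hypothetical sketch of the NodeID -> RowID mapping file kept next to a
 * JVector index today. Not part of JVector; names and layout are assumed.
 */
final class NodeToRowIdMapping {
    // Position = node ID (assumed dense, 0..n-1); value = source-table row ID.
    private final long[] rowIdByNode;

    NodeToRowIdMapping(long[] rowIdByNode) {
        this.rowIdByNode = rowIdByNode.clone();
    }

    int size() {
        return rowIdByNode.length;
    }

    /** Converts a node ID returned by ANN search back into a row ID. */
    long rowIdFor(int nodeId) {
        if (nodeId < 0 || nodeId >= rowIdByNode.length) {
            throw new IllegalArgumentException("unknown node ID: " + nodeId);
        }
        return rowIdByNode[nodeId];
    }

    /** Writes the mapping as an int count followed by one long per node. */
    void write(Path file) throws IOException {
        try (DataOutputStream out = new DataOutputStream(
                new BufferedOutputStream(Files.newOutputStream(file)))) {
            out.writeInt(rowIdByNode.length);
            for (long rowId : rowIdByNode) {
                out.writeLong(rowId);
            }
        }
    }

    /** Reads a mapping file and checks that it matches the index it belongs to. */
    static NodeToRowIdMapping read(Path file, int expectedNodeCount) throws IOException {
        try (DataInputStream in = new DataInputStream(
                new BufferedInputStream(Files.newInputStream(file)))) {
            int n = in.readInt();
            if (n != expectedNodeCount) {
                // A stale mapping after a rebuild is exactly the consistency risk listed above.
                throw new IOException("mapping has " + n + " entries but index has "
                        + expectedNodeCount + " nodes");
            }
            long[] rowIds = new long[n];
            for (int i = 0; i < n; i++) {
                rowIds[i] = in.readLong();
            }
            return new NodeToRowIdMapping(rowIds);
        }
    }
}
```

Every index build, rebuild, and search has to go through a structure like this, and the file must be written, shipped, and versioned together with the index itself.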

If JVector could store row IDs directly, we would no longer need node IDs at all.

Requested Functionality

We would like JVector to support:

  1. Passing row IDs when building the index
    Example (proposed API):
    ImmutableGraphIndex index = builder.build(ravv, rowIds);
  2. Storing row IDs inside the index instead of generating node IDs.
  3. Returning row IDs directly in ANN search results
    Example (proposed API):
    for (SearchResult.NodeScore ns : result.getNodes()) {
        long rowId = ns.rowId;
    }
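To make items 1 to 3 concrete, here is a minimal in-memory sketch of the requested contract: one row ID per vector at build time, row IDs kept inside the index, and search hits that carry the row ID. It is hypothetical throughout: `RowIdVectorIndex`, `RowScore`, and `build(float[][], long[])` are illustrative names, not JVector APIs, and an exhaustive dot-product scan stands in for JVector's graph-based ANN search. The nested record requires Java 16 or later.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical illustration of the requested behavior. Brute-force search
 * replaces graph search; none of these names exist in JVector.
 */
final class RowIdVectorIndex {
    /** A search hit identified by the caller's row ID, not an internal node ID. */
    record RowScore(long rowId, float score) {}

    private final float[][] vectors;
    private final long[] rowIds; // rowIds[i] is the row ID supplied for vectors[i]

    private RowIdVectorIndex(float[][] vectors, long[] rowIds) {
        this.vectors = vectors;
        this.rowIds = rowIds;
    }

    /** Mirrors the proposed build(ravv, rowIds): exactly one row ID per vector. */
    static RowIdVectorIndex build(float[][] vectors, long[] rowIds) {
        if (vectors.length != rowIds.length) {
            throw new IllegalArgumentException("need exactly one row ID per vector");
        }
        return new RowIdVectorIndex(vectors.clone(), rowIds.clone());
    }

    /** Returns the topK vectors by dot product, highest first, as row IDs. */
    List<RowScore> search(float[] query, int topK) {
        List<RowScore> hits = new ArrayList<>(vectors.length);
        for (int i = 0; i < vectors.length; i++) {
            float dot = 0f;
            for (int d = 0; d < query.length; d++) { // assumes query and vectors share a dimension
                dot += query[d] * vectors[i][d];
            }
            hits.add(new RowScore(rowIds[i], dot));
        }
        hits.sort((a, b) -> Float.compare(b.score(), a.score())); // descending score
        return List.copyOf(hits.subList(0, Math.min(topK, hits.size())));
    }
}
```

The row IDs are stored as `long` on the assumption that table row IDs may not fit in the `int` ordinals JVector uses for node IDs. A real implementation inside JVector would presumably persist the row ID array together with the rest of the on-disk index, so there is no separate mapping file to keep in sync.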

Benefits

  • Removes the need for node IDs entirely.
  • Eliminates external mapping files and synchronization overhead.
  • Significantly simplifies the Presto + Iceberg integration.
  • Makes search results directly usable for table lookups.
  • Reduces the potential for errors and avoids the extra NodeID → RowID lookup on every search result.

Additional Notes

We are integrating JVector with Presto for vector search.
Native support for row IDs inside the index will make the integration simpler, cleaner, and more reliable.

We are happy to provide sample workflows or further details as needed.
