Open
Description
We would like JVector to support passing row IDs from the source table during index creation, and to store these row IDs directly inside the index.
This removes the need for JVector to generate its own node IDs and eliminates the external RowID ↔ NodeID mapping file.
Problem / Motivation
Currently:
- JVector automatically assigns node IDs internally when adding vectors.
- Presto/Iceberg uses row IDs as the authoritative identifier for retrieving data.
- Because JVector does not accept row IDs, we must maintain an external mapping file linking NodeID ↔ RowID.
- This adds extra complexity in:
  - index creation
  - storing and syncing mapping files
  - converting node IDs back to row IDs during ANN search
  - maintaining consistency when rebuilding or updating indexes
If JVector could store row IDs directly, we would no longer need node IDs at all.
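For illustration, the external mapping described above can be sketched roughly as follows. This is a minimal sketch, not our actual integration code: `NodeToRowIdMapping` is a hypothetical helper that is not part of JVector, and it assumes node IDs are dense ordinals starting at 0 that follow the order in which vectors were added.

```java
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper (not part of JVector): maps graph node IDs to table row IDs.
// Assumption: node IDs are dense ordinals 0..n-1 in vector insertion order.
final class NodeToRowIdMapping {
    // rowIds[nodeId] holds the row ID of the vector stored at that node
    private final long[] rowIds;

    NodeToRowIdMapping(long[] rowIds) {
        this.rowIds = rowIds.clone();
    }

    // Translate a single node ID returned by ANN search into a row ID.
    long rowIdFor(int nodeId) {
        return rowIds[nodeId];
    }

    // Translate a batch of node IDs, preserving result order.
    long[] toRowIds(int[] nodeIds) {
        long[] out = new long[nodeIds.length];
        for (int i = 0; i < nodeIds.length; i++) {
            out[i] = rowIds[nodeIds[i]];
        }
        return out;
    }

    // Persist the mapping next to the index; it must be rewritten on every rebuild.
    void save(Path file) throws IOException {
        try (DataOutputStream out = new DataOutputStream(Files.newOutputStream(file))) {
            out.writeInt(rowIds.length);
            for (long id : rowIds) {
                out.writeLong(id);
            }
        }
    }

    // Load a mapping previously written by save().
    static NodeToRowIdMapping load(Path file) throws IOException {
        try (DataInputStream in = new DataInputStream(Files.newInputStream(file))) {
            int n = in.readInt();
            long[] ids = new long[n];
            for (int i = 0; i < n; i++) {
                ids[i] = in.readLong();
            }
            return new NodeToRowIdMapping(ids);
        }
    }
}
```

Every search result has to pass through `toRowIds` (or an equivalent lookup) before it can be used against the table, and the saved file has to stay in sync with the index it belongs to. That is the indirection this request aims to remove.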
Requested Functionality
We would like JVector to support:
- Passing row IDs when building the index
  Example:
      ImmutableGraphIndex index = builder.build(ravv, row_id);
- JVector internally stores row IDs in the index, instead of generating node IDs.
- ANN search results return row IDs directly
  Example:
      for (SearchResult.NodeScore ns : result.getNodes()) {
          row_id = ns.row_id;
      }
Benefits
- Removes the need for node IDs entirely.
- Eliminates external mapping files and synchronization overhead.
- Simplifies Presto + Iceberg integration significantly.
- Makes search results directly usable for table lookups.
- Reduces the risk of mapping errors and improves performance by avoiding the mapping indirection.
Additional Notes
We are integrating JVector with Presto for vector search.
Native support for row IDs inside the index will make the integration simpler, cleaner, and more reliable.
We are happy to provide sample workflows or further details as needed.