disk-poc add multi-threaded #872
base: disk-poc
…into dorer-disk-poc-add-delete-mt
Pull request overview
This PR introduces multi-threaded vector insertion support and transitions the HNSW Disk index from batch-based to batchless mode. The main objectives are to enable parallel vector insertions via a job queue system and reduce lock contention through a segmented neighbor cache architecture.
Key changes:
- Added `HNSWDiskInsertJob` and `HNSWDiskSingleInsertJob` structures for parallel vector insertions with self-contained vector data to avoid race conditions
- Replaced batch-based insertion with batchless mode using a 64-segment neighbor cache with per-segment locks for reduced contention
- Changed `curElementCount` to `std::atomic<size_t>` and added multiple shared mutexes (`stagedUpdatesGuard`, `vectorsGuard`, `rawVectorsGuard`) for improved concurrency
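To make the segmented-cache idea concrete, here is a minimal sketch of a neighbor cache split into 64 segments, each guarded by its own `std::shared_mutex`, with an atomic entry counter. It is not the code from this PR: `SegmentedNeighborCache`, `idType`, the modulo segment selection, and every method name are assumptions made for illustration. Only the segment count and the use of per-segment locks and an atomic counter come from the summary above.

```cpp
#include <array>
#include <atomic>
#include <cstddef>
#include <cstdint>
#include <mutex>
#include <shared_mutex>
#include <unordered_map>
#include <utility>
#include <vector>

// Hypothetical id type; the real index may use a different one.
using idType = std::uint32_t;

// Illustrative only: a cache whose entries are spread over 64 segments so that
// threads touching different segments never wait on the same lock.
class SegmentedNeighborCache {
public:
    static constexpr std::size_t kNumSegments = 64; // count taken from the PR summary

    // Insert or replace the neighbor list of `id`, locking only its segment.
    void set(idType id, std::vector<idType> neighbors) {
        Segment &seg = segmentFor(id);
        std::unique_lock<std::shared_mutex> lock(seg.guard);
        bool inserted = seg.entries.insert_or_assign(id, std::move(neighbors)).second;
        if (inserted) {
            count_.fetch_add(1, std::memory_order_relaxed);
        }
    }

    // Copy the neighbor list of `id` into `out`; returns false if it is not cached.
    // Readers of the same segment share the lock.
    bool get(idType id, std::vector<idType> &out) const {
        const Segment &seg = segmentFor(id);
        std::shared_lock<std::shared_mutex> lock(seg.guard);
        auto it = seg.entries.find(id);
        if (it == seg.entries.end()) {
            return false;
        }
        out = it->second;
        return true;
    }

    // Number of cached ids, readable without taking any segment lock.
    std::size_t size() const { return count_.load(std::memory_order_relaxed); }

private:
    struct Segment {
        mutable std::shared_mutex guard;
        std::unordered_map<idType, std::vector<idType>> entries;
    };

    // Assumed mapping: id modulo segment count.
    Segment &segmentFor(idType id) { return segments_[id % kNumSegments]; }
    const Segment &segmentFor(idType id) const { return segments_[id % kNumSegments]; }

    std::array<Segment, kNumSegments> segments_;
    std::atomic<std::size_t> count_{0};
};
```

With this layout, two writers contend only when their ids map to the same segment, and the element count can be read without blocking writers. The sketch requires C++17.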
Reviewed changes
Copilot reviewed 11 out of 11 changed files in this pull request and generated 17 comments.
| File | Description |
|---|---|
| tests/utils/mock_thread_pool.h | Added isIdle() helper to check if all jobs are complete |
| tests/unit/test_quantized_hnsw_disk.cpp | Removed flushBatch() calls, updated comments for batchless mode |
| tests/unit/test_hnsw_disk.cpp | Updated tests to reflect new indexSize() behavior (active elements only) and removed batch flushing |
| tests/benchmark/run_files/bm_hnsw_disk_single_fp32.cpp | Changed index file path to use .zip extension |
| tests/benchmark/data/scripts/hnsw_disk_serializer.cpp | Added multi-threading support with new parameters and progress reporting |
| tests/benchmark/data/scripts/CMakeLists.txt | Included mock_thread_pool source files and headers for serializer |
| tests/benchmark/bm_vecsim_index.h | Added comment clarifying job queue is not set by default |
| tests/benchmark/bm_initialization/bm_hnsw_disk_initialize_fp32.h | Added new async AddLabel benchmark with multi-threaded configuration |
| src/VecSim/vec_sim_common.h | Added three new job types for disk insert operations |
| src/VecSim/algorithms/hnsw/hnsw_disk_serializer.h | Updated serialization to handle atomic curElementCount and removed legacy batch state |
| src/VecSim/algorithms/hnsw/hnsw_disk.h | Core implementation: segmented cache, lock-free operations, batchless insertion, and MT job execution |
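As a rough illustration of two items in the table, an insert job that carries its own copy of the vector and an `isIdle()`-style check on a thread pool, consider the sketch below. `InsertJob`, `TinyJobRunner`, and their members are hypothetical stand-ins, not the PR's `HNSWDiskInsertJob` or the mock thread pool in `tests/utils/mock_thread_pool.h`. The sketch assumes at least one worker thread.

```cpp
#include <atomic>
#include <cstddef>
#include <functional>
#include <thread>
#include <vector>

// Hypothetical job type: it copies the vector at submission time, so the
// caller may reuse or free its buffer while the job is still queued.
struct InsertJob {
    std::size_t label;
    std::vector<float> data; // private copy owned by the job

    InsertJob(std::size_t label, const float *vec, std::size_t dim)
        : label(label), data(vec, vec + dim) {}
};

// Hypothetical runner: workers claim jobs through an atomic index and a
// pending counter backs the idle check.
class TinyJobRunner {
public:
    // Runs `handler` once per job on `numThreads` workers (assumed >= 1)
    // and returns after all workers have joined.
    void run(const std::vector<InsertJob> &jobs, std::size_t numThreads,
             const std::function<void(const InsertJob &)> &handler) {
        pending_.store(jobs.size());
        std::atomic<std::size_t> next{0};
        std::vector<std::thread> workers;
        for (std::size_t t = 0; t < numThreads; ++t) {
            workers.emplace_back([&] {
                // Each fetch_add hands out a distinct job index.
                for (std::size_t i = next.fetch_add(1); i < jobs.size();
                     i = next.fetch_add(1)) {
                    handler(jobs[i]);
                    pending_.fetch_sub(1);
                }
            });
        }
        for (auto &w : workers) {
            w.join();
        }
    }

    // True when no submitted job is still waiting or running.
    bool isIdle() const { return pending_.load() == 0; }

private:
    std::atomic<std::size_t> pending_{0};
};
```

Because each job owns its data, a caller can overwrite the source buffer immediately after creating the job without racing against a worker that has not yet processed it.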
Pull request overview
Copilot reviewed 11 out of 11 changed files in this pull request and generated 26 comments.
Pull request overview
Copilot reviewed 11 out of 11 changed files in this pull request and generated 9 comments.
Describe the changes in the pull request
This PR introduces multi-threaded vector insertion support and batchless mode for the HNSW Disk index. The main changes include:
Which issues this PR fixes
Main objects this PR modified
- `src/VecSim/algorithms/hnsw/hnsw_disk.h` - Core HNSW Disk index with MT support and segmented cache
- `src/VecSim/algorithms/hnsw/hnsw_disk_serializer.h` - Updated serialization for atomic fields and removed legacy batch state
- `src/VecSim/spaces/computer/preprocessors.h` - Added 4-bit scalar quantization preprocessor
- `tests/unit/test_hnsw_disk.cpp` - Updated tests for batchless mode

Mark if applicable