Speed up ivec reads by buffering #584

ashkrisk · 2025-12-01T08:55:30Z

MultiFileDataSource uses SiftLoader.readFvecs to read the base and query vectors, and SiftLoader.readIvecs to read the provided ground truth. readIvecs is currently quite inefficient because it reads without buffering, which disproportionately increases the time taken to load the Dataset.

This matters little for Bench, where the time taken to load the dataset is insignificant compared to the time taken to build the index. However, it becomes quite important when running short-lived programs with pre-created graphs, especially during rapid prototyping.

This PR addresses the problem by adding a BufferedInputStream to readIvecs, similar to the current implementation of readFvecs.
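For readers unfamiliar with the pattern, here is a minimal sketch of a buffered ivecs reader. It is not the actual SiftLoader code: the class name `IvecsSketch`, the 64 KiB buffer size, and the `List<int[]>` return type are assumptions made for illustration. It also assumes the usual ivecs layout, where each record is a little-endian 32-bit dimension followed by that many little-endian 32-bit integers.

```java
import java.io.BufferedInputStream;
import java.io.DataInputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration; not the SiftLoader implementation from this PR.
public class IvecsSketch {
    public static List<int[]> readIvecs(Path path) throws IOException {
        List<int[]> vectors = new ArrayList<>();
        try (InputStream raw = Files.newInputStream(path);
             // The BufferedInputStream is the point of the sketch: without it,
             // every readInt() below can turn into small reads on the raw stream.
             DataInputStream in = new DataInputStream(new BufferedInputStream(raw, 1 << 16))) {
            while (true) {
                int dimension;
                try {
                    // DataInputStream is big-endian; ivecs files are little-endian.
                    dimension = Integer.reverseBytes(in.readInt());
                } catch (EOFException e) {
                    break; // end of file reached while looking for the next record
                }
                int[] vector = new int[dimension];
                for (int i = 0; i < dimension; i++) {
                    vector[i] = Integer.reverseBytes(in.readInt());
                }
                vectors.add(vector);
            }
        }
        return vectors;
    }
}
```

A call such as `IvecsSketch.readIvecs(Path.of("gt.ivecs"))` would return one `int[]` per query. Reading whole records through a little-endian `ByteBuffer` is another option; the sketch keeps the `DataInputStream` approach so that the only change relative to an unbuffered reader is the added wrapper.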

Some numbers from my machine, based on a dataset with ~2M base vectors and ~50K query vectors, illustrate the difference:

| File | Size | Contents | Time |
| --- | --- | --- | --- |
| base.fvecs | 964M | ~2M 128D fvecs | 2.6s |
| query.fvecs | 25M | ~50K 128D fvecs | 0.12s |
| gt.ivecs | 58M | ~50K 300D ivecs | 10.3s (unbuffered), 0.34s (buffered) |

Without buffering, reading the 58M ground truth file takes ~4x as long as reading the 964M base vectors file (10.3s vs 2.6s). With buffering, the ground truth is no longer the bottleneck.

marianotepper

LGTM

Speed up ivec reads by buffering

f5c3444

ashkrisk requested review from MarkWolters, jshook, marianotepper and tlwillke as code owners December 1, 2025 08:55

marianotepper approved these changes Dec 1, 2025
