Skip to content

RUST-2330 Serialize Facet types to BSON binary#656

Open
abr-egn wants to merge 18 commits intomongodb:mainfrom
abr-egn:RUST-2330/facet-format
Open

RUST-2330 Serialize Facet types to BSON binary#656
abr-egn wants to merge 18 commits intomongodb:mainfrom
abr-egn:RUST-2330/facet-format

Conversation

@abr-egn
Copy link
Copy Markdown
Contributor

@abr-egn abr-egn commented Apr 9, 2026

RUST-2330

This adds bson::facet::to_vec, which serializes arbitrary types that implement Facet to BSON bytes, properly handling embedded Rust BSON types.

# make sure we're running in the repo root
cd $(dirname $0)/..

RUSTFLAGS="--cfg mongodb_internal_bench" cargo +nightly bench tests::bench --all-features No newline at end of file
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

cargo bench requires using nightly; I put the benchmarks behind a custom cfg flag rather than a feature so they don't get picked up even when using --all-features.

}

#[derive(Debug)]
struct Serializer {
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Because the Facet serialization machinery feeds a stream of events rather than using visitors, the structure here is quite different than the Serde serialization; I find this setup easier to follow but maybe that's just because I wrote it.

values:
- id: "min"
display_name: "1.81 (minimum supported version)"
display_name: "1.85 (minimum supported version)"
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This along with the edition change allow let-chains which are really handy and this version's well past our declared threshold.

This did come with a bit more churn than I expected (formatted order of imports changed), I can pull that out if you think that'd be better.

}
}

impl TryFrom<&RawArray> for Bson {
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These (and similar added elsewhere) made testing more convenient and they seem fine to export.

/// Note that accessing elements is an O(N) operation, as it requires iterating through the document
/// from the beginning to find the requested key.
#[derive(Clone, PartialEq)]
#[cfg_attr(feature = "facet-unstable", derive(facet::Facet), facet(opaque))]
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All of the module types that derive Facet are also tagged with opaque; this means that you can't use facet machinery to inspect the internals and unless a [de]serializer special cases them (like bson::facet::to_vec does) they can only be [de]serialized via a proxy type (like bson::facet::ExtJson). This seems pretty ideal to me since it means we're not exposing internal field structure. There's maybe a case for making those structs that have all pub fields non-opaque, but that seems more like a "consider it if anyone asks for it" kind of thing.

/// A BSON value backed by owned raw BSON bytes.
#[derive(Debug, Clone, PartialEq)]
#[cfg_attr(feature = "facet-unstable", derive(facet::Facet), facet(opaque))]
#[repr(C)]
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Facet requires enums to have a specified repr since the default unspecified one makes it basically impossible to say anything about in-memory layout.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These show facet to be 3-4x slower than serde, which is a bummer. I'm not sure how much of that is general facet overhead and how much could be reclaimed from optimization of the facet serializer. The structure of the latter should mean it's pretty comparable to the serde serializer, e.g. no intermediate buffers, short-circuiting for raw bson bytes, etc.

@abr-egn abr-egn marked this pull request as ready for review April 9, 2026 11:40
@abr-egn abr-egn requested a review from a team as a code owner April 9, 2026 11:40
@abr-egn abr-egn requested a review from isabelatkinson April 9, 2026 11:40
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant