Skip to content

Replace anyhow with structured error types in application crates#1761

Draft
Ericson2314 wants to merge 1 commit into
NixOS:masterfrom
obsidiansystems:error-types-2
Draft

Replace anyhow with structured error types in application crates#1761
Ericson2314 wants to merge 1 commit into
NixOS:masterfrom
obsidiansystems:error-types-2

Conversation

@Ericson2314
Copy link
Copy Markdown
Member

Every function now returns the narrowest error type that covers its actual failure modes, rather than a catch-all anyhow::Error. This makes it possible to distinguish infrastructure failures (DB down, store unreachable) from logic errors (step not found in queue, drv lookup miss) at the type level.

Among other benefits, this will assist the "fail-fast" PR in deciding what is fatal vs recoverable, and making sure those decisions are consistent and enforced.

Error types are co-located with the functions that produce them, using split impl State blocks where needed. The fine-grained sub-enums (e.g. ResolutionError, StepLookupError, MachineLookupError, DrvLookupError) might seem like overkill in a single 2500-line state/mod.rs, but they are designed so that a future file split (e.g. state/resolution.rs, state/step_completion.rs) would give each sub-module its own natural error type, with StateLogicError in mod.rs just re-exporting them. The error hierarchy becomes the module hierarchy.

Similarly, the builder's per-phase error types (ImportError, UploadError, BuildOutputParseError, ResultInfoError) are each defined next to the free functions they belong to, and JobFailure ties them together by build phase. A future split into import.rs, upload.rs, etc. would be straightforward.

New error-context crate provides WithContext<E>, a recursive enum (Root(E) | Context { context, source }) that adds context layers without changing the type — a typed replacement for anyhow::context(). StateError is a newtype around WithContext<StateErrorKind>.

anyhow is removed from workspace dependencies entirely. Examples use it as a non-workspace [dev-dependencies] only.

.tls_config(tls)?
.connect()
.await
.context("Failed to establish connection with Channel")?
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we loose error context in cases like this where you prune the error message? I found it quiet useful to have errors at every step.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah I think the WithContext thing needs to be used in more places.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The solution is to divide errors by cause, not type. The information that would be put in context should be put in the message with #[error("message")], and the variant should include the relevant source with #[source].

current_bytes: fs_err::read_to_string(cgroups_path.join("memory.current"))?
.trim()
.parse()
.context("memory current parsing failed")?,
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same question here. I haven't read the whole error hierarchy to understand what impact this has on error messages.

@Mic92
Copy link
Copy Markdown
Member

Mic92 commented May 22, 2026

Every function now returns the narrowest error type that covers its actual failure modes, rather than a catch-all anyhow::Error. This makes it possible to distinguish infrastructure failures (DB down, store unreachable) from logic errors (step not found in queue, drv lookup miss) at the type level.

Among other benefits, this will assist the "fail-fast" PR in deciding what is fatal vs recoverable, and making sure those decisions are consistent and enforced.

Error types are co-located with the functions that produce them, using split impl State blocks where needed. The fine-grained sub-enums (e.g. ResolutionError, StepLookupError, MachineLookupError, DrvLookupError) might seem like overkill in a single 2500-line state/mod.rs, but they are designed so that a future file split (e.g. state/resolution.rs, state/step_completion.rs) would give each sub-module its own natural error type, with StateLogicError in mod.rs just re-exporting them. The error hierarchy becomes the module hierarchy.

Similarly, the builder's per-phase error types (ImportError, UploadError, BuildOutputParseError, ResultInfoError) are each defined next to the free functions they belong to, and JobFailure ties them together by build phase. A future split into import.rs, upload.rs, etc. would be straightforward.

New error-context crate provides WithContext<E>, a recursive enum (Root(E) | Context { context, source }) that adds context layers without changing the type — a typed replacement for anyhow::context(). StateError is a newtype around WithContext<StateErrorKind>.

anyhow is removed from workspace dependencies entirely. Examples use it as a non-workspace [dev-dependencies] only.

Could you write pull request messages yourself, if you expect me to review them?

@Ericson2314 Ericson2314 marked this pull request as draft May 27, 2026 18:10
Every function now returns the narrowest error type that covers its
actual failure modes, rather than a catch-all `anyhow::Error`. This
makes it possible to distinguish infrastructure failures (DB down, store
unreachable) from logic errors (step not found in queue, drv lookup
miss) at the type level.

Among other benefits, this will assist the "fail-fast" PR in deciding
what is fatal vs recoverable, and making sure those decisions are
consistent and enforced.

Error types are co-located with the functions that produce them, using
split `impl State` blocks where needed. The fine-grained sub-enums (e.g.
`ResolutionError`, `StepLookupError`, `MachineLookupError`,
`DrvLookupError`) might seem like overkill in a single 2500-line
`state/mod.rs`, but they are designed so that a future file split (e.g.
`state/resolution.rs`, `state/step_completion.rs`) would give each
sub-module its own natural error type, with `StateLogicError` in
`mod.rs` just re-exporting them. The error hierarchy becomes the module
hierarchy.

Similarly, the builder's per-phase error types (`ImportError`,
`UploadError`, `BuildOutputParseError`, `ResultInfoError`) are each
defined next to the free functions they belong to, and `JobFailure` ties
them together by build phase. A future split into `import.rs`,
`upload.rs`, etc. would be straightforward.

New `error-context` crate provides `WithContext<E>`, a recursive enum
(`Root(E)` | `Context { context, source }`) that adds context layers
without changing the type — a typed replacement for `anyhow::context()`.
`StateError` is a newtype around `WithContext<StateErrorKind>`.

`anyhow` is removed from workspace dependencies entirely. Examples use
it as a non-workspace `[dev-dependencies]` only.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants