Skip to content

Llt 7085 - Refactor telio-dns: Replace string-based error handling with strongly-typed enums across library and tests#1724

Open
Damos-dev wants to merge 5 commits intomainfrom
LLT-7085
Open

Llt 7085 - Refactor telio-dns: Replace string-based error handling with strongly-typed enums across library and tests#1724
Damos-dev wants to merge 5 commits intomainfrom
LLT-7085

Conversation

@Damos-dev
Copy link
Copy Markdown

@Damos-dev Damos-dev commented Mar 25, 2026

Problem

The telio-dns crate was using String as the error type across both public API and internal logic. This made error handling non-idiomatic and limited:

  • no ability to match on specific error types
  • error handling required string parsing at call sites
  • inconsistent error propagation across modules
  • reduced clarity of failure modes in the public API

Additionally, there was no clear separation between errors that are part of the public API contract and those used internally for DNS packet processing.


Solution

Refactored error handling in telio-dns to use typed enums instead of String,
with a clear separation between public and internal error domains.

Key changes:

  • Introduced a public Error enum (in error.rs) and Result<T> alias
  • Introduced DnsIoError struct to handle in the Error io::Error with DnsIoContext to have clean forwarding of io errors
  • Replaced all Result<_, String> in the public API with typed errors
  • Updated public traits (DnsResolver, NameServer) and constructors to return Error
  • Defined a separate internal PacketError enum (in nameserver.rs) for DNS packet parsing and runtime errors
  • Ensured internal errors are not exposed through the public API
  • Replaced manual format!-based error construction with structured error variants
  • Updated code outside of the telio-dns ( device.rs) to use typed errors and From for propagation

Additionally:

  • Replaced DnsResolverError(String) with DnsResolverError(#[from] telio_dns::error::Error)
    to enable automatic error propagation via ?

This change introduces a clean boundary between:

  • public API errors (stable, typed, exposed via Error)
  • internal runtime errors (private, implementation-specific via PacketError)

As a result, the API will be more robust, type-safe, and easier to extend without breaking external contracts.

To do in the future (not subject of this PR):

  • Refactor the codebase to enable dependency injection for key components, especially in the DNS server logic. This would allow unit tests to simulate errors from external modules (such as Neptune) or the OS, making it possible to test error propagation and handling in the public API.
  • Currently, it is not feasible to fully test error scenarios in the DNS server using unit tests, because the code structure does not allow for mocking or simulating failures in dependencies like Neptune or system calls. As a result, some error-handling paths in the public API cannot be reliably covered by unit tests.

☑️ Definition of Done checklist

  • Commit history is clean (requirements)
  • README.md is updated
  • Functionality is covered by unit or integration tests

@nord-llt
Copy link
Copy Markdown

CLA assistant check
Thank you for your submission! We really appreciate it. Like many open source projects, we ask that you sign our Contributor License Agreement before we can accept your contribution.


Daniel Moszczanski seems not to be a GitHub user. You need a GitHub account to be able to sign the CLA. If you have already a GitHub account, please add the email address used for this commit to your account.
You have signed the CLA already but the status is still pending? Let us recheck it.

Comment thread crates/telio-dns/src/dns.rs Outdated
) -> crate::error::Result<()>;
/// Configure list of forward DNS servers for zone '.'.
async fn forward(&self, to: &[IpAddr]) -> Result<(), String>;
async fn forward(&self, to: &[IpAddr]) -> crate::error::Result<()>;
Copy link
Copy Markdown
Contributor

@tomasz-grz tomasz-grz Mar 25, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would personally prefer to see just Result<()> here, or at least keep the Result<(), Error>

you can import it with use like use crate::error::Result;

Or in this case

use crate::{bind_tun, LocalNameServer, NameServer, Records, error::Error, error::Result};

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Or actually

use crate::{bind_tun, LocalNameServer, NameServer, Records, error::{Error, Result}};

Copy link
Copy Markdown
Author

@Damos-dev Damos-dev Mar 25, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done as use crate::{bind_tun, error::{Error,Result}, LocalNameServer, NameServer, Records}; to keep proper order.

Comment thread crates/telio-dns/src/dns.rs Outdated
let socket = UdpSocket::bind(SocketAddr::from(([127, 0, 0, 1], 0)))
.await
.map_err(|e| {
telio_log_error!("Failed to bind dns socket: {:?}", e);
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are these logs redundant? Will the returned Error be logged anyway?

Copy link
Copy Markdown
Author

@Damos-dev Damos-dev Mar 25, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You are right!
With the recent change from string-based errors to typed enums, we no longer need to log errors at the point where they are created. Instead, errors can be propagated and handled (and logged) at the call site.

At the moment, I don’t see any other logging "at the boundary" where this code is called, which is why I left the existing logs in place. It’s possible they are still relied upon elsewhere (e.g. in tests or other integration points).

If there are no such dependencies, I’d suggest moving towards a proper “log at the boundary” approach: remove the logs from here and handle logging in device.rs, where the error is actually observed and acted upon.

I think this change should be handled as a separate subtask and commit. Grouping all related logging changes into a focused commit with a clear message - would make the intent and idea much easier to understand, rather than mixing it into change of the error handling approach. I have made LLT-7174 for that.

Comment thread crates/telio-dns/src/error.rs Outdated
Comment thread crates/telio-dns/src/error.rs Outdated
pub enum Error {
/// Failed to bind DNS socket
#[error("Failed to bind DNS socket: {0}")]
BindDnsSocket(#[source] io::Error),
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Using #[from] attribute is generally preferable, because it gives you the From, Into and source implementations for "free".

Copy link
Copy Markdown
Author

@Damos-dev Damos-dev Mar 30, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The first one can benefit from the "#[from]" but the rest will conflict as "thiserror" will try to generate impl for them as well, and it will cause a conflict. So we have now changed BindDnsSocket(#[from] io::Error) to use #from but two other errors from io::error will have to use "source" and later map_err.

Copy link
Copy Markdown
Contributor

@tomasz-grz tomasz-grz Mar 31, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If I'm not mistaken you don't need to derive #[source] in order to use map_err

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Way of handling io errors has been redesigned.

Comment thread crates/telio-dns/src/error.rs Outdated
Comment on lines +64 to +66
pub fn bind_dns_socket(err: io::Error) -> Self {
Self::BindDnsSocket(err)
}
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What are these intended for?

If you use the From trait by the #[from], I believe you don't need to explicitly do any error type conversions.

Copy link
Copy Markdown
Author

@Damos-dev Damos-dev Mar 30, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's been changed to ? but two other errors will have to use map_err.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Exactly, you can use map_err to convert the error types. I don't see these functions used anywhere.

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Helpers removed due to a different approach.

Comment thread crates/telio-dns/src/nameserver.rs Outdated
const IDLE_TIME: Duration = Duration::from_secs(1);

#[derive(Debug, Error)]
#[non_exhaustive]
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's possibly fine to omit #[non_exhaustive] for internal errors.

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

At the moment we don't have any code what will benefit from that - so removed.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

#[non_exhaustive] is still there 👀

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

Comment thread crates/telio-dns/src/zone.rs Outdated
name: &LowerName,
lookup_options: LookupOptions,
) -> Result<Self::Lookup, LookupError> {
) -> std::result::Result<Self::Lookup, LookupError> {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TheResult<T, E> is already included as part of the "Standard library", you don't need to explicitly specify std::result::

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Solved by aliasing crate::error::Result as TelioResult.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I personally don't see the benefit of using a type alias when there are multiple error types in the module.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you really need to use an alias, since it's not a global libtelio type, I would be more in favor of something like DnsResult.

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Used DnsResult. Aliasing was required.

Comment thread crates/telio-dns/src/error.rs Outdated

/// Failed to get DNS socket local address
#[error("Failed to get DNS socket local address: {0}")]
GetDnsSocketLocalAddr(#[source] io::Error),
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since these variants are already in the DNS crate, what do you think dropping it from the error type name?
Something like GetLocalAddr or simply LocalAddr would be less verbose in my opinion.

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Error type will give us more detailed info and I would like to keep it. Yet.

Copy link
Copy Markdown
Contributor

@tomasz-grz tomasz-grz Mar 31, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Then please make the other errors follow the same convention,

maybe something like ConfigureDnsTunnel, CreateDnsTunnel and their error descriptions, to better distinguish them from the actual VPN tunnel errors

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I feel like the error variants of nameserver::PacketError are much more "human readable" than these, and personally prefer that style.

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Due to redesigned way of handling io errors - such complex names are no longer exists.

Comment thread crates/telio-dns/src/dns.rs Outdated
Comment thread crates/telio-dns/src/zone.rs Outdated

impl ForwardZone {
pub(crate) async fn new(name: &str, ips: &[IpAddr]) -> Result<Self, String> {
pub(crate) async fn new(name: &str, ips: &[IpAddr]) -> Result<Self> {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see, you are replacing the std::result::Result type here with the crate::error::Result type def.

I would recommend simply to do Result<Self, Error> here.
Since you already import it use crate::error::Error

FIY you can also rename imports, if you have clashes or want to make more explicit.

use crate::error::Error as DnsError;

/// then this would become
Result<Self, DnsError>

In theory it should be possible rename the imported Result (though I wouldn't recommend it).

use crate::error::Result as DnsResult;

/// then write this as 
DnsResult<Self>

This way it will also not clash with std Result

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point - I like this approach and it does make the code a bit cleaner and more explicit.

One thing to keep in mind, though, is that aliasing like this can make it harder to find all usages later, using a simple text search. It will require a two-step process: first locating the alias definitions, and then searching for their usages.

So while it does help avoid clashes with std::result::Result and improves readability, it also introduces a bit of indirection in terms of future code analysis.

Overall, I agree it’s a reasonable trade-off, especially for clarity, but it’s worth being aware of this downside in the future.

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Anyway - solved by aliasing new Result with a different name.

@Damos-dev Damos-dev marked this pull request as ready for review March 31, 2026 04:18
@Damos-dev Damos-dev requested a review from a team as a code owner March 31, 2026 04:18
Copy link
Copy Markdown
Contributor

@tomasz-grz tomasz-grz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice, this is taking shape.

Please update the PR name

Comment thread .unreleased/LLT-7085 Outdated
@@ -0,0 +1 @@
Refactor error handling in telio-dns: remove simple String and replace it with enums for both: public (public API) and private (internal) errors No newline at end of file
Copy link
Copy Markdown
Contributor

@tomasz-grz tomasz-grz Mar 31, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We add a message here only if there are breaking changes to the App facing public API,
since the telio-dns module is not facing directly the final app, it should be ok to skip the message here.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was also going to add that I think this file should be left empty. The purpose of the changelog is to communicate to apps what has changed for them, and this change shouldn't affect them

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

Comment on lines -144 to +130
Ok(self.nameserver.forward(to).await?)
self.nameserver.forward(to).await
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice 👍

Comment thread crates/telio-dns/src/error.rs Outdated

/// Failed to get DNS socket local address
#[error("Failed to get DNS socket local address: {0}")]
GetDnsSocketLocalAddr(#[source] io::Error),
Copy link
Copy Markdown
Contributor

@tomasz-grz tomasz-grz Mar 31, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Then please make the other errors follow the same convention,

maybe something like ConfigureDnsTunnel, CreateDnsTunnel and their error descriptions, to better distinguish them from the actual VPN tunnel errors

Comment thread crates/telio-dns/src/error.rs Outdated
Comment on lines +64 to +66
pub fn bind_dns_socket(err: io::Error) -> Self {
Self::BindDnsSocket(err)
}
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Exactly, you can use map_err to convert the error types. I don't see these functions used anywhere.

Comment thread crates/telio-dns/src/forward.rs Outdated
_zone_type: ZoneType,
config: ForwardConfig,
) -> Result<Self, String> {
) -> TelioResult<Self> {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
) -> TelioResult<Self> {
) -> Result<Self, Error> {

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's been changed to DnsResult as shadowing was required due to two Result definitions available.

Comment thread crates/telio-dns/src/forward.rs Outdated
use tokio::net::UdpSocket;

use crate::bind_tun;
use crate::{bind_tun, error::Result as TelioResult};
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
use crate::{bind_tun, error::Result as TelioResult};
use crate::{bind_tun, error::Error};

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's been changed to DnsResult as shadowing was required due to two Result definitions available.

Comment thread crates/telio-dns/src/nameserver.rs Outdated
async fn stop(&self);
/// Configure list of forward DNS servers for zone '.'.
async fn forward(&self, to: &[IpAddr]) -> Result<(), String>;
async fn forward(&self, to: &[IpAddr]) -> TelioResult<()>;
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
async fn forward(&self, to: &[IpAddr]) -> TelioResult<()>;
async fn forward(&self, to: &[IpAddr]) -> Result<(), Error>;

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you really want to use the type alias here, maybe DnsResult would be a more suitable name?

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have used DnsResult as suggested.

Comment thread crates/telio-dns/src/nameserver.rs Outdated
const IDLE_TIME: Duration = Duration::from_secs(1);

#[derive(Debug, Error)]
#[non_exhaustive]
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

#[non_exhaustive] is still there 👀

Comment thread crates/telio-dns/src/error.rs Outdated

/// Failed to get DNS socket local address
#[error("Failed to get DNS socket local address: {0}")]
GetDnsSocketLocalAddr(#[source] io::Error),
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I feel like the error variants of nameserver::PacketError are much more "human readable" than these, and personally prefer that style.

@Damos-dev Damos-dev changed the title Llt 7085 Llt 7085 - Improve error handling in telio-dns by introducing error enums instead of strings. Apr 13, 2026
@Damos-dev Damos-dev changed the title Llt 7085 - Improve error handling in telio-dns by introducing error enums instead of strings. Llt 7085 - Refactor telio-dns: Replace string-based error handling with strongly-typed enums across library and tests Apr 13, 2026
Copy link
Copy Markdown
Contributor

@mathiaspeters mathiaspeters left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good in general, just one question asked at three different places

Comment thread crates/telio-dns/src/forward.rs
Comment thread crates/telio-dns/src/zone.rs Outdated
Comment thread crates/telio-dns/src/nameserver.rs Outdated
@Damos-dev
Copy link
Copy Markdown
Author

Nice, this is taking shape.

Please update the PR name

Done

@mathiaspeters
Copy link
Copy Markdown
Contributor

+1

Daniel Moszczanski added 5 commits April 23, 2026 02:25
Refactor all local telio-dns errors to use enums
instead of String
These errors are internal to the crate and not
exposed via the public API
- Add a test to verify that AuthoritativeZone::newreturns
Error::RecordOutsideZone when a record domain does not match the
specified zone.
- Add a test to confirm that AuthoritativeZone::new returns Ok when
all record domains match the zone.
- These tests ensure correct validation of zone membership during zone
creation and improve negative-path coverage.
Replace string-based errors in telio-dns
public API with typed enums

Refactor public API to use Error enum instead of
Result<_, String>
Enable structured error handling and remove need
for string parsing

Key changes:
- Introduce crate::error::Error and
  Result<T> alias
- Update DnsResolver and NameServer to return
  typed errors
- Remove string-based error handling from the
  public API
- Update device.rs to use
  DnsResolverError(#[from] Error)
  and enable propagation via ?
- Introduce Error::Io to be able to use #[from]
DnsIoError to automaticaly distinguish between 3
different I/O errors what can happened and we
don't loose ability to collect information about
the real location and reason of an error.

This is a breaking change within the workspace but
improves type safety and API clarity
…version

- Test DnsIoError display formatting for context and source error
messages
- Test conversion from DnsIoError to Error::Io and verify transparent
error display
- Ensures error formatting and conversion logic remain correct
- Test returns_record_outside_zone_error: Verifies that
AuthoritativeZone::new returns an Error::RecordOutsideZone when a
record's domain does not match the specified zone name. This ensures
that only records belonging to the correct zone are accepted

- Test returns_ok_for_valid_zone: Checks that AuthoritativeZone::new
returns Ok when all record domains match the specified zone name. This
confirms that valid zone records are accepted without error
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants