Llt 7085 - Refactor telio-dns: Replace string-based error handling with strongly-typed enums across library and tests#1724
Conversation
|
Daniel Moszczanski seems not to be a GitHub user. You need a GitHub account to be able to sign the CLA. If you have already a GitHub account, please add the email address used for this commit to your account. You have signed the CLA already but the status is still pending? Let us recheck it. |
| ) -> crate::error::Result<()>; | ||
| /// Configure list of forward DNS servers for zone '.'. | ||
| async fn forward(&self, to: &[IpAddr]) -> Result<(), String>; | ||
| async fn forward(&self, to: &[IpAddr]) -> crate::error::Result<()>; |
There was a problem hiding this comment.
I would personally prefer to see just Result<()> here, or at least keep the Result<(), Error>
you can import it with use like use crate::error::Result;
Or in this case
use crate::{bind_tun, LocalNameServer, NameServer, Records, error::Error, error::Result};There was a problem hiding this comment.
Or actually
use crate::{bind_tun, LocalNameServer, NameServer, Records, error::{Error, Result}};There was a problem hiding this comment.
Done as use crate::{bind_tun, error::{Error,Result}, LocalNameServer, NameServer, Records}; to keep proper order.
| let socket = UdpSocket::bind(SocketAddr::from(([127, 0, 0, 1], 0))) | ||
| .await | ||
| .map_err(|e| { | ||
| telio_log_error!("Failed to bind dns socket: {:?}", e); |
There was a problem hiding this comment.
Are these logs redundant? Will the returned Error be logged anyway?
There was a problem hiding this comment.
You are right!
With the recent change from string-based errors to typed enums, we no longer need to log errors at the point where they are created. Instead, errors can be propagated and handled (and logged) at the call site.
At the moment, I don’t see any other logging "at the boundary" where this code is called, which is why I left the existing logs in place. It’s possible they are still relied upon elsewhere (e.g. in tests or other integration points).
If there are no such dependencies, I’d suggest moving towards a proper “log at the boundary” approach: remove the logs from here and handle logging in device.rs, where the error is actually observed and acted upon.
I think this change should be handled as a separate subtask and commit. Grouping all related logging changes into a focused commit with a clear message - would make the intent and idea much easier to understand, rather than mixing it into change of the error handling approach. I have made LLT-7174 for that.
| pub enum Error { | ||
| /// Failed to bind DNS socket | ||
| #[error("Failed to bind DNS socket: {0}")] | ||
| BindDnsSocket(#[source] io::Error), |
There was a problem hiding this comment.
Using #[from] attribute is generally preferable, because it gives you the From, Into and source implementations for "free".
There was a problem hiding this comment.
The first one can benefit from the "#[from]" but the rest will conflict as "thiserror" will try to generate impl for them as well, and it will cause a conflict. So we have now changed BindDnsSocket(#[from] io::Error) to use #from but two other errors from io::error will have to use "source" and later map_err.
There was a problem hiding this comment.
If I'm not mistaken you don't need to derive #[source] in order to use map_err
There was a problem hiding this comment.
Way of handling io errors has been redesigned.
| pub fn bind_dns_socket(err: io::Error) -> Self { | ||
| Self::BindDnsSocket(err) | ||
| } |
There was a problem hiding this comment.
What are these intended for?
If you use the From trait by the #[from], I believe you don't need to explicitly do any error type conversions.
There was a problem hiding this comment.
It's been changed to ? but two other errors will have to use map_err.
There was a problem hiding this comment.
Exactly, you can use map_err to convert the error types. I don't see these functions used anywhere.
There was a problem hiding this comment.
Helpers removed due to a different approach.
| const IDLE_TIME: Duration = Duration::from_secs(1); | ||
|
|
||
| #[derive(Debug, Error)] | ||
| #[non_exhaustive] |
There was a problem hiding this comment.
It's possibly fine to omit #[non_exhaustive] for internal errors.
There was a problem hiding this comment.
At the moment we don't have any code what will benefit from that - so removed.
There was a problem hiding this comment.
#[non_exhaustive] is still there 👀
| name: &LowerName, | ||
| lookup_options: LookupOptions, | ||
| ) -> Result<Self::Lookup, LookupError> { | ||
| ) -> std::result::Result<Self::Lookup, LookupError> { |
There was a problem hiding this comment.
TheResult<T, E> is already included as part of the "Standard library", you don't need to explicitly specify std::result::
There was a problem hiding this comment.
Solved by aliasing crate::error::Result as TelioResult.
There was a problem hiding this comment.
I personally don't see the benefit of using a type alias when there are multiple error types in the module.
There was a problem hiding this comment.
If you really need to use an alias, since it's not a global libtelio type, I would be more in favor of something like DnsResult.
There was a problem hiding this comment.
Used DnsResult. Aliasing was required.
|
|
||
| /// Failed to get DNS socket local address | ||
| #[error("Failed to get DNS socket local address: {0}")] | ||
| GetDnsSocketLocalAddr(#[source] io::Error), |
There was a problem hiding this comment.
Since these variants are already in the DNS crate, what do you think dropping it from the error type name?
Something like GetLocalAddr or simply LocalAddr would be less verbose in my opinion.
There was a problem hiding this comment.
Error type will give us more detailed info and I would like to keep it. Yet.
There was a problem hiding this comment.
Then please make the other errors follow the same convention,
maybe something like ConfigureDnsTunnel, CreateDnsTunnel and their error descriptions, to better distinguish them from the actual VPN tunnel errors
There was a problem hiding this comment.
I feel like the error variants of nameserver::PacketError are much more "human readable" than these, and personally prefer that style.
There was a problem hiding this comment.
Due to redesigned way of handling io errors - such complex names are no longer exists.
|
|
||
| impl ForwardZone { | ||
| pub(crate) async fn new(name: &str, ips: &[IpAddr]) -> Result<Self, String> { | ||
| pub(crate) async fn new(name: &str, ips: &[IpAddr]) -> Result<Self> { |
There was a problem hiding this comment.
I see, you are replacing the std::result::Result type here with the crate::error::Result type def.
I would recommend simply to do Result<Self, Error> here.
Since you already import it use crate::error::Error
FIY you can also rename imports, if you have clashes or want to make more explicit.
use crate::error::Error as DnsError;
/// then this would become
Result<Self, DnsError>In theory it should be possible rename the imported Result (though I wouldn't recommend it).
use crate::error::Result as DnsResult;
/// then write this as
DnsResult<Self>This way it will also not clash with std Result
There was a problem hiding this comment.
Good point - I like this approach and it does make the code a bit cleaner and more explicit.
One thing to keep in mind, though, is that aliasing like this can make it harder to find all usages later, using a simple text search. It will require a two-step process: first locating the alias definitions, and then searching for their usages.
So while it does help avoid clashes with std::result::Result and improves readability, it also introduces a bit of indirection in terms of future code analysis.
Overall, I agree it’s a reasonable trade-off, especially for clarity, but it’s worth being aware of this downside in the future.
There was a problem hiding this comment.
Anyway - solved by aliasing new Result with a different name.
tomasz-grz
left a comment
There was a problem hiding this comment.
Nice, this is taking shape.
Please update the PR name
| @@ -0,0 +1 @@ | |||
| Refactor error handling in telio-dns: remove simple String and replace it with enums for both: public (public API) and private (internal) errors No newline at end of file | |||
There was a problem hiding this comment.
We add a message here only if there are breaking changes to the App facing public API,
since the telio-dns module is not facing directly the final app, it should be ok to skip the message here.
There was a problem hiding this comment.
I was also going to add that I think this file should be left empty. The purpose of the changelog is to communicate to apps what has changed for them, and this change shouldn't affect them
| Ok(self.nameserver.forward(to).await?) | ||
| self.nameserver.forward(to).await |
|
|
||
| /// Failed to get DNS socket local address | ||
| #[error("Failed to get DNS socket local address: {0}")] | ||
| GetDnsSocketLocalAddr(#[source] io::Error), |
There was a problem hiding this comment.
Then please make the other errors follow the same convention,
maybe something like ConfigureDnsTunnel, CreateDnsTunnel and their error descriptions, to better distinguish them from the actual VPN tunnel errors
| pub fn bind_dns_socket(err: io::Error) -> Self { | ||
| Self::BindDnsSocket(err) | ||
| } |
There was a problem hiding this comment.
Exactly, you can use map_err to convert the error types. I don't see these functions used anywhere.
| _zone_type: ZoneType, | ||
| config: ForwardConfig, | ||
| ) -> Result<Self, String> { | ||
| ) -> TelioResult<Self> { |
There was a problem hiding this comment.
| ) -> TelioResult<Self> { | |
| ) -> Result<Self, Error> { |
There was a problem hiding this comment.
It's been changed to DnsResult as shadowing was required due to two Result definitions available.
| use tokio::net::UdpSocket; | ||
|
|
||
| use crate::bind_tun; | ||
| use crate::{bind_tun, error::Result as TelioResult}; |
There was a problem hiding this comment.
| use crate::{bind_tun, error::Result as TelioResult}; | |
| use crate::{bind_tun, error::Error}; |
There was a problem hiding this comment.
It's been changed to DnsResult as shadowing was required due to two Result definitions available.
| async fn stop(&self); | ||
| /// Configure list of forward DNS servers for zone '.'. | ||
| async fn forward(&self, to: &[IpAddr]) -> Result<(), String>; | ||
| async fn forward(&self, to: &[IpAddr]) -> TelioResult<()>; |
There was a problem hiding this comment.
| async fn forward(&self, to: &[IpAddr]) -> TelioResult<()>; | |
| async fn forward(&self, to: &[IpAddr]) -> Result<(), Error>; |
There was a problem hiding this comment.
If you really want to use the type alias here, maybe DnsResult would be a more suitable name?
There was a problem hiding this comment.
I have used DnsResult as suggested.
| const IDLE_TIME: Duration = Duration::from_secs(1); | ||
|
|
||
| #[derive(Debug, Error)] | ||
| #[non_exhaustive] |
There was a problem hiding this comment.
#[non_exhaustive] is still there 👀
|
|
||
| /// Failed to get DNS socket local address | ||
| #[error("Failed to get DNS socket local address: {0}")] | ||
| GetDnsSocketLocalAddr(#[source] io::Error), |
There was a problem hiding this comment.
I feel like the error variants of nameserver::PacketError are much more "human readable" than these, and personally prefer that style.
mathiaspeters
left a comment
There was a problem hiding this comment.
Looks good in general, just one question asked at three different places
Done |
|
+1 |
Refactor all local telio-dns errors to use enums instead of String These errors are internal to the crate and not exposed via the public API
- Add a test to verify that AuthoritativeZone::newreturns Error::RecordOutsideZone when a record domain does not match the specified zone. - Add a test to confirm that AuthoritativeZone::new returns Ok when all record domains match the zone. - These tests ensure correct validation of zone membership during zone creation and improve negative-path coverage.
Replace string-based errors in telio-dns public API with typed enums Refactor public API to use Error enum instead of Result<_, String> Enable structured error handling and remove need for string parsing Key changes: - Introduce crate::error::Error and Result<T> alias - Update DnsResolver and NameServer to return typed errors - Remove string-based error handling from the public API - Update device.rs to use DnsResolverError(#[from] Error) and enable propagation via ? - Introduce Error::Io to be able to use #[from] DnsIoError to automaticaly distinguish between 3 different I/O errors what can happened and we don't loose ability to collect information about the real location and reason of an error. This is a breaking change within the workspace but improves type safety and API clarity
…version - Test DnsIoError display formatting for context and source error messages - Test conversion from DnsIoError to Error::Io and verify transparent error display - Ensures error formatting and conversion logic remain correct
- Test returns_record_outside_zone_error: Verifies that AuthoritativeZone::new returns an Error::RecordOutsideZone when a record's domain does not match the specified zone name. This ensures that only records belonging to the correct zone are accepted - Test returns_ok_for_valid_zone: Checks that AuthoritativeZone::new returns Ok when all record domains match the specified zone name. This confirms that valid zone records are accepted without error
Problem
The
telio-dnscrate was usingStringas the error type across both public API and internal logic. This made error handling non-idiomatic and limited:Additionally, there was no clear separation between errors that are part of the public API contract and those used internally for DNS packet processing.
Solution
Refactored error handling in
telio-dnsto use typed enums instead ofString,with a clear separation between public and internal error domains.
Key changes:
Errorenum (inerror.rs) andResult<T>aliasResult<_, String>in the public API with typed errorsDnsResolver,NameServer) and constructors to returnErrorPacketErrorenum (innameserver.rs) for DNS packet parsing and runtime errorsformat!-based error construction with structured error variantstelio-dns(device.rs) to use typed errors andFromfor propagationAdditionally:
DnsResolverError(String)withDnsResolverError(#[from] telio_dns::error::Error)to enable automatic error propagation via
?This change introduces a clean boundary between:
Error)PacketError)As a result, the API will be more robust, type-safe, and easier to extend without breaking external contracts.
To do in the future (not subject of this PR):
☑️ Definition of Done checklist