Open
Conversation
Currently, the `Router` stores HTTP methods for an endpoint as `String`s. This is a bit unfortunate, as it means the already-parsed `http::Method` for each request has to be un-parsed back into a `String` to look it up in the map, allocating a new `String` each time and converting it to uppercases to perform the lookup. Switching to the typed `http::Method` representation should make things a bit more efficient (especially as `Method` also implements inline storage for extension methods up to a certain length) and provide additional type safety. This does require switching from a `BTreeMap` of strings to handlers to a `HashMap` of `Method`s to handlers, as `Method` does not implement `Ord`/`PartialOrd`, but ordering shouldn't matter here.
Presently, the router's `input_path_to_segments` function segments the whole input path, copying each each segment from the input path into a new owned `String`, and returns a `Vec<String>`. This `Vec` is then immediately iterated over by reference, and each `&String` segment is then `to_string()`ed into a _second_ owned `String` that's used when returning the list of path variables. Path segments which are not returned (i.e., literals) are still copied into a new `String` which is used only to look up that segment in the current node's map of edges, which doesn't require an owned `String` to perform. This all feels a bit unfortunate, as we are allocating path segments a bunch of times more than we need to. Ideally, we would iterate over slices borrowed from the `InputPath`, and allocate `String`s only when necessary in order to return a path variable's value in the lookup result, or when we percent-decode something. This commit makes that change. Now, `input_path_to_segments` returns an `Iterator<Item = Result<Cow<'_, str>, InputPathError>>`, which borrows the segment from the path unless it was necessary to allocate a string to percent-decode the segment. This does mean that we return an `Iterator` of `Result`s rather than a `Result` and thus have to handle a potential error every time we get the next segment, but I don't think that's terrible --- it's kind of the inherent cost of doing path segmentation lazily. This also means that if we do encounter an error, or if the path is determined not to route to an endpoint, we don't continue wasting time segmenting the rest of the path; in the present implementation, we always segment the whole path up front before traversing it. I haven't done any real performance analysis of this, but I imagine the lazy approach is more efficient; it certainly *feels* nicer. It's also less vulnerable to issues where we spend a bunch of time segmenting a really long path that will never route to anything.
Collaborator
|
Can we close this out? I don't know that removing these allocations is a priority or particularly measurable. |
Member
Author
Fine by me! |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Currently, Dropshot's router will allocate and
Clonea bunch ofStrings during routing which it doesn't really need to be doing. This branch contains two commits which reduce these string clones and allocations:router: Reduce string copies in path traversal (bfadc03)
Presently, the router's
input_path_to_segmentsfunction segments thewhole input path, copying each each segment from the input path into a
new owned
String, and returns aVec<String>. ThisVecis thenimmediately iterated over by reference, and each
&Stringsegment isthen
to_string()ed into a second ownedStringthat's used whenreturning the list of path variables. Path segments which are not
returned (i.e., literals) are still copied into a new
Stringwhich isused only to look up that segment in the current node's map of edges,
which doesn't require an owned
Stringto perform. This all feels a bitunfortunate, as we are allocating path segments a bunch of times more
than we need to. Ideally, we would iterate over slices borrowed from the
InputPath, and allocateStrings only when necessary in order toreturn a path variable's value in the lookup result, or when we
percent-decode something.
This commit makes that change. Now,
input_path_to_segmentsreturns anIterator<Item = Result<Cow<'_, str>, InputPathError>>, which borrowsthe segment from the path unless it was necessary to allocate a string
to percent-decode the segment. This does mean that we return an
IteratorofResults rather than aResultand thus have to handle apotential error every time we get the next segment, but I don't think
that's terrible --- it's kind of the inherent cost of doing path
segmentation lazily. This also means that if we do encounter an error,
or if the path is determined not to route to an endpoint, we don't
continue wasting time segmenting the rest of the path; in the present
implementation, we always segment the whole path up front before
traversing it. I haven't done any real performance analysis of this, but
I imagine the lazy approach is more efficient; it certainly feels
nicer. It's also less vulnerable to issues where we spend a bunch of
time segmenting a really long path that will never route to anything.
router: represent methods as
http::Method(e2bbecb)Currently, the
Routerstores HTTP methods for an endpoint asStrings. This is a bit unfortunate, as it means the already-parsedhttp::Methodfor each request has to be un-parsed back into aStringto look it up in the map, allocating a new
Stringeach time andconverting it to uppercases to perform the lookup. Switching to the
typed
http::Methodrepresentation should make things a bit moreefficient (especially as
Methodalso implements inline storage forextension methods up to a certain length) and provide additional type
safety. This does require switching from a
BTreeMapof strings tohandlers to a
HashMapofMethods to handlers, asMethoddoes notimplement
Ord/PartialOrd, but ordering shouldn't matter here.Note that this is not terribly urgent, so don't rush to review it; I just thought it would be nice to do eventually. The change in e2bbecb, storing methods as
http::Methods, might be a bit nice to have when we add code for returning anAllowheader on "method not allowed" errors after #1164, but it's not a blocker.