Proposal
Problem statement
With rust-lang/rust#115443, developers (for example, those writing CLI parsers) can now perform limited operations on OsStr. However, getting an OsStr back requires unsafe, so the developer must understand and follow some very specific safety notes that Miri cannot check.
RFC #2295 exists to improve this, but it has stalled. The assumption here is that part of the problem with that RFC is how wide its scope is, and that by shrinking the scope we can get some benefits now.
Motivating examples or use cases
Mostly copied from #306
Argument parsers need to extract substrings from command line arguments. For example, --option=somefilename needs to be split into option and somefilename, and the original filename must be preserved without sanitizing it.
clap currently implements strip_prefix and split_once using transmute (equivalent to the stable encoded_bytes APIs).
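The workaround can be sketched with the stable OsStr::as_encoded_bytes and OsStr::from_encoded_bytes_unchecked APIs. This is a minimal sketch, not clap's actual code. The helper name split_once_ascii is hypothetical, and it only accepts a 7-bit ASCII delimiter. Its unsafe block is exactly the kind of code this proposal aims to make unnecessary.

```rust
use std::ffi::OsStr;

/// Hypothetical helper: splits `arg` at the first occurrence of the ASCII byte
/// `delim`, e.g. `--option=somefilename` at `=`.
fn split_once_ascii(arg: &OsStr, delim: u8) -> Option<(&OsStr, &OsStr)> {
    // Restricting the delimiter to 7-bit ASCII is what makes the cuts sound:
    // an ASCII byte is a valid, non-empty UTF-8 substring on its own.
    assert!(delim.is_ascii(), "delimiter must be 7-bit ASCII");
    let bytes = arg.as_encoded_bytes();
    let idx = bytes.iter().position(|&b| b == delim)?;
    // SAFETY: both slices come from `as_encoded_bytes` on the same `OsStr`, and
    // each cut sits immediately before or after the ASCII delimiter (or at an
    // end of the string), which the `from_encoded_bytes_unchecked` docs allow.
    let parts = unsafe {
        (
            OsStr::from_encoded_bytes_unchecked(&bytes[..idx]),
            OsStr::from_encoded_bytes_unchecked(&bytes[idx + 1..]),
        )
    };
    Some(parts)
}
```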
The os_str_bytes and osstrtools crates provide high-level string operations for OS strings. In the wild, os_str_bytes is mainly used to convert between raw bytes and OS strings (e.g. 1, 2, 3). osstrtools enables reasonable uses of split() to parse $PATH and replace() to fill in command line templates.
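The template use case can be approximated today in a similar way. The sketch below uses a hypothetical replace_os helper, not osstrtools' actual API. It searches the encoded bytes for a non-empty UTF-8 pattern, re-wraps the pieces between matches with from_encoded_bytes_unchecked, and concatenates everything with OsString::push so std handles the joins.

```rust
use std::ffi::{OsStr, OsString};

/// Hypothetical helper: replaces every occurrence of the UTF-8 string `from`
/// in `template` with `to`.
fn replace_os(template: &OsStr, from: &str, to: &OsStr) -> OsString {
    // An empty pattern would "match" at every byte, including inside
    // multi-byte sequences, so it is rejected.
    assert!(!from.is_empty(), "pattern must be non-empty");
    let hay = template.as_encoded_bytes();
    let pat = from.as_bytes();
    let mut out = OsString::with_capacity(hay.len());
    let mut last = 0; // start of the region not yet copied into `out`
    let mut i = 0;
    while i < hay.len() {
        if hay[i..].starts_with(pat) {
            // SAFETY: the slice starts at the start of the string or right after
            // a previous match, and ends right before this match; both matches
            // are valid non-empty UTF-8 substrings.
            out.push(unsafe { OsStr::from_encoded_bytes_unchecked(&hay[last..i]) });
            out.push(to);
            i += pat.len();
            last = i;
        } else {
            i += 1;
        }
    }
    // SAFETY: starts at the start of the string or right after a match.
    out.push(unsafe { OsStr::from_encoded_bytes_unchecked(&hay[last..]) });
    out
}
```

Even this small helper needs two unsafe blocks and a careful argument about boundaries. Safe Pattern-accepting methods on OsStr would remove that burden from each crate.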
Solution sketch
Provide str's Pattern-accepting methods on &OsStr.
Defer support for using OsStr as a Pattern, and for indexing OsStr; both are specified in RFC #2295.
Example of methods to be added:
```rust
// Proposed signatures only; bodies are omitted.
impl OsStr {
    pub fn contains<'a, P>(&'a self, pat: P) -> bool
        where P: Pattern<&'a Self>;
    pub fn starts_with<'a, P>(&'a self, pat: P) -> bool
        where P: Pattern<&'a Self>;
    pub fn ends_with<'a, P>(&'a self, pat: P) -> bool
        where P: Pattern<&'a Self>, P::Searcher: ReverseSearcher<&'a Self>;
    pub fn find<'a, P>(&'a self, pat: P) -> Option<usize>
        where P: Pattern<&'a Self>;
    pub fn rfind<'a, P>(&'a self, pat: P) -> Option<usize>
        where P: Pattern<&'a Self>, P::Searcher: ReverseSearcher<&'a Self>;

    // (Note: these should return a concrete iterator type instead of `impl Trait`.
    // For ease of explanation the concrete type is not listed here.)
    pub fn split<'a, P>(&'a self, pat: P) -> impl Iterator<Item = &'a Self>
        where P: Pattern<&'a Self>;
    pub fn split_inclusive<'a, P>(&'a self, pat: P) -> impl Iterator<Item = &'a Self>
        where P: Pattern<&'a Self>;
    pub fn rsplit<'a, P>(&'a self, pat: P) -> impl Iterator<Item = &'a Self>
        where P: Pattern<&'a Self>, P::Searcher: ReverseSearcher<&'a Self>;
    pub fn split_terminator<'a, P>(&'a self, pat: P) -> impl Iterator<Item = &'a Self>
        where P: Pattern<&'a Self>;
    pub fn rsplit_terminator<'a, P>(&'a self, pat: P) -> impl Iterator<Item = &'a Self>
        where P: Pattern<&'a Self>, P::Searcher: ReverseSearcher<&'a Self>;
    pub fn splitn<'a, P>(&'a self, n: usize, pat: P) -> impl Iterator<Item = &'a Self>
        where P: Pattern<&'a Self>;
    pub fn rsplitn<'a, P>(&'a self, n: usize, pat: P) -> impl Iterator<Item = &'a Self>
        where P: Pattern<&'a Self>, P::Searcher: ReverseSearcher<&'a Self>;
    pub fn split_once<'a, P>(&'a self, delimiter: P) -> Option<(&'a Self, &'a Self)>
        where P: Pattern<&'a Self>;
    // As on str, searching from the end needs a ReverseSearcher.
    pub fn rsplit_once<'a, P>(&'a self, delimiter: P) -> Option<(&'a Self, &'a Self)>
        where P: Pattern<&'a Self>, P::Searcher: ReverseSearcher<&'a Self>;
    pub fn matches<'a, P>(&'a self, pat: P) -> impl Iterator<Item = &'a Self>
        where P: Pattern<&'a Self>;
    pub fn rmatches<'a, P>(&'a self, pat: P) -> impl Iterator<Item = &'a Self>
        where P: Pattern<&'a Self>, P::Searcher: ReverseSearcher<&'a Self>;
    pub fn match_indices<'a, P>(&'a self, pat: P) -> impl Iterator<Item = (usize, &'a Self)>
        where P: Pattern<&'a Self>;
    pub fn rmatch_indices<'a, P>(&'a self, pat: P) -> impl Iterator<Item = (usize, &'a Self)>
        where P: Pattern<&'a Self>, P::Searcher: ReverseSearcher<&'a Self>;

    pub fn trim_matches<'a, P>(&'a self, pat: P) -> &'a Self
        where P: Pattern<&'a Self>, P::Searcher: DoubleEndedSearcher<&'a Self>;
    pub fn trim_start_matches<'a, P>(&'a self, pat: P) -> &'a Self
        where P: Pattern<&'a Self>;
    pub fn trim_end_matches<'a, P>(&'a self, pat: P) -> &'a Self
        where P: Pattern<&'a Self>, P::Searcher: ReverseSearcher<&'a Self>;
    pub fn strip_prefix<'a, P>(&'a self, prefix: P) -> Option<&'a Self>
        where P: Pattern<&'a Self>;
    pub fn strip_suffix<'a, P>(&'a self, suffix: P) -> Option<&'a Self>
        where P: Pattern<&'a Self>, P::Searcher: ReverseSearcher<&'a Self>;

    // Owned results are OsString (OsStr's ToOwned::Owned).
    pub fn replace<'a, P>(&'a self, from: P, to: &'a Self) -> OsString
        where P: Pattern<&'a Self>;
    pub fn replacen<'a, P>(&'a self, from: P, to: &'a Self, count: usize) -> OsString
        where P: Pattern<&'a Self>;
}

impl Pattern<&OsStr> for char {}
impl Pattern<&OsStr> for &str {}
impl Pattern<&OsStr> for &String {}
impl Pattern<&OsStr> for &[char] {}
impl Pattern<&OsStr> for &&str {}
impl<const N: usize> Pattern<&OsStr> for &[char; N] {}
impl<F: FnMut(char) -> bool> Pattern<&OsStr> for F {}
impl<const N: usize> Pattern<&OsStr> for [char; N] {}
```
This is meant to match str. If str changes between the writing of this ACP and the implementation, follow what str has at the time of implementation (e.g. add the new variant rather than a deprecated one).
We likely want to add trim, trim_start, and trim_end as well, for consistency with trim_start_matches / trim_end_matches.
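The intended behavior can be made concrete without nightly Pattern. The sketch below uses a hypothetical local trait, OsPatternSketch, in place of Pattern<&OsStr>, and a hypothetical extension trait that provides strip_prefix and split_once equivalents. Only char and &str patterns are supported, which matches the decision to defer OsStr itself as a pattern. The unsafe blocks stand in for what std would do internally.

```rust
use std::ffi::OsStr;

/// Hypothetical stand-in for `Pattern<&OsStr>`, implemented only for UTF-8
/// patterns (`char`, `&str`); `OsStr` itself is not a pattern here.
trait OsPatternSketch {
    /// Byte range `(start, end)` of the first match in `hay`, if any.
    fn first_match(&self, hay: &[u8]) -> Option<(usize, usize)>;
    /// Length of the match if `hay` starts with the pattern.
    fn prefix_len(&self, hay: &[u8]) -> Option<usize>;
}

impl OsPatternSketch for &str {
    fn first_match(&self, hay: &[u8]) -> Option<(usize, usize)> {
        let needle = self.as_bytes();
        if needle.is_empty() {
            return Some((0, 0)); // like `str`, an empty pattern matches at the start
        }
        hay.windows(needle.len())
            .position(|w| w == needle)
            .map(|start| (start, start + needle.len()))
    }
    fn prefix_len(&self, hay: &[u8]) -> Option<usize> {
        hay.starts_with(self.as_bytes()).then_some(self.len())
    }
}

impl OsPatternSketch for char {
    fn first_match(&self, hay: &[u8]) -> Option<(usize, usize)> {
        let mut buf = [0u8; 4];
        let s: &str = self.encode_utf8(&mut buf);
        s.first_match(hay)
    }
    fn prefix_len(&self, hay: &[u8]) -> Option<usize> {
        let mut buf = [0u8; 4];
        let s: &str = self.encode_utf8(&mut buf);
        s.prefix_len(hay)
    }
}

/// Hypothetical extension trait mirroring two of the proposed methods.
trait OsStrSketchExt {
    fn strip_prefix_sketch<P: OsPatternSketch>(&self, prefix: P) -> Option<&OsStr>;
    fn split_once_sketch<P: OsPatternSketch>(&self, delimiter: P) -> Option<(&OsStr, &OsStr)>;
}

impl OsStrSketchExt for OsStr {
    fn strip_prefix_sketch<P: OsPatternSketch>(&self, prefix: P) -> Option<&OsStr> {
        let bytes = self.as_encoded_bytes();
        let n = prefix.prefix_len(bytes)?;
        // SAFETY: the cut is immediately after the matched UTF-8 substring, or at
        // the start of the string for an empty pattern.
        Some(unsafe { OsStr::from_encoded_bytes_unchecked(&bytes[n..]) })
    }

    fn split_once_sketch<P: OsPatternSketch>(&self, delimiter: P) -> Option<(&OsStr, &OsStr)> {
        let bytes = self.as_encoded_bytes();
        let (start, end) = delimiter.first_match(bytes)?;
        // SAFETY: the cuts sit immediately before and after the matched UTF-8
        // substring (both at the start of the string for an empty pattern).
        Some(unsafe {
            (
                OsStr::from_encoded_bytes_unchecked(&bytes[..start]),
                OsStr::from_encoded_bytes_unchecked(&bytes[end..]),
            )
        })
    }
}
```

Every accepted pattern is valid UTF-8, so each cut lands immediately before or after a valid UTF-8 substring. The from_encoded_bytes_unchecked documentation permits exactly that boundary rule.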
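This should work because
rust#109698 (OsStr bytes) already established that operations on UTF-8 / 7-bit ASCII boundaries are safe.
It was decided to seal Pattern, and for now Pattern is nightly-only. This leaves a lot of flexibility in how we implement OsStr support in the future (e.g. we could go as far as creating an OsPattern trait and switching to it without breaking anyone).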
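The ASCII part of that argument rests on a property of UTF-8: bytes below 0x80 never appear inside a multi-byte sequence, so an ASCII byte in the encoded bytes is always a complete character. The exhaustive check below verifies this for every char. It cannot cover unpaired surrogates, which char cannot represent. WTF-8 encodes those as 3-byte sequences whose bytes are also all 0x80 or above.

```rust
/// Returns true if, for every Unicode scalar value, a multi-byte UTF-8
/// encoding contains no 7-bit ASCII byte (< 0x80).
fn ascii_never_inside_multibyte() -> bool {
    let mut buf = [0u8; 4];
    (0..=0x10FFFFu32)
        // `char::from_u32` rejects the surrogate range U+D800..=U+DFFF.
        .filter_map(char::from_u32)
        .all(|c| c.len_utf8() == 1 || c.encode_utf8(&mut buf).bytes().all(|b| b >= 0x80))
}
```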
From an API design perspective, there is strong precedent for it:
It copies methods over from str.
The design is a subset of RFC #2295 (approved) and RFC #1309 (postponed)
By deferring support for OsStr as a pattern, we bypass the main dividing point between proposals (split APIs, panic on unpaired surrogates, switching away from WTF-8)
Alternatives
#306 proposes an OsStr::slice_encoded_bytes method:
It still requires writing higher-level operations on top, but at least without unsafe.
It either takes a performance hit to be consistent across platforms, or has per-platform caveats that will be similarly hard to get right on platforms that are less common among developers (e.g. Windows).
As far as I can tell, there is no precedent for an API design like this, so more new ground has to be broken (naming, deciding the above preconditions, etc.).
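A checked variant of the #306 idea can be sketched as follows for comparison. slice_checked is a hypothetical name, not the API #306 specifies. Its boundary check is deliberately conservative: a cut must be at either end of the string or next to an ASCII byte, so the same rule applies on every platform. It also illustrates the per-call validation cost behind the cross-platform consistency trade-off described in the Alternatives list.

```rust
use std::ffi::OsStr;
use std::ops::Range;

/// Hypothetical checked slicing over encoded-byte offsets. Panics unless both
/// ends of `range` are at an end of the string or adjacent to an ASCII byte
/// (a conservative subset of the valid boundaries).
fn slice_checked(s: &OsStr, range: Range<usize>) -> &OsStr {
    let bytes = s.as_encoded_bytes();
    assert!(range.start <= range.end && range.end <= bytes.len(), "range out of bounds");
    // Only called for in-bounds `i`, so `bytes[i]` (i < len) and `bytes[i - 1]`
    // (i > 0) are valid indices when reached.
    let ok = |i: usize| {
        i == 0 || i == bytes.len() || bytes[i].is_ascii() || bytes[i - 1].is_ascii()
    };
    assert!(ok(range.start) && ok(range.end), "range does not lie on an ASCII boundary");
    // SAFETY: each cut is at an end of the string or immediately before/after an
    // ASCII byte, i.e. next to a valid non-empty UTF-8 substring.
    unsafe { OsStr::from_encoded_bytes_unchecked(&bytes[range]) }
}
```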
Links and related work
#306 (OsStr)
#114 ((str, OsStr), Pattern private)
What happens now?
This issue contains an API change proposal (ACP) and is part of the libs-api team feature lifecycle. Once this issue is filed, the libs-api team will review open proposals as capacity becomes available. Current response times do not have a clear estimate, but may be up to several months.
Possible responses
The libs team may respond in various ways. First, the team will consider the problem (this doesn't require any concrete solution or alternatives to have been proposed):
We think this problem seems worth solving, and the standard library might be the right place to solve it.
We think that this probably doesn't belong in the standard library.
Second, if there's a concrete solution:
We think this specific solution looks roughly right, approved, you or someone else should implement this. (Further review will still happen on the subsequent implementation PR.)
We're not sure this is the right solution, and the alternatives or other materials don't give us enough information to be sure about that. Here are some questions we have that aren't answered, or rough ideas about alternatives we'd want to see discussed.