Add support for non-UTF-8 strings #46
Open
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
This is a start towards allowing arguments that aren't UTF-8. At the moment there's just the groundwork to show that argument parsing is also possible based on
&OsStr. It's not finished yet, but I'd like to get some feedback and discuss some design decisions beforehand. The API ofgumdropitself has to change quite a bit, but I think users that just usederive(Options)shouldn't need to notice much of the change.Related to #15.
What becomes
&OsStrorAsRef<OsStr>?The parser accepts now
AsRef<OsStr>instead ofAsRef<str>, but normal string (slices) implement this too, so that isn't too big of a change. All commands, short/long options stay&str, but free arguments and arguments to long options become&OsStrbefore being parsed ingumdrop_derive.The functions
gumdrop::{parse_args, parse_args_default}currently accept&[String]. I think this either can be kept the same but accepting&[S] where S: AsRef<OsStr>or two new functionsgumdrop::{parse_args_os, parse_args_os_default}that accept&[OsString]can be made.In
gumdrop_derivethe parsing options based on&strcan remain, and the arguments can be converted from&OsStrto&strfirst, yielding an error if that's not possible (without this PR, this would yield a panic). For this purpose I made a doc-hidden but public functiongumdrop::to_str, I'm not sure if that's the best approach. In addition to parsing based on&str, parsing options based on&OsStrcan be added (basically the whole point of this PR):parse(from_os_str = "...")forfn(&OsStr) -> Tparse(try_from_os_str = "...")forfn(&OsStr) -> Result<T, E> where E: Displayparse(from_os_str)usesstd::convert::From::fromparse(try_from_str)has a goodOsStrequivalentLimitations
If available, platform-specific
OsStrExtis used to look ahead and parse arguments starting from the beginning. Otherwise, the only API available to get the inner data of theOsStristo_string_lossy, which checks the entire string for valid UTF-8 and allocates aStringif that isn't the case, replacing invalid UTF-8 with the replacement character. This means an option like--valid-utf8=invalid-utf8cannot be parsed intoLongWithArg(&str, &OsStr), because there is no way to shorten the&OsStrtoinvalid-utf8while keeping the invalid UTF-8 characters. However,--valid-utf8 invalid-utf8(with a space instead of equal sign) can be parsed correctly on all platforms, and most platforms (especially in terms of usage) implement a form ofOsStrExt.At the moment I crudely check with
cfg(unix)for unix's version ofOsStrExt, but this should be extended to other platforms that export the sameOsStrExtbut not all other unix stuff.Windows has a different
OsStrExttrait, which cannot turn--valid-utf8=invalid-utf8intoLongWithArg(&str, &OsStr), but can turn it intoLongWithArg(Cow<str>, Cow<OsStr>). This is because Windows uses 16-bit code points internally, which translates to that you can't shorten a&OsStr, but you can create a newOsStringthat contains only part of the original&OsStr.