uutils / uutils-args

An experimental derive-based argument parser specifically for uutils

License: MIT License

Rust 99.92% Just 0.08%

uutils-args's Introduction

uutils-args

Argument parsing for the uutils coreutils project.

It is designed to be flexible, while providing default behaviour that aligns with GNU coreutils.

Features

A derive macro for declarative argument definition.
Automatic help generation.
Positional and optional arguments.
Automatically parsing values into Rust types.
Define a custom exit code on errors.
Automatically accept unambiguous abbreviations of long options.
Handles invalid UTF-8 gracefully.

When you should not use this library

The goal of this library is to make it easy to build applications that mimic the behaviour of the GNU coreutils. Other applications with similar behaviour are C applications that use getopt and getopt_long. If you want to mimic that behaviour exactly, this is the library for you. If you want to write basically anything else, you should probably pick another argument parser (for example: clap).

Getting Started

Parsing with this library consists of two "phases". In the first phase, the arguments are mapped to an iterator of an enum implementing [Arguments]. The second phase is mapping these arguments onto a struct implementing [Options]. By defining your arguments this way, there is a clear divide between the public API and the internal representation of the settings of your app.

For more information on these traits, see their respective documentation:

[Arguments]
[Options]

Below is a minimal example of a full CLI application using this library.

use uutils_args::{Arguments, Options};

#[derive(Arguments)]
enum Arg {
    // The doc strings below will be part of the `--help` text
    // First we define a simple flag:
    /// Transform input text to uppercase
    #[arg("-c", "--caps")]
    Caps,

    // This option takes a value:
    /// Add exclamation marks to output
    #[arg("-e N", "--exclaim=N")]
    ExclamationMarks(u8),
}

#[derive(Default)]
struct Settings {
    caps: bool,
    exclamation_marks: u8,
    text: String,
}

// To implement `Options`, we only need to provide the `apply` method.
// The `parse` method will be automatically generated.
impl Options<Arg> for Settings {
    fn apply(&mut self, arg: Arg) {
        match arg {
            Arg::Caps => self.caps = true,
            Arg::ExclamationMarks(n) => self.exclamation_marks += n,
        }
    }
}

fn run(args: &[&str]) -> String {
    let (s, operands) = Settings::default().parse(args).unwrap();
    let text = operands.iter().map(|s| s.to_string_lossy()).collect::<Vec<_>>().join(" ");
    let mut output = if s.caps {
        text.to_uppercase()
    } else {
        text
    };
    for _ in 0..s.exclamation_marks {
        output.push('!');
    }
    output
}

// The first argument is the binary name. In this example it's ignored.
assert_eq!(run(&["shout", "hello"]), "hello");
assert_eq!(run(&["shout", "-e3", "hello"]), "hello!!!");
assert_eq!(run(&["shout", "-e", "3", "hello"]), "hello!!!");
assert_eq!(run(&["shout", "--caps", "hello"]), "HELLO");
assert_eq!(run(&["shout", "-e3", "-c", "hello"]), "HELLO!!!");
assert_eq!(run(&["shout", "-e3", "-c", "hello", "world"]), "HELLO WORLD!!!");

Value parsing

To make it easier to implement [Arguments] and [Options], there is the [Value] trait, which allows for easy parsing from OsStr to any type implementing [Value]. This crate also provides a derive macro for this trait.
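To illustrate the idea, here is a minimal standalone sketch of a value-parsing trait. `FromOsStrValue`, `from_os_str` and `Color` are hypothetical names for this example only; the crate's real [Value] trait and its derive macro may use a different signature and error type.

```rust
use std::ffi::OsStr;

/// Hypothetical stand-in for a value-parsing trait: convert a raw
/// command-line value into a typed Rust value.
pub trait FromOsStrValue: Sized {
    fn from_os_str(value: &OsStr) -> Result<Self, String>;
}

impl FromOsStrValue for u8 {
    fn from_os_str(value: &OsStr) -> Result<Self, String> {
        // Numbers must be valid UTF-8 before they can be parsed.
        let s = value
            .to_str()
            .ok_or_else(|| format!("invalid UTF-8: {:?}", value))?;
        s.parse::<u8>()
            .map_err(|e| format!("invalid number '{}': {}", s, e))
    }
}

/// An enum whose variants correspond to fixed strings, similar in spirit
/// to what a derive macro could generate.
#[derive(Debug, PartialEq)]
pub enum Color {
    Always,
    Never,
}

impl FromOsStrValue for Color {
    fn from_os_str(value: &OsStr) -> Result<Self, String> {
        match value.to_str() {
            Some("always") => Ok(Color::Always),
            Some("never") => Ok(Color::Never),
            _ => Err(format!("invalid value {:?}", value)),
        }
    }
}
```

Because parsing starts from `&OsStr`, invalid UTF-8 is reported as an error instead of causing a panic.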

Examples

The following files contain examples of commands defined with uutils_args:

uutils-args's People

Contributors

tertsdiepraam cakebaker benwiederhake

uutils-args's Issues

Markdown rendering

Markdown rendering was removed in tertsdiepraam@a6d9e5b

Ultimately we want to put it back in, but we need to design how it should look first.

`dd` style arguments should be supported in `man` and `md`

Currently, we only support short and long flags in uutils-args-complete. dd-style arguments should be added to this.

Manpage generation

Like clap, we should be able to create manpages automatically.

Positional arguments redo

I'm not happy with positional arguments at the moment. They do not fit nicely with the rest of the library and have a weird API. We need to reassess how they should work.

The problems are:

The positional arguments do not really fit in an enum.
The distribution of positional arguments cannot be dependent on options (see for instance what cp requires).
The ranges are unintuitive and have no "obvious" default value.

Get help from file

I want the help text to be generated from a markdown file.

It might look something like this:

#[derive(Clone, Arguments)]
#[help("--help", file = "some_file.md")]
enum Arg {}

The file then contains something of this shape:

# BINNAME

## Usage
```
bin [options] [args]
```

## Summary
Some description with **markdown syntax**

In the process, the default help flag should be removed from the API (so that it's always possible to grep the codebase for the implementation of a specific option, including --help and --version).

Deprecated numeric arguments for `head`, `tail` and `uniq`

These utils have a weird old argument style where numbers starting with - (and, in the case of tail, also +) are parsed as the value of some flag. Unfortunately, I think these will just have to be special-cased in the library.
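One possible way to special-case this is a preprocessing step that rewrites the obsolete form before normal parsing. The sketch below only handles `head`'s `-NUM` form (equivalent to `-n NUM`). It assumes the obsolete form is only recognized as the first argument after the binary name, and it ignores the unit suffixes that GNU `head` also accepts. The function name is hypothetical.

```rust
use std::ffi::OsString;

/// Hypothetical helper: rewrite an obsolete `head -NUM` invocation into
/// `head -n NUM`. Only the first argument after the binary name is inspected.
pub fn expand_obsolete_head_args(args: Vec<OsString>) -> Vec<OsString> {
    let is_obsolete = args
        .get(1)
        .and_then(|a| a.to_str())
        .map(|a| a.len() > 1 && a.starts_with('-') && a[1..].bytes().all(|b| b.is_ascii_digit()))
        .unwrap_or(false);
    if !is_obsolete {
        return args;
    }
    let mut out = Vec::with_capacity(args.len() + 1);
    let mut iter = args.into_iter();
    // Keep the binary name unchanged.
    out.extend(iter.next());
    // `-5` becomes `-n` followed by `5`; the check above guarantees this exists.
    let num = iter.next().unwrap();
    out.push(OsString::from("-n"));
    out.push(OsString::from(&num.to_str().unwrap()[1..]));
    // Everything after the obsolete argument is passed through untouched.
    out.extend(iter);
    out
}
```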

Infer value names

Value names should be able to be inferred from unambiguous prefixes, just like long arguments. So we need a similar construction for them as we have in the long arguments.
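A minimal sketch of such a construction, assuming the same rule as for long options: an exact match always wins, otherwise a prefix is accepted only if it matches exactly one allowed value. The function name and the error strings are hypothetical.

```rust
/// Resolve `input` against the allowed `values`: an exact match wins,
/// otherwise a prefix is accepted only if it matches exactly one value.
pub fn infer_value<'a>(input: &str, values: &[&'a str]) -> Result<&'a str, String> {
    if let Some(exact) = values.iter().find(|v| **v == input) {
        return Ok(*exact);
    }
    let matches: Vec<&'a str> = values
        .iter()
        .copied()
        .filter(|v| v.starts_with(input))
        .collect();
    match matches.as_slice() {
        [only] => Ok(*only),
        [] => Err(format!("invalid value '{}'", input)),
        _ => Err(format!(
            "ambiguous value '{}', possible values: {}",
            input,
            matches.join(", ")
        )),
    }
}
```

With the values `always`, `auto` and `never`, the input `al` resolves to `always`, while `a` is rejected as ambiguous.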

`elvish` completion

We have zsh and fish but we need elvish completions too!

`default` option to `arg` is confusing

It is currently possible to write this for arguments that don't take values:

enum Arg {
   #[arg("-f", default = 8)]
   Foo(u8),
}

The word default is used because it is the value that's used if the flag is not given a value. However, this flag does not take a value at all, so the name does not make a lot of sense. A more intuitive name in this case would be value.

However, it's not that simple. The value of default can be used by multiple flags:

enum Arg {
   #[arg("-f", "--foo=BAR", default = 8)]
   Foo(u8),
}

It would be annoying to split these definitions (and repeat the help text in --help).

We could say that value has to be used if none of the flags take arguments and default otherwise, but that's also over-complicating things. Maybe the word value works for both cases? This doesn't look too bad:

enum Arg {
   #[arg("-f", "--foo=BAR", value = 8)]
   Foo(u8),
}

Trailing var args

We'll need support for something equivalent to clap's trailing var arg. There are multiple options for the syntax. We could decide to only make it have 1.. as number of arguments or make that configurable, which is probably best in the long run. I like calling it last too, instead of trailing, but I'm not sure yet.

enum Arg {
    // Alternative syntaxes under consideration (only one would be chosen):
    #[positional(trailing)]
    Trailing(String),

    #[positional(last)]
    Trailing(String),

    #[positional(last, 1..)]
    Trailing(String),
}

`term_md` should be implementing `Display` instead of rendering directly to a `String`

Options::apply should permit returning an error

This is a bit ugly, but sometimes the error while parsing an argument seems to take precedence over anything (including other errors) that might be encountered later:

$ date -R -R --help
date: multiple output formats specified
[$? = 1]

See also uutils/coreutils#4254 (comment) for further examples.

I propose changing the interface to:

pub trait Options<Arg: Arguments>: Sized {
    /// Apply a single argument to the options.
    fn apply(&mut self, arg: Arg) -> Result<(), Error>;

    // Rest basically unchanged; the default `parse` impl would simply call `self.apply(arg)?` instead.
}

What do you think?
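A simplified, self-contained sketch of the proposal, modelled on the `date -R -R --help` case above. `FallibleOptions`, `Error`, `Arg` and `Settings` are hypothetical stand-ins rather than the crate's types, and the arguments are assumed to be already lexed:

```rust
/// Hypothetical, simplified error type for this sketch.
#[derive(Debug, PartialEq)]
pub struct Error(String);

/// Simplified version of the proposed trait: `apply` can fail, and the
/// provided `parse` stops at the first failing argument.
pub trait FallibleOptions<A>: Sized {
    fn apply(&mut self, arg: A) -> Result<(), Error>;

    fn parse<I: IntoIterator<Item = A>>(mut self, args: I) -> Result<Self, Error> {
        for arg in args {
            // Propagate the first error immediately, before later
            // arguments (such as `--help`) are even looked at.
            self.apply(arg)?;
        }
        Ok(self)
    }
}

pub enum Arg {
    Rfc5322,
    Help,
}

#[derive(Default, Debug)]
pub struct Settings {
    rfc5322: bool,
    help: bool,
}

impl FallibleOptions<Arg> for Settings {
    fn apply(&mut self, arg: Arg) -> Result<(), Error> {
        match arg {
            // A second `-R` is an error, like in GNU date.
            Arg::Rfc5322 if self.rfc5322 => {
                Err(Error("multiple output formats specified".into()))
            }
            Arg::Rfc5322 => {
                self.rfc5322 = true;
                Ok(())
            }
            Arg::Help => {
                self.help = true;
                Ok(())
            }
        }
    }
}
```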

Error messages

The error messages should have a nice format and provide useful information. They should also include some suggestions, for example: "--revrse does not exist, did you mean --reverse?"

The general format can be much like clap's current output. It might be nice to provide an option for terser messages, but until there is demand for that, I don't have a use case for it.

Ideally, we would support a full miette style, where you get to see the command with the part where the error occurred highlighted, but that's out of scope for this issue.
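Suggestions like the one above are usually based on an edit distance. The crate already depends on `strsim` (see the dependency list), which provides string-similarity metrics; to stay self-contained, this sketch hand-rolls a Levenshtein distance instead. The function names, the distance threshold and the message format are assumptions.

```rust
/// Classic Levenshtein distance using a single rolling row.
fn levenshtein(a: &str, b: &str) -> usize {
    let b_chars: Vec<char> = b.chars().collect();
    // prev[j] holds the distance between the processed prefix of `a` and b[..j].
    let mut prev: Vec<usize> = (0..=b_chars.len()).collect();
    for (i, ca) in a.chars().enumerate() {
        let mut curr = vec![i + 1; b_chars.len() + 1];
        for (j, cb) in b_chars.iter().enumerate() {
            let cost = if ca == *cb { 0 } else { 1 };
            curr[j + 1] = (prev[j] + cost) // substitution
                .min(prev[j + 1] + 1) // deletion
                .min(curr[j] + 1); // insertion
        }
        prev = curr;
    }
    prev[b_chars.len()]
}

/// Build an error message for an unknown long option, suggesting the
/// closest known option if it is within `max_distance` edits.
pub fn unknown_option_message(given: &str, known: &[&str], max_distance: usize) -> String {
    let best = known
        .iter()
        .map(|k| (levenshtein(given, k), *k))
        .filter(|(d, _)| *d <= max_distance)
        .min_by_key(|(d, _)| *d);
    match best {
        Some((_, suggestion)) => {
            format!("{} does not exist, did you mean {}?", given, suggestion)
        }
        None => format!("{} does not exist", given),
    }
}
```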

Make positional arguments more ergonomic

Recently, I simplified the handling of positional arguments so that there is always a Vec with all positional arguments. While I think that was the right choice, we can still make that nicer to work with. In this issue, I want to explore that a bit.

Let's start with the status quo. The main function in most apps now looks something like this:

let (settings, operands) = Settings::default().parse(std::env::args_os())?;

However, those operands still need to be unpacked to what the util expects:

arch doesn't accept any positional arguments.
base32 only accepts a single positional argument.
cat accepts any number of positional arguments.
cp needs to separate the destination from the sources.
join has 2 optional arguments

So how do we do that?

Method 1: Matching

// arch
if !operands.is_empty() {
    // some error
}

// base32
let file = match &operands[..] {
    [] => return Err(/* not enough arguments */),
    [file] => file,
    _ => return Err(/* too many arguments */),
};

// cat (requires `operands` to be mutable)
if operands.is_empty() {
    operands.push(OsString::from("-"));
}

// cp (more complicated in reality because it depends on options)
let (sources, dest) = match &operands[..] {
    [] => return Err(/* missing sources */),
    [_] => return Err(/* missing destination */),
    [sources @ .., dest] => (sources, dest),
};

This is pretty nice, but it means that the errors are entirely the responsibility of the utility, with no help from this library. It's also easy to forget to check the operands in arch, when you do not need them. The second arm of the cp match expression is also interesting, because Rust will not force us to include it: without it, a single operand would silently be treated as a destination with no sources.

Method 2: An `Operands` Type

We could instead define a wrapper around Vec called Operands:

// arch
operands.empty()?;

// base32
let file = operands.pop_front("FILE")?;
operands.empty()?;

// cat
let mut files = operands.to_vec();
if files.is_empty() {
    files.push(OsString::from("-"));
}

// cp
let destination = operands.pop_back("DEST")?;
let sources = operands.to_non_empty_vec("SOURCES")?;

// join
let file1 = operands.pop_front("FILE1").ok();
let file2 = operands.pop_front("FILE2").ok();
operands.empty()?;

This is fairly concise and could provide pretty good error messages out of the box. It's not very declarative though. It would also benefit from linear types, which we don't have unfortunately.
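A minimal sketch of what such an `Operands` wrapper could look like. The method names follow the examples above; the `OperandError` variants and the `VecDeque` backing are assumptions made for this illustration. As noted, nothing forces the caller to call `empty()` at the end.

```rust
use std::collections::VecDeque;
use std::ffi::OsString;

/// Hypothetical error type: which operand was missing or unexpected.
#[derive(Debug, PartialEq)]
pub enum OperandError {
    Missing(&'static str),
    Extra(OsString),
}

/// Hypothetical wrapper around the positional arguments.
pub struct Operands(VecDeque<OsString>);

impl Operands {
    pub fn new(v: Vec<OsString>) -> Self {
        Operands(v.into())
    }

    /// Take the first operand, naming it in the error if absent.
    pub fn pop_front(&mut self, name: &'static str) -> Result<OsString, OperandError> {
        self.0.pop_front().ok_or(OperandError::Missing(name))
    }

    /// Take the last operand, naming it in the error if absent.
    pub fn pop_back(&mut self, name: &'static str) -> Result<OsString, OperandError> {
        self.0.pop_back().ok_or(OperandError::Missing(name))
    }

    /// Fail if any operands are left over.
    pub fn empty(&self) -> Result<(), OperandError> {
        match self.0.front() {
            Some(extra) => Err(OperandError::Extra(extra.clone())),
            None => Ok(()),
        }
    }

    /// Return the remaining operands, which must not be empty.
    pub fn to_non_empty_vec(&self, name: &'static str) -> Result<Vec<OsString>, OperandError> {
        if self.0.is_empty() {
            Err(OperandError::Missing(name))
        } else {
            Ok(self.0.iter().cloned().collect())
        }
    }
}
```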

Method 3: Include all the possibilities!

So, there's an advantage to building a library for a specific set of utilities: we can figure out exactly what we need! What if we provide a method for every possible configuration of operands? In fact, we could do that based on the type, much like how str::parse works in the standard library.

// arch
let _: () = operands.unpack()?;

// base32
let file: PathBuf = operands.unpack()?;

// cat
let files: &[PathBuf] = operands.unpack()?;

// cp
let (sources, dest): (&[PathBuf], PathBuf) = operands.unpack()?;

// join
let (file1, file2): (Option<PathBuf>, Option<PathBuf>) = operands.unpack()?;

However, there is a question as to how we differentiate between slices that may be empty and slices that cannot be empty. It's also a challenge to include as many possibilities as possible without having to write every single one. On the other hand, there are also not that many combinations that make sense.

Another important open question: how do we get the argument names into the error messages? Presumably, it would need to be something like this:

let (sources, dest): (&[PathBuf], PathBuf) = operands.unpack(("SOURCES", "DEST"))?;

The signature for that is gonna get ugly, but it looks kinda nice when used 😄

Method 4: Declarative Macro to the Rescue?

This could also be provided as a macro:

let (sources, dest) = unpack!("SOURCES... DEST", operands)?;

This looks even nicer, but does add a lot of additional complexity. It might be possible to make this a declarative macro with a different syntax though:

let (sources, dest) = unpack!(operands, SOURCES.., DEST);

Let's think about that last one. First, we need some types:

struct Required(&'static str);
struct Optional(&'static str);
struct ZeroOrMore(&'static str);
struct OneOrMore(&'static str);

Then:

unpack!(operands, SOURCES.., DEST)
// expands to
(OneOrMore("SOURCES"), Required("DEST")).unpack(operands)

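And then we implement Unpack in the library for all the combinations we need:

impl Unpack for (OneOrMore, Required) {
    // The operands are moved in, so the sources are returned as an owned Vec
    // (a borrowed slice could not outlive the Vec).
    type Output = (Vec<OsString>, OsString);
    fn unpack(&self, operands: Vec<OsString>) -> Result<Self::Output> {
        // ..
    }
}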
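A runnable sketch of this idea for the cp case, including a declarative macro that turns `unpack!(operands, SOURCES.., DEST)` into the tuple form. `Required` and `OneOrMore` come from the text above; the `UnpackError` type and the error behaviour (mirroring the cp arms of Method 1) are assumptions.

```rust
use std::ffi::OsString;

// Operand descriptors; the string is the name used in error messages.
pub struct Required(pub &'static str);
pub struct OneOrMore(pub &'static str);

// Hypothetical error type for this sketch.
#[derive(Debug, PartialEq)]
pub enum UnpackError {
    Missing(&'static str),
}

pub trait Unpack {
    type Output;
    fn unpack(&self, operands: Vec<OsString>) -> Result<Self::Output, UnpackError>;
}

// `SOURCES.. DEST`: the last operand is the destination, and at least one
// source must precede it.
impl Unpack for (OneOrMore, Required) {
    type Output = (Vec<OsString>, OsString);

    fn unpack(&self, mut operands: Vec<OsString>) -> Result<Self::Output, UnpackError> {
        match operands.len() {
            0 => Err(UnpackError::Missing((self.0).0)),
            1 => Err(UnpackError::Missing((self.1).0)),
            _ => {
                // At least two operands are present, so `pop` cannot fail.
                let dest = operands.pop().unwrap();
                Ok((operands, dest))
            }
        }
    }
}

/// Expands `unpack!(operands, SOURCES.., DEST)` into
/// `(OneOrMore("SOURCES"), Required("DEST")).unpack(operands)`.
macro_rules! unpack {
    ($operands:expr, $many:ident .., $one:ident) => {
        (OneOrMore(stringify!($many)), Required(stringify!($one))).unpack($operands)
    };
}
```

Only one combination is implemented here; each additional shape (for example `Optional, Optional` for join) would need its own `impl` and macro arm.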

Rename all the crates to something that crates.io will accept

Currently the derive crate is just called derive, which is obviously already taken on crates.io.

Probably:

uutils-args
uutils-args-proc
term-md

But maybe we want to rename to something else first? I don't really care about the name for term-md because it's probably not gonna be used directly, but we're gonna see the name for uutils-args a lot, so it might as well be something nicer.

Some cuter names instead of uutils-args:

coreopt, crop, corrode, ~~coral~~ (already taken), corinth, corde, cordial
corde could be coreutils deserializer or something.
optics, optical
clup, club
urge
uopt, uargs, ucli
I kinda like nucleus, because it has the "u" and "cl" for command line but also it means "core" from coreutils. It's taken on crates.io though. Maybe "nucli"? Gets maybe too close to the nushell things.
Fun words that start with "u" and are still available: umph, umpteen, upfront, upright, uptight, uranium (also ties in with the previous idea: radioactive -> nuclear)
enamor is nice too, ties in with the fact that this library uses enums instead of structs to derive from.
copter

I like the names with a nice connotation: enamor, corde, cordial, etc.

Re: Problems with `clap` and other parsers

bpaf author here, always looking to improve the library :)

Problem 1: No many-to-many relationship between arguments and settings

This is the biggest issue we have with clap. In clap, it is assumed that options do not interfere with each other. This means that partially overriding options are really hard to support.

My rm-fu is insufficient to know the difference between two invocations given below, but in bpaf interaction between options is possible. If you want to apply different rules for --interactive=never iff -f is present you can do something like this:

// define two variants of `--interactive` parsers, parsed types must be the same but for purposes of this example there's no other restrictions

let int_w = ... // impl Parser<Interactive>, parser to use when `--force` is present
let int_wo = ... // impl Parser<Interactive>, parser to use otherwise

let force = short('f').req_flag(true); // impl Parser<bool>, succeeds only when `-f` is passed
let with = construct!(force, int_w); // impl Parser<(bool, Interactive)>
let no_force = pure(false);
let without = construct!(no_force, int_wo); // impl Parser<(bool, Interactive)>, ignores `-f` and succeeds for as long as `int_wo` succeeds

// this parser uses different parsers for `--interactive` depending on whether `--force` is present
let combined = construct!([with, without]); // impl Parser<(bool, Interactive)>

Is something missing here?

Problem 2: We can't use the derive API
but that feels overly complicated.

I usually mix derive and combinatoric APIs (derive API in bpaf is just some syntactic sugar for combinatoric API anyway), a good example is here: https://github.com/pacak/cargo-show-asm/blob/master/src/opts.rs

Problem 4: Subtle differences

But even then, there is no way to tell clap to consider the = as part of the value.

any + anywhere + parse gives you a chance to parse anything anywhere. If you have multiple such fields you can just define your own primitive - see example for dd that defines tag function.

Problem 5: Deprecated syntax of head, tail and uniq

e.g. -5 is short for -s 5

The same any + anywhere + parse approach applies. You can combine it, as an alternative, with a parser that accepts plain -s 5, so the application gets just one value for the input.

Problem 6: Exit codes

In coreutils, different utils have different exit codes when they fail to parse.

Args::from + OptionParser::run_inner - any exit code should be possible.

Problem 6: It's stringly typed

Not bpaf :)

Problem 7: Reading help string from a file

In combinatoric API .help is a plain function that takes anything that can be converted to a string.

Problem 8: No markdown support

That's interesting, I'm implementing support for generating manpages and markdown documentation right now...

Other parsers
bpaf

Very configurable, even supports dd-style.

Interestingly enough, I don't see it as configurable but as flexible. There are no predefined configurations: you create parsers from the available primitives using applicatives, and the only limit is how far you can get with applicative/alternative functors (pretty far).

No different configuration between short and long options (as far as I can find).

Can you expand a bit on this?

Does not have the many-to-many relationship (options map directly to fields in a struct).

Options map to a computation tree. I tried to give an example of my understanding of a problem. If that's my misunderstanding - I can give one more go :)

`powershell` completions

We have zsh and fish but we need powershell completions too!

Remove `let ... else`

It's supported, but breaks rustfmt and raises the MSRV too much. We should target 1.60 for now (the same as uutils).
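For reference, `let ... else` was stabilized in Rust 1.65, so it is unavailable on a 1.60 toolchain. A typical replacement is a `match` that binds on success and diverges otherwise. The function below is a hypothetical example, not code from the crate:

```rust
use std::ffi::OsStr;

// With `let ... else` (Rust 1.65+), the first statement could be written as:
//     let Some(s) = value.to_str() else {
//         return Err(format!("invalid UTF-8 in {:?}", value));
//     };
// The `match` below is equivalent and also compiles on Rust 1.60.
pub fn parse_width(value: &OsStr) -> Result<usize, String> {
    let s = match value.to_str() {
        Some(s) => s,
        None => return Err(format!("invalid UTF-8 in {:?}", value)),
    };
    s.parse().map_err(|_| format!("invalid width '{}'", s))
}
```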

Errors cannot be tested: ErrorKind implements neither Debug nor PartialEq

The tests currently do not check negative inputs, and would not detect if the library is buggy by accepting too much, e.g. basically never throwing an error. This is because doing so is not really feasible: The "obvious" way to test error scenarios is to assert that the returned error equals some expected error. However, that cannot work, because ErrorKind implements neither Debug nor PartialEq.

Implementing Debug is an easy fix:

diff --git a/src/error.rs b/src/error.rs
index 0da36e5..18c9d27 100644
--- a/src/error.rs
+++ b/src/error.rs
@@ -66,7 +66,7 @@ impl StdError for Error {}
 
 impl Display for Error {
     fn fmt(&self, f: &mut std::fmt::Formatter<'_>) -> std::fmt::Result {
-        self.kind.fmt(f)
+        std::fmt::Display::fmt(&self.kind, f)
     }
 }
 
@@ -139,6 +139,12 @@ impl Display for ErrorKind {
     }
 }
 
+impl Debug for ErrorKind {
+    fn fmt(&self, f: &mut std::fmt::Formatter<'_>) -> std::fmt::Result {
+        std::fmt::Display::fmt(self, f)
+    }
+}
+
 impl From<lexopt::Error> for ErrorKind {
     fn from(other: lexopt::Error) -> ErrorKind {
         match other {

Implementing PartialEq is impossible however, because of ParsingFailed { … dyn StdError } and IoError(std::io::Error). Note that [std::io::ErrorKind](https://doc.rust-lang.org/std/io/enum.ErrorKind.html#impl-PartialEq-for-ErrorKind) and [std::num::IntErrorKind](https://doc.rust-lang.org/std/num/enum.IntErrorKind.html#impl-PartialEq-for-IntErrorKind) already implement PartialEq.

I have no good idea how to approach this, only bad ones:

Alternate enum: a new method ErrorKind::to_pure(&self) -> PureErrorKind, which converts the current ErrorKind into a dataless enum PureErrorKind that implements PartialEq.
Hardcode the error message in the tests: I hate it. It would work though.
Somehow moving the data to Error, making it easy for the current ErrorKind to implement PartialEq.
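The first idea could look roughly like this. The `ErrorKind` here is a simplified, hypothetical stand-in with only three variants whose payloads (`Box<dyn StdError>`, `std::io::Error`) are what prevent deriving `PartialEq`; the real enum has more variants.

```rust
use std::error::Error as StdError;

// Simplified, hypothetical error enum: its payloads prevent `PartialEq`.
pub enum ErrorKind {
    UnexpectedOption(String),
    ParsingFailed { value: String, error: Box<dyn StdError> },
    IoError(std::io::Error),
}

// Dataless mirror of `ErrorKind` that tests can compare with `assert_eq!`.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum PureErrorKind {
    UnexpectedOption,
    ParsingFailed,
    IoError,
}

impl ErrorKind {
    /// Drop the payload and keep only which kind of error occurred.
    pub fn to_pure(&self) -> PureErrorKind {
        match self {
            ErrorKind::UnexpectedOption(_) => PureErrorKind::UnexpectedOption,
            ErrorKind::ParsingFailed { .. } => PureErrorKind::ParsingFailed,
            ErrorKind::IoError(_) => PureErrorKind::IoError,
        }
    }
}
```

The downside is that every new variant has to be added in two places, and tests can no longer check the payload itself.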

`try_parse` should be the default

We never actually want this library to exit by itself. It should always just return an error.

`bash` completion

We have zsh and fish but we need bash completions too!

Hidden flags

For full compatibility, we need to hide some flags from the help text that GNU only provides for testing purposes.

I think it's easiest to just add a hidden keyword:

#[derive(Clone, Arguments)]
enum Arg {
    #[option("---presume-input-pipe", hidden)]
    PresumeInputPipe,
}

`nushell` completions

We have zsh and fish but we need nushell completions too!

Rework the `Initial` derive macro

The Initial trait was extracted from the Options trait and its derive macro is worded too generally. Instead of

#[derive(Initial)]
struct Foo {
    #[field(default = exp1, env = "VAR")]
    field: Type,
}

it should probably just be

#[derive(Initial)]
struct Foo {
    #[initial(exp1, env="VAR")]
    field: Type
}

While we're at it, the macro should also work for enums. It would probably be just like the Default trait, but with the ability to specify the fields for the initial variant:

#[derive(Initial)]
enum Foo {
   #[initial(6)]
   Variant(usize)
}

`Value` derive macro without `value` attributes should have a better error

Currently, it says: "expected pattern", which is not great.

Exit codes

We need to support a setting for the exit code of errors reported by uutils_args. Probably something like this:

#[derive(Clone, Arguments)]
#[exit_code(125)]
enum Arg {
    ...
}

It might be nice to combine this with the help and version attributes somehow (renaming Options to Parser in the process):

#[derive(Clone, Arguments)]
#[arguments(
    help = ["--help"],
    version = ["--version"],
    exit_code = 125,
    help_file = "some_file.md"
)]
enum Arg {
    ...
}

Guide-level documentation

We need a guide for using this, because rustdoc naturally does not document the attributes that the derive macros accept.

Ideally, this would look very much like clap's tutorials and references:

Instead of rustdoc, I want these to be hosted on a mdbook site. They should also include a little cookbook with several types of common arguments.

`fish` completions should know whether an argument to a flag is required

The complete command in fish has a -r flag which specifies that an argument is required. If we have Value::Required on a flag, that should be used. This requires grouping the flags first by whether they have a required, optional, or no value.
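A minimal sketch of the rendering step, reusing the `shout` example from the README. The `Flag` and `ValueKind` types are hypothetical descriptions of a flag, and the sketch assumes the help text contains no single quotes (no escaping is done).

```rust
/// Hypothetical description of how a flag takes its value.
pub enum ValueKind {
    NoValue,
    Optional,
    Required,
}

/// Hypothetical description of a flag for completion generation.
pub struct Flag {
    pub short: Option<char>,
    pub long: Option<&'static str>,
    pub help: &'static str,
    pub value: ValueKind,
}

/// Render one `complete` line for fish. `-r` (`--require-parameter`) is
/// only emitted when the flag requires a value.
pub fn fish_line(cmd: &str, flag: &Flag) -> String {
    let mut line = format!("complete -c {}", cmd);
    if let Some(s) = flag.short {
        line.push_str(&format!(" -s {}", s));
    }
    if let Some(l) = flag.long {
        line.push_str(&format!(" -l {}", l));
    }
    if let ValueKind::Required = flag.value {
        line.push_str(" -r");
    }
    // Assumes the help text contains no single quotes.
    line.push_str(&format!(" -d '{}'", flag.help));
    line
}
```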

Helper functions should be grouped in a module

There are some functions that need to be public but somewhat hidden because they should only be used by the derive macro. These currently just live in lib.rs. This is unfortunate because they clutter the documentation. I don't think we should hide them altogether, but we could group them in a module that's clearly documented as being "private".

Arguments imply other arguments

The many-to-many relationship of arguments and settings is currently handled entirely in the struct, but we could move some of the heavy lifting into the arguments. This prevents us from forgetting to handle arguments in the struct.

My proposal is that arguments can imply other arguments, which will be added to the iterator after the initial argument. This allows us to rewrite some arguments with multiple effects as expanding into multiple smaller arguments. Take this set of arguments from cat for example:

#[derive(Clone, Arguments)]
enum Arg {
    #[arg("-A", "--show-all")]
    ShowAll,

    #[arg("-e")]
    ShowNonPrintingEnds,

    #[arg("-E")]
    ShowEnds,

    #[arg("-t")]
    ShowNonPrintingTabs,

    #[arg("-T", "--show-tabs")]
    ShowTabs,

    #[arg("-v", "--show-nonprinting")]
    ShowNonPrinting,
}

#[derive(Initial)]
struct Settings {
    show_tabs: bool,
    show_ends: bool,
    show_nonprinting: bool,
}

impl Options for Settings {
    type Arg =  Arg;
    fn apply(&mut self, arg: Arg) {
        if let Arg::ShowAll | Arg::ShowNonPrintingTabs | Arg::ShowTabs = arg {
            self.show_tabs = true;
        }
        if let Arg::ShowAll | Arg::ShowNonPrintingEnds | Arg::ShowEnds = arg {
            self.show_ends = true;
        }
        if let Arg::ShowAll | Arg::ShowNonPrintingTabs | Arg::ShowNonPrintingEnds | Arg::ShowNonPrinting = arg {
            self.show_nonprinting = true;
        } 
    }
}

This could be rewritten as:

#[derive(Clone, Arguments)]
enum Arg {
    #[option("-A", "--show-all", implies = [Arg::ShowEnds, Arg::ShowTabs, Arg::ShowNonPrinting])]
    ShowAll,

    #[option("-e", implies = [Arg::ShowEnds, Arg::ShowNonPrinting])]
    ShowNonPrintingEnds,

    #[option("-E")]
    ShowEnds,

    #[option("-t", implies = [Arg::ShowTabs, Arg::ShowNonPrinting])]
    ShowNonPrintingTabs,

    #[option("-T", "--show-tabs")]
    ShowTabs,

    #[option("-v", "--show-nonprinting")]
    ShowNonPrinting,
}

#[derive(Initial)]
struct Settings {
    show_tabs: bool,
    show_ends: bool,
    show_nonprinting: bool,
}


impl Options for Settings {
    type Arg =  Arg;
    fn apply(&mut self, arg: Arg) {
        match arg {
            Arg::ShowTabs => self.show_tabs = true,
            Arg::ShowEnds => self.show_ends = true,
            Arg::ShowNonPrinting => self.show_nonprinting = true,
            // ShowAll, -e and -t are handled through the arguments they imply.
            _ => {}
        }
    }
}

Open questions:

Should this be applied recursively? E.g. could ShowAll have implies = [Arg::ShowNonPrintingEnds, Arg::ShowTabs]. I think it shouldn't to prevent loops and to keep the implementation simpler.
Should the original argument be preserved?

Use cases:

-A, -e and -t in cat.
--zero, -o, -g, -n in ls
-s in basename
-d and -a in cp
-F in tail
-a in uname
-a in who
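A plain-Rust sketch of the expansion for the cat arguments, without any derive macro. Implications are applied one level deep (no recursion), and the original argument is kept in the stream; the issue leaves both choices open. The `implies` method and the `expand` function are hypothetical.

```rust
#[derive(Clone, Copy, Debug, PartialEq)]
pub enum Arg {
    ShowAll,
    ShowNonPrintingEnds,
    ShowEnds,
    ShowNonPrintingTabs,
    ShowTabs,
    ShowNonPrinting,
}

impl Arg {
    /// What each argument implies (one level only, no recursion).
    fn implies(self) -> &'static [Arg] {
        match self {
            Arg::ShowAll => &[Arg::ShowEnds, Arg::ShowTabs, Arg::ShowNonPrinting],
            Arg::ShowNonPrintingEnds => &[Arg::ShowEnds, Arg::ShowNonPrinting],
            Arg::ShowNonPrintingTabs => &[Arg::ShowTabs, Arg::ShowNonPrinting],
            _ => &[],
        }
    }
}

/// Yield each argument, followed directly by the arguments it implies.
pub fn expand(args: impl IntoIterator<Item = Arg>) -> impl Iterator<Item = Arg> {
    args.into_iter()
        .flat_map(|a| std::iter::once(a).chain(a.implies().iter().copied()))
}

#[derive(Default, Debug, PartialEq)]
pub struct Settings {
    show_tabs: bool,
    show_ends: bool,
    show_nonprinting: bool,
}

impl Settings {
    fn apply(&mut self, arg: Arg) {
        match arg {
            Arg::ShowTabs => self.show_tabs = true,
            Arg::ShowEnds => self.show_ends = true,
            Arg::ShowNonPrinting => self.show_nonprinting = true,
            // Composite arguments only matter through what they imply.
            _ => {}
        }
    }
}
```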

Argument Completion

This library should have support for argument completion. This consists of a couple of steps:

There should be value hints.
The configuration of the arguments should be output to the OUT_DIR. Ideally, this includes things like the possible values for enums that implement FromValue.
That configuration can be read by some build script or something like that to actually generate the completions.

Dependency Dashboard

This issue lists Renovate updates and detected dependencies. Read the Dependency Dashboard docs to learn more.

This repository currently has no open or pending branches.

Detected dependencies

cargo

Cargo.toml

strsim 0.11.1

lexopt 0.3.0

complete/Cargo.toml

roff 0.2.1

derive/Cargo.toml

proc-macro2 1.0.81

quote 1.0.36

syn 2.0.60

github-actions

.github/workflows/ci.yml

actions/checkout v4

Swatinem/rust-cache v2

actions/checkout v4

Swatinem/rust-cache v2

actions/checkout v4

Swatinem/rust-cache v2

actions/checkout v4

Swatinem/rust-cache v2

Make `Arguments` composable and reuseable

Many of the coreutils share arguments, possibly with some slight variations. Some examples:

head & tail
cksum, sum, b2sum, sha1sum, etc.
base32, base64, basenc
mv, cp

We could copy-paste the Arguments enums for each of these, but I think we can do better.

Essentially, it must be possible to compose multiple Arguments enums. Let's first establish what's currently possible. Given two enums Arg1 and Arg2, we could make a new enum Arg for which the Arg::next_arg calls Arg1::next_arg and Arg2::next_arg in order. That works somewhat but has a problem with abbreviations of long options, because Arg1::next_arg and Arg2::next_arg do not know about each other's long arguments. They also don't know about each other's positional indices.

So we have to break up the trait into multiple methods. One for each of the following operations, which can be composed:

Parse short option.
Get an iterator of long options.
Parse long option.
Get the maximum number of positional arguments.
Parse positional argument.
Get the help text per argument (so we can sort it)
(And parsing free arguments once those are implemented)

The next_arg method can then be provided based on these, possibly only implemented by ArgumentsIter.
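To show why the long-option operations need to be separate, here is a sketch with two hypothetical groups and a hand-written composed enum. Abbreviations are resolved against the union of all groups' long options, so `--re` is reported as ambiguous (`reverse` vs `recursive`) even though each group alone would accept it. The trait and function names are assumptions, and short options, positional arguments and help text are left out.

```rust
/// Hypothetical split-up interface: each group lists its long options and
/// can parse one that has already been resolved to its full name.
pub trait ArgGroup: Sized {
    fn long_options() -> &'static [&'static str];
    fn parse_long(name: &str) -> Option<Self>;
}

#[derive(Debug, PartialEq)]
pub enum Arg1 {
    Reverse,
}

#[derive(Debug, PartialEq)]
pub enum Arg2 {
    Recursive,
    Verbose,
}

impl ArgGroup for Arg1 {
    fn long_options() -> &'static [&'static str] {
        &["reverse"]
    }
    fn parse_long(name: &str) -> Option<Self> {
        match name {
            "reverse" => Some(Arg1::Reverse),
            _ => None,
        }
    }
}

impl ArgGroup for Arg2 {
    fn long_options() -> &'static [&'static str] {
        &["recursive", "verbose"]
    }
    fn parse_long(name: &str) -> Option<Self> {
        match name {
            "recursive" => Some(Arg2::Recursive),
            "verbose" => Some(Arg2::Verbose),
            _ => None,
        }
    }
}

/// The composed enum, similar to what an `#[include]` attribute might generate.
#[derive(Debug, PartialEq)]
pub enum Arg {
    A(Arg1),
    B(Arg2),
}

/// Resolve a possibly abbreviated long option across all groups.
pub fn parse_composed_long(input: &str) -> Result<Arg, String> {
    let all: Vec<&str> = Arg1::long_options()
        .iter()
        .chain(Arg2::long_options())
        .copied()
        .collect();
    let name = if all.contains(&input) {
        input
    } else {
        let candidates: Vec<&str> = all.iter().copied().filter(|o| o.starts_with(input)).collect();
        match candidates.as_slice() {
            [one] => *one,
            [] => return Err(format!("unknown option '--{}'", input)),
            _ => return Err(format!("ambiguous option '--{}'", input)),
        }
    };
    Arg1::parse_long(name)
        .map(Arg::A)
        .or_else(|| Arg2::parse_long(name).map(Arg::B))
        .ok_or_else(|| format!("unknown option '--{}'", input))
}
```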

That's all the internal plumbing that needs to change, but what does it look like in the API?

One option is to provide a macro that creates an automatic implementation:

enum Arg1 { ... }
enum Arg2 { ... }

compose_args!(Arg, [Arg1, Arg2]);

That works, but I think I prefer supporting it via the derive macro instead, which makes the structure easier to see:

#[derive(Arguments)]
enum Arg {
    #[include] A(Arg1),
    #[include] B(Arg2),

    #[arg("-a")]
    All,
}

This also provides a nice way of grouping options in --help. A variation on this design makes a distinction between an ArgumentGroup and Arguments, where the former can be included into the latter. This makes sense, because Arguments have, for instance, about text and version info, whereas ArgumentGroup doesn't need that. Additionally, we can restrict ArgumentGroup to not include positional arguments, because those get hard to reason about (and hard to implement in proc macros).

Taking this further can lead to some pretty cool stuff, by combining ArgumentGroup with Value. For example, we can do this:

#[derive(ArgumentGroup, Value)]
enum Format {
    #[arg("--long")]
    #[value("long")]
    Long,
    #[value("columns")]
    Columns,
}

#[derive(Arguments)]
enum Arg {
    #[include]
    #[arg("--format=FORMAT")]
    Format(Format),
}

We could also allow the --format argument to be defined like this:

#[derive(ArgumentGroup, Value)]
#[option("--format=FORMAT")]
enum Format { ... }

Default values and values from env variables

Currently, the only way to set default values in the Options struct is to create a custom Default implementation. This might be a bit cumbersome and does not support environment variables.

The semantics should be as follows:

If any value is mapped to a field during parsing, that value is used,
Else if an env argument is set, then a value is parsed from that environment variable,
Else if a default is passed, that expression is used as the default,
Else Default::default() is used.

We might need an env_parser option as well to allow different parsing from the variable than the command line.
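Whatever the attribute syntax ends up being, the generated code could follow this precedence order. In this sketch, `parse_env` plays the role of the env_parser mentioned above, an environment value that fails to parse is silently ignored (a real implementation might report an error instead), and both function names are hypothetical.

```rust
use std::ffi::{OsStr, OsString};

/// Resolve a setting: command line, then environment variable,
/// then explicit default, then `Default::default()`.
pub fn resolve<T, F>(from_cli: Option<T>, env_value: Option<OsString>, parse_env: F, default: Option<T>) -> T
where
    T: Default,
    F: Fn(&OsStr) -> Option<T>,
{
    if let Some(v) = from_cli {
        return v;
    }
    // An env value that fails to parse falls through to the default here.
    if let Some(v) = env_value.as_deref().and_then(|s| parse_env(s)) {
        return v;
    }
    default.unwrap_or_default()
}

/// Convenience wrapper that reads the variable from the process environment.
pub fn resolve_from_env<T, F>(from_cli: Option<T>, var: &str, parse_env: F, default: Option<T>) -> T
where
    T: Default,
    F: Fn(&OsStr) -> Option<T>,
{
    resolve(from_cli, std::env::var_os(var), parse_env, default)
}
```

Passing the environment value into `resolve` explicitly keeps the precedence logic testable without modifying the process environment.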

Option 1

#[derive(Default, Options)]
struct Settings {
    #[map(Arg::Foo => true)]
    foo: bool,
    
    #[map(Arg::Bar => false)]
    #[field(default = true, env = "SOME_ENV_VAR")]
    bar: bool,
}

Option 2

#[derive(Default, Options)]
struct Settings {
    #[map(Arg::Foo => true)]
    foo: bool,
    
    #[map(Arg::Bar => false)]
    #[default(true)]
    #[env("SOME_ENV_VAR")]
    bar: bool,
}

Option 3

#[derive(Default, Options)]
struct Settings {
    #[map(Arg::Foo => true)]
    foo: bool,
    
    #[map(
        Arg::Bar => false,
        default => true,
        Env("SOME_ENV_VAR", x) => x,
    )]
    bar: bool,
}

Option 4

#[derive(Default, Options)]
struct Settings {
    #[field(map(Arg::Foo => true))]
    foo: bool,
    
    #[field(
        map(Arg::Bar => false),
        default = true,
        env = "SOME_ENV_VAR",
    )]
    bar: bool,
}

Conflicting options

It doesn't occur often in coreutils, but sometimes options conflict. They might even conflict with themselves. This is tricky to support while also maintaining good error messages.

One design could be to define conflict groups (leaving out some boilerplate in the example):

enum Arg {
    #[arg("--foo", conflict_group="foobar")]
    Foo,
    #[arg("--bar", conflict_group="foobar")]
    Bar,
}

"foobar" is a key into some hashset of options that have been set.

An example of conflicting options are cut's --fields, --bytes and --characters.
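A minimal sketch of the bookkeeping behind conflict groups: a map from group name to the first option seen in that group. Repeating the same option is allowed here; options that conflict with themselves would need an extra per-option setting. `ConflictTracker`, the group name `list` and the message format are hypothetical.

```rust
use std::collections::HashMap;

/// Tracks, per conflict group, which option was seen first.
#[derive(Default)]
pub struct ConflictTracker {
    seen: HashMap<&'static str, &'static str>,
}

impl ConflictTracker {
    /// Record `option` as belonging to `group`. A different option from the
    /// same group is an error; repeating the same option is accepted.
    pub fn record(&mut self, group: &'static str, option: &'static str) -> Result<(), String> {
        match self.seen.get(group) {
            Some(&first) if first != option => {
                Err(format!("option '{}' conflicts with '{}'", option, first))
            }
            Some(_) => Ok(()),
            None => {
                self.seen.insert(group, option);
                Ok(())
            }
        }
    }
}
```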

Localization

This library has the potential to have localization built in. Currently, all strings are hardcoded in this library, especially in the derive part; with localized strings, they would have to live in some data structure that provides strings per language.

The "obvious" crate for localization is fluent.

To tackle the design there are several issues that we need to address:

Localized description from the markdown file.
Localized help string for arguments.
Localized error messages.

Markdown file

Recall that a command is documented in a markdown file, which is referenced like this:

#[derive(Arguments)]
#[arguments(file = "some/path/to/the/help/file.md")]
enum Arg { ... }

That works great for a single file, but we might need some more flexibility.

Instead, we need some regex-like / glob-like pattern:

#[derive(Arguments)]
#[arguments(file = "some/path/to/the/help/file-{LOCALE}.md")]
enum Arg { ... }

#[derive(Arguments)]
#[arguments(file = "some/{LOCALE}/path/to/the/help/file.md")]
enum Arg { ... }

Every file with that pattern would be included in the binary and the right text would be selected at runtime.
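The runtime selection could use a simple fallback chain. In this sketch, the embedded table, its placeholder texts and the fallback order (full `language_territory`, then the language alone, then English) are all assumptions; in practice, the table would be generated from the matching files at compile time.

```rust
/// Hypothetical table of help texts embedded at compile time, one per locale.
const HELP_TEXTS: &[(&str, &str)] = &[
    ("en", "[en] help text"),
    ("de", "[de] help text"),
    ("pt_BR", "[pt_BR] help text"),
];

/// Pick the text for a POSIX-style locale string such as `pt_BR.UTF-8`.
pub fn select_help(locale: &str) -> &'static str {
    // Strip the encoding (`.UTF-8`) and modifier (`@euro`) parts.
    let base = locale.split(|c: char| c == '.' || c == '@').next().unwrap_or("");
    let language = base.split('_').next().unwrap_or("");
    let lookup = |key: &str| HELP_TEXTS.iter().find(|(k, _)| *k == key).map(|(_, v)| *v);
    lookup(base)
        .or_else(|| lookup(language))
        .or_else(|| lookup("en"))
        .unwrap_or("")
}
```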

Options

The help text for the options could be written in fluent files. Ideally, this would be parsed at compile-time, somehow.

Errors

Errors would also need to be written in fluent. This goes for all errors in the coreutils, but also for the errors produced by this library.

Deal with lists of values

There are a few options in coreutils which accept lists of values, e.g.

cut --fields
chroot --groups

Sometimes these are additive, sometimes they override and sometimes they can only be passed once.

We need to figure out a way to deal with these cases. The overriding case is easy (just parse into a Vec and override it like any other value). The additive and only-once cases are harder.
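One way to model the three behaviours is an explicit mode per option. The sketch assumes comma-separated values and represents "not yet given" as `None`. `ListMode` and `apply_list` are hypothetical names.

```rust
/// Hypothetical strategies for options that take a list of values.
#[derive(Clone, Copy)]
pub enum ListMode {
    /// A later occurrence replaces earlier ones.
    Override,
    /// Occurrences are concatenated.
    Additive,
    /// A second occurrence is an error.
    Once,
}

/// Apply one occurrence of a comma-separated list option to `current`,
/// which is `None` until the option has been seen.
pub fn apply_list(
    mode: ListMode,
    current: &mut Option<Vec<String>>,
    value: &str,
    option: &str,
) -> Result<(), String> {
    let new: Vec<String> = value.split(',').map(str::to_string).collect();
    match mode {
        ListMode::Once if current.is_some() => {
            Err(format!("option '{}' may only be given once", option))
        }
        ListMode::Additive => {
            current.get_or_insert_with(Vec::new).extend(new);
            Ok(())
        }
        ListMode::Override | ListMode::Once => {
            *current = Some(new);
            Ok(())
        }
    }
}
```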