Building a Shell - Part 3

784 words drafted on September 13, 2023.

This is the long-awaited part 3 of a series I’m writing on the creation of oursh. Oursh aims to be POSIX compatible while layering many new concepts on top of a new shell language that should make using the shell clean, easy, and safe. In part 2 we looked at how the Rust project was shaping up, including the basis of the grammar and one of the new features I’m most excited about, which will serve as the basis for the multi-language shell. I also identified a number of issues, and have since organized them in the project’s issue tracker.

Last time I promised we’d look at the command line more closely, as well as dive into the guts of a few of the underlying libraries. In many ways this is a distraction from the more important work of implementing the job runtime, which is something I’ve been putting off. Perhaps, with a bit of momentum again, part 4 can focus solely on a possible async refactor, or simply on fixing background jobs (#6).

For now, we’ll start with how docopt is used to describe the command line arguments. We’ll answer questions around PR#64 and most notably, we’ll decide if docopt is still the best crate for our use case.

Next we’ll look at how termion is used to manipulate the terminal and look at a few features and flags that depend on these capabilities. As part of this, we’ll aim to finish PR#68, which will be a somewhat major refactor of the repl module. This should pave the way for completing the main REPL interface (#5) moving forward.

Finally, we’ll take a brief dive into nix, which is mainly used under the hood of many of our dependencies and OS integrations.

Command Line Arguments

The goal of the docopt project is to build a fully functional command line parser from the USAGE string itself. This is a noble and lofty goal, which ensures that the --help information is kept in sync with the project, because it defines the arguments themselves.

I was originally going to write up a comparison of docopt and clap, hoping it would make choosing between them easier. However, after thinking through the requirements of the command line argument parser a bit more, I’m starting to think it might actually be easier to write my own, possibly even leveraging LALRPOP yet again! That would be really cool, actually. So first, let’s look at what exactly we’re trying to parse, starting with the bash invocation description online. Here’s the synopsis:

bash [long-opt] [-ir] [-abefhkmnptuvxdBCDHP] [-o option]
    [-O shopt_option] [argument …]

bash [long-opt] [-abefhkmnptuvxdBCDHP] [-o option]
    [-O shopt_option] -c string [argument …]

bash [long-opt] -s [-abefhkmnptuvxdBCDHP] [-o option]
    [-O shopt_option] [argument …]

Let’s see… First we notice that [long-opt] is always parsed first, before anything else. That should be easy to make consistent across the three invocation forms. We can also see that the set of single-letter flags is the same in each form, always followed by [-o option] [-O shopt_option].

So the general syntax is: first parse long options (e.g. --init-file filename) and the -ir and -s flags, then parse the short set options (e.g. -x, an option I use a lot to debug/trace my shell programs), followed by the full -o/+o options for set. Next, we parse the options for shopt itself, given with -O. Finally, we parse -c string and argument ..., the latter being passed through to the running program in $@.

Here are a few examples to help make the semantics more clear:

# Run a shell script from file with an argument.
oursh /path/to/ 123
# Equivalent to running `oursh` without arguments.
oursh -s
# Run the shell with a couple arguments.
oursh -s 123 456
# Run a command string with an argument.
oursh -c 'echo $@' hello world

This shouldn’t be hard to write a grammar for…
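
Before reaching for a grammar, here is a rough hand-rolled sketch of the same idea in plain Rust, just to pin down the semantics. Everything in it is an assumption for illustration: the names Mode, Invocation and parse are hypothetical and not oursh’s actual types, long options are rejected rather than parsed, and -i and -r are simply recorded alongside the set flags.

```rust
/// How the shell was asked to run. All names here are hypothetical.
#[derive(Debug, PartialEq)]
enum Mode {
    Command(String), // oursh -c string [argument ...]
    Script(String),  // oursh file [argument ...]
    Stdin,           // oursh -s [argument ...], or no operands at all
}

#[derive(Debug, PartialEq)]
struct Invocation {
    mode: Mode,
    /// Single-letter flags: `true` for `-x`, `false` for `+x`.
    set_flags: Vec<(char, bool)>,
    /// Named options given as `-o name`, `+o name`, `-O name` or `+O name`.
    named: Vec<(char, String, bool)>,
    /// Remaining operands, destined for `$@`.
    args: Vec<String>,
}

fn parse<I: IntoIterator<Item = String>>(argv: I) -> Result<Invocation, String> {
    let mut argv = argv.into_iter().peekable();
    let (mut set_flags, mut named) = (Vec::new(), Vec::new());
    let (mut want_command, mut want_stdin) = (false, false);

    // Consume leading words that look like options; stop at the first operand.
    while let Some(arg) =
        argv.next_if(|a| a.len() > 1 && (a.starts_with('-') || a.starts_with('+')))
    {
        if arg == "--" {
            break; // Conventional end-of-options marker.
        }
        if arg.starts_with("--") {
            // Assumption: long options are out of scope for this sketch.
            return Err(format!("long option {} is not handled in this sketch", arg));
        }
        let enable = arg.starts_with('-');
        for flag in arg[1..].chars() {
            match flag {
                'c' if enable => want_command = true,
                's' if enable => want_stdin = true,
                'o' | 'O' => {
                    // These two take the following word as the option name.
                    let name = argv
                        .next()
                        .ok_or_else(|| format!("{} requires an option name", arg))?;
                    named.push((flag, name, enable));
                }
                _ => set_flags.push((flag, enable)),
            }
        }
    }

    // Decide which of the three invocation forms we are looking at.
    let mode = if want_command {
        Mode::Command(argv.next().ok_or("-c requires a command string")?)
    } else if want_stdin {
        Mode::Stdin
    } else {
        match argv.next() {
            Some(path) => Mode::Script(path),
            None => Mode::Stdin,
        }
    };

    // Whatever is left is passed through untouched.
    Ok(Invocation { mode, set_flags, named, args: argv.collect() })
}

fn main() {
    match parse(std::env::args().skip(1)) {
        Ok(inv) => println!(
            "mode: {:?}, flags: {:?}, named: {:?}, args: {:?}",
            inv.mode, inv.set_flags, inv.named, inv.args
        ),
        Err(message) => eprintln!("oursh: {}", message),
    }
}
```

With this sketch, -s 123 456 comes out as Stdin with the two arguments left over, and -c 'echo $@' hello world comes out as Command("echo $@") with hello and world left over. One thing it deliberately leaves open: bash assigns the first argument after the -c string to $0 rather than $1, so whether oursh should do the same is a decision for whoever consumes args.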

Terminal Features

You nix?

What about Windows?