Building a Shell - Part 3 784 words drafted on September 13, 2023.
This is the long awaited part 3 of a series I’m writing on the creation of
oursh
. Oursh’s goals are to be POSIX compatible, while implementing
many new concepts on top of a new shell language that will make using the shell
clean, easy, and safe. In part 2 we looked at how the Rust project was
shaping up, including the basis of the grammar and one of the new features I’m
most excited about, which will serve as the basis for the multi-language shell.
I also identified a number of issues and since then have organized things in
the project’s issue tracker.
Last time I promised we’d look at the command line more closely, as well as
dive into the guts of a few of the underlying libraries. In many ways this is a
distraction from the more important work of implementing the job runtime, which
is something I’ve been putting off. Perhaps, with a bit of momentum again,
part 4 can focus solely on a possible async
refactor, or just simply fixing
background jobs (#6).
For now, we’ll start with how docopt
is used to describe the command
line arguments. We’ll answer questions around PR#64 and most notably,
we’ll decide if docopt
is still the best crate for our use case.
Next we’ll look at how termion
is used to manipulate the terminal
and look at a few features
and flags that depend on these capabilities. As
part of this, we’ll aim to finish PR#68, which will be a somewhat major
refactor of the repl
module. This should pave the way for completing
the main REPL interface (#5) moving forward.
Finally, we’ll take a brief dive into nix
, which is mainly used under
the hood of many of our dependencies and OS integrations.
Command Line Arguments
The goal of the docopt
project is to build a fully functional
command line parser from the USAGE
string itself. This is a noble and lofty
goal, which ensures that the --help
information is kept in sync with the
project, because it defines the arguments themselves.
I was originally going to write up a comparison of docopt
and clap
in the
hopes that I would be able to choose between them more easily. However, after
thinking through the requirements of the command line argument parser a bit
more, I’m starting to think it might actually be easier to write my own, even
possibly leveraging LALRPOP yet again! That would be really cool actually. So
first, let’s look at what exactly we’re trying to parse. Starting with the
bash
invocation description online. Here’s the synopsis:
bash [long-opt] [-ir] [-abefhkmnptuvxdBCDHP] [-o option]
[-O shopt_option] [argument …]
bash [long-opt] [-abefhkmnptuvxdBCDHP] [-o option]
[-O shopt_option] -c string [argument …]
bash [long-opt] -s [-abefhkmnptuvxdBCDHP] [-o option]
[-O shopt_option] [argument …]
Let’s see… First we notice that [long-opt]
is always parsed first, before
anything else. That should be easy to make consistent across the three
invocation forms. We can also see that the set of single letter flags are the
same, followed by [-o option] [-O shopt_option]
each time.
So the general syntax is: first parse long options (e.g. --init-file
filename
) and the -ir
and -s
flags, then parse short set
options (e.g.
-x
, an option I use a lot to debug/trace my shell programs) and then the
complete -+o
options for set
. Next, we parse options for shopt
itself
with -O
. Finally, we parse -c string
and argument ...
which are passed
through to the running program in $@
.
Here are a few examples to help make the semantics more clear:
# Run a shell script from file with an argument.
oursh /path/to/script.sh 123
# Equivalent to running `oursh` without arguments.
oursh -s
# Run the shell with a couple arguments.
oursh -s 123 456
# Run a command string with an argument.
oursh -c "echo $@" hello world
This shouldn’t be hard to write a grammar for…
Terminal Features
You nix
?
What about windows?