Post-Penultimate Conditional Syntax
In the control flow of many programming languages, there exists a sort of divide between boolean if/else conditionals and pattern matching statements1 which may bind values for future reference. In choosing syntax for my own language, I wanted to see if I could unify these two. Here lie my attempts at creating an ideal conditional syntax.
Prior Art
Existing languages have often noticed this distinction and made some effort to bridge the gap and allow one construction to better cover scenarios that might ordinarily require the other. Let’s start with my two primary inspirations: OCaml and Rust.
OCaml: Putting if in match
As an ML-family language, OCaml tends to treat match as the primary conditional.2 To permit that, it provide match some of the flexibility of if by introduction of the when clause, which adds a conditional that must be met in order for a branch to be taken. This lets us write something like
let rec has_repeated_pair list =
match list with
| [] -> false
| x :: y :: rest when x = y -> true
| x :: rest -> has_repeated_pair rest
where we have a single expressive top-level conditional which lists out the possible cases. Meanwhile, if we had to use purely ordinary if and match we might be forced to instead write
let rec has_repeated_pair list =
match list with
| [] | [ _ ] -> false
| x :: y :: rest ->
if x = y then
true
else
has_repeated_pair (y :: rest)
having to increase our indentation and nest the logic a bit. This isn’t so painful for such a small function, but you can imagine that the complexity felt compounds in real logic. These are the sorts of inconveniences we’ll try to avoid today.
However, this syntax has clear limitations. First, one can do no additional binding or pattern matching after the when, which means that we might still be required to nest for something as simple as this:
let pipelined x =
match f1 x with
| Error e1 ->
...
| Ok y ->
match f2 y with
| Error e2 ->
...
| Ok z ->
...
or in any equivalent case where further computation requires new bindings.3 Second, and more dangerously, these when clauses actually make the language unsound, such that this code will compile without any warnings, but then crash on ref false:
let impure_guard (x : bool ref) =
match x with
| { contents = true } -> (* compiler believes this covers the true case *) ()
| _ when (x := true; false) -> (* mutates x *) ()
| { contents = false } -> (* compiler believes this covers the false case *) ()
;;
A key feature of match statements is their ability to check exhaustiveness, ensuring that they cover all possible cases. Doing so is complicated when there might be arbitrary mutating functions in the middle. Clearly this is not yet a perfect solution, so let’s see how Rust, as a newer language, approached it with the benefit of hindsight.
Rust: Putting match in if
Rust is somewhat more imperative and systems-focused, so it approaches from the opposite direction, treating if/else as primary, and giving them some of the power of match by way of the if let statement. This allows let to be used not just as a way to deterministically bind to expressions, but also act as a conditional for whether a pattern can be bound:
fn if_let() {
let x = deterministic();
if let Some(y) = conditional(x) {
...
}
}
This adds some complication to the language. Now, depending on the surrounding context, let may or may not be an expression of type bool. It also reads backwards relative to the dataflow, where you first compute a value and then inspect it. Nonetheless, it does neatly resolve the sort of pipelining we discussed earlier without nesting:
fn pipelined_1(x: T) {
if let Ok(y) = f1(x)
&& let Ok(z) = f2(y) {
...
} else {
...
}
}
At least, it does so long as we are alright treating errors in f1 and f2 as equivalent, which may not always hold. We’ve been able to clean up the happy path, but haven’t really represented all branches well in our conditional. To achieve that, we might try yet another Rust feature, the let ... else:
fn pipelined_2(x: T) {
let Ok(y) = f1(x) else {
// must diverge
}
let Ok(z) = f2(y) else {
// must diverge
}
...
}
These represent a conditional where the happy path can be raised to the top level because the other branches must return early or otherwise can never resume the rest of the function. They let us handle the two kinds of errors differently, but still don’t actually let us inspect e1 or e2 in our failure case. We haven’t really accessed the full match-like power to destructure the Result<T, E> type and say “f1(x) is either Ok(_) or Err(_)”, let alone handle cases with more possible branches or without a single happy path.
Rust has identified some of the most common patterns and created syntactic sugar that works well for those specialized cases. However, with so many different syntactic structures punning the same keywords and not yet covering the general case, we haven’t yet found my desired simplicity. We’ll have to venture further to the world of niche languages.
Ultimate Conditional Syntax
So I wandered the blogosphere and came upon the Ultimate Conditional Syntax (UCS), which appeared to be about the state of the art in unifying these concerns into a simple syntax. It introduces two keywords to a toy language: is and and. In the spirit of these coming from a proper paper, let me try and give them proper definitions.
expression is pattern is a boolean expression. More specifically, it is the expression that evaluates to true if and only if pattern binds to expression. This means that in contexts where we learn it is true, we can use those bindings in our environment:4
if x is Some(y) {
do_something x
}
We gain extra bindings inside the if, analogous to how we might gain some type unification when matching on a GADT. This functions in an equivalent manner to Rust’s if let ..., but I appreciate having a new keyword to distinguish it. and, meanwhile, is perhaps best understood by example.
if x is Some(y) and y is Some(z) {
...
}
can be viewed as syntactic sugar for the following:
if x is Some(y) {
if y is Some(z) {
expression3
}
}
This is, of course, how boolean and or && operates under the hood in many languages that permit short-circuiting: skipping evaluation of the right hand side if the left turns out to be false. However, it is worth emphasizing that this is truly a kind of control flow that deserves a keyword, and nothing like other infix operators that evaluate both their operands before applying some function.5 Additionally, this means that it has special interactions with is expressions.
In order for us to even begin evaluating the right side, it must be the case that the left side was true. Meaning that we can use any of its bindings in evaluating the right side. Likewise, we can use both of their bindings inside the if, which makes sense since and gives the expression that is true if and only if both of its operands are true, and here our operands use is which binds when that occurs.6
To me, this felt like a pretty strong start towards the goal of giving if/else the power of match, tightening up Rust’s let-chains. Sure, it was still missing match’s exhaustiveness check and ability to combine branches, but we could build up to those. That made it all the more disappointing to me when the paper re-introduced a match-like construct for these:
if x is {
Ok(ok) => ...
Err(err) => ...
}
Sure, they reused the keywords if and is to build it, but what I was searching for was not just keyword punning: I wanted to see if I could merge the language constructs properly. As it is here, we’ve introduced confusion about whether is should be followed by a pattern to conditionally bind to and evaluate to a boolean, or a list of cases to exhaustively enter one of, and the remainder of the paper is forced to tackle the complexity of disambiguating during compilation.
Post-Penultimate Conditional Syntax
I’d like to resume where UCS left off, which I suppose forces me to adopt a ridiculous name like Post-Penultimate Conditional Syntax.
OR Patterns
One simple thing I’d like to be able to recreate from match statements is the ability to merge branches that would create the same bindings:
match list with
| [ last ] | [ _ ; last ] | [ _ ; _ ; last ] -> f last
| _ :: _ :: _ :: _ :: _ -> failwith "more than 3 elements"
It’d be substantially more verbose if each |-separated pattern needed its own case:
if list is [ last ] {
f last
} else if list is [ _, last ] {
f last
} else if list is [ _, _, last ] {
f last
} else {
panic!("more than 3 elements")
}
To do the same in our world of boolean logic, we’d have to introduce an or operator7, which we could think of as syntactic sugar for the former. After all, a short-circuiting or would execute the second operand only if the first was false, just like an else:
if list is [ last ] or list is [ _, last ] or list is [ _, _, last ] {
f last
} else {
panic!("more than 3 elements")
}
Where and was the operator that, when true, gave the union of all bindings from its operands (since they must both be true), or would be the operator that, when true, gives the intersection of its operands’ bindings8, since at least one must be true, but we can’t assume that it’s any specific one.
Exhaustiveness
Now perhaps more intimidating, the ability of the compiler to check exhaustiveness of some conditional, ensuring that it will always enter some branch, is very important to match. In particular, it is what allows it to be an expression with a non-unit type, otherwise there’d always be a risk we didn’t enter any branch and didn’t know what value to use.
To rephrase that a little bit, we could say the thing an exhaustiveness checker needs to prove is that “conditional on not having entered any of the prior branches, we are guaranteed to enter the last branch”. And as it turns out, in curly-brace syntax there’s some unused space in the if/else construct where we could put just that information:
if x is Ok(ok) {
...
} else x is Err(err) {
}
Here, when an expression e comes after else, the exhaustiveness checker must be able to prove that, if we were to reach the else, then e must evaluate to true. This is identical to what we’d have to prove in a match, doing casework on the scrutinee(s), which are inferred from the expression behind the else rather than stated up-front. Now we can even combine to write things like
if query_result is Ok(info) or (retry_allowed and do_retry() is Ok(info)) {
...
} else query_result is Err(err) {
...
}
Putting It All Together
So we’ve now got
- A judgement we can make about expressions, in which we say they produce some bindings when they are evaluated to
true. - The
iskeyword, which serves as the basis for expressions which we can make such judgements about. - The
andandorkeywords, which compose these using union and intersection respectively. - The
ifandelsekeywords, which allow us to unpack and use those bindings and check exhaustiveness.
I’m pretty happy with unification and the way it recreates seemingly particular features in a single coherent system by grounding itself in an algebra. One could even imagine introducing a not keyword, which inverts whether expressions form bindings on true or false. Combined with analysis that a block must exit early (return, break, etc.) this would recover Rust’s let ... else guards, but feel like a natural extension rather than additional distinct syntax for newcomers to learn. Before we get too excited though, let’s think about a couple of concerns.
Concerns
Unsoundness
To start with something familiar, earlier I brought up how OCaml matches could be made unsound by using when clauses with functions that had the side effect of mutating the values being matched on. One could imagine a system like this in which exhaustivity checks are less clearly contained to lead to more common issues of this kind.
Indeed, I would not recommend this syntax to OCaml, which has historically been able to permit mutation without constraint due to its single-threaded runtime.9 Luckily, however, Rust-like languages already solve this by requiring that functions annotate whether they mutate their arguments. Any mutation then invalidates the compiler’s notion that it has already covered some cases, preventing the unsoundness.
Ergonomics
Probably the bigger thing you’ve been noticing though, is how this system can sometimes be more verbose than the kind of match statements you’re used to. It forces you to name intermediaries and then repeat their name in each is:
let result =
big_computation(
many,
long,
arguments);
if result is Ok(_) {
...
} else result is Err(_) {
...
}
whereas a standard match might have allowed you to write just
match
big_computation
many
long
arguments
with
| Ok _ ->
...
| Error _ ->
....
Indeed, for the case of a single computation which acts as the scrutinee in a conditional with many cases, we cannot be more concise than syntax specifically designed for that scenario. That is even a sufficiently common situation to perhaps warrant the retention of a match or similar statement. However, that is not all cases, and our familiarity with it as the standard may force the paradigm into cases where it doesn’t make sense. E.g., if we wrote in this new syntax
if result1 is Ok(val) or result2 is Ok(val) {
...
} else result1 is Err(e1) and result2 is Err(e2) {
...
}
to fit it into a match we might have previously written
match result1, result2 with
| Ok val, _ | _, Ok val ->
...
| Error e1, Error e2
...
This encodes the same idea, and may even be more ergonomic, but it suggests the wrong semantics. In OCaml, the creation of a tuple to use as a scrutinee should theoretically incur a heap allocation, as all values bigger than one word are boxed. Meanwhile in Rust, the creation of the tuple (assuming it is owned) should theoretically incur a move of the results into a new segment of stack memory. Neither of these costs should be necessary when we can just inspect the results directly, and the clever compilers can probably optimize both of them away, but isn’t it better for clarity if our syntax to represents the logic we expect directly?
I would even argue that the discipline of naming intermediaries can make the code easier to read, as you never need to scroll up from a case to see what was initially matched on. So I’m going to continue experimenting with unifying conditionals entirely and leaving match out.
Footnotes
-
Or
switchfor those coming from algebraically deficient languages ↩ -
Some, including me when writing OCaml, even believe in matching on
bools to eschew any use ofif/elsein the language. ↩ -
Of course in this particular monadic case we just match on
Or_error.(f1 x >>= f2), but that might not be friendlier to newcomers and doesn’t always apply. ↩ -
I’m using Rust-like syntax here in place of the ML-like syntax in the paper, but otherwise retaining the constructs. ↩
-
At least outside of lazy languages like Haskell where short-circuiting can be replicated within normal operators. ↩
-
If I were a proper theorist here’s where I’d start defining judgements for this notion of bindings conditional upon being true, but I don’t want to scare anyone away with symbols. ↩
-
I do not mean to claim this as entirely novel, and I’ve seen discussions of logical or e.g. in Rust if not in UCS. ↩
-
Conditional on unifying the types of each pair of intersected bindings.
andlikewise has complications when intersections exist between its bindings. ↩ -
This has changed in OCaml 5, but the safety story isn’t as developed as Rust, leading to offshoots like OxCaml. ↩