← blog

A Path Not Taken for OxCaml

OxCaml is Jane Street’s effort to oxidize OCaml, that is, to make it more Rust-like. In particular, the aspect of Rust focused on here is its “fearless concurrency” where the compiler can protect against data races, which is important in light of OCaml 5’s new multicore capabilities.1

The project adds a significant number of exciting new features to the language (modes, kinds, layouts, etc.), but is careful to act only as an “extension” to the language, promising on its front page that

all valid OCaml programs are also valid OxCaml programs

I’m going to argue that this promise, while noble, was ultimately misguided, forcing it to give up on what could have been a simpler language. To do that, we’ll first have to go over a little OxCaml, talk about how it’s complex, consider its use in practice, and then finally explore an alternate path. Hopefully by the end you’ll understand why, despite this promise, the Core validation library went from looking like this

type result
type 'a check = 'a -> result

val combine : result -> result -> result
val of_list : result list -> result
val name : string -> result -> result
val name_list : string -> result list -> result
val fail_fn : string -> _ check
val pass_bool : bool check
val pass_unit : unit check
val try_with : (unit -> unit) -> result
val first_failure : result -> result -> result

to instead like this (sans comments for brevity)

type result : value mod contended
type%template ('a : any) check = 'a -> result @ p [@@mode p = (portable, nonportable)]

val%template combine : result @ p -> result @ p -> result @ p [@@mode p = (portable, nonportable)]
val%template of_list : result list @ p -> result @ p [@@mode p = (portable, nonportable)]
val%template name : string -> result @ p -> result @ p [@@mode p = (portable, nonportable)]
val%template name_list : string -> result list @ p -> result @ p [@@mode p = (portable, nonportable)]
val%template fail_fn : string -> (_ check[@mode p]) @ portable [@@mode p = (portable, nonportable)]
val%template pass_bool : (bool check[@mode p]) [@@mode p = (portable, nonportable)]
val%template pass_unit : (unit check[@mode p]) [@@mode p = (portable, nonportable)]
val%template try_with : (unit -> unit) @ p -> result @ p [@@mode p = (portable, nonportable)]
val%template first_failure : result @ p -> result @ p -> result @ p [@@mode p = (portable, nonportable)]

Modes

So what’s that word “mode” we’re seeing above? OxCaml’s mode system is sort of like its type system, in that every value x : t @ m has some statically known type t and mode m, where @ is the syntax for mode annotation.

Modes are deep properties (so m applies recursively to the contents of x) and are formed as a lattice of different mode axes which are generally orthogonal to types and each other (although some values of some types can disregard or “cross” certain modes). To make this concrete, let’s look at some examples of how they might map onto Rust’s type system.

Modes & References

At Rust’s core is a borrow checker, which enforces each value x: T has a single owner, but allows it to be temporarily borrowed as &x: &T or &mut x: &mut T, with these borrows reflected in the type system. How might we express something similar in OxCaml?

Well, rather than being forced to change the type of our value to reflect these, we would put this additional information about how it can be used into its modes:

  • &x: &T might loosely correspond to x : t @ local read aliased
  • x: T might loosely correspond to x : t @ global read_write unique

Here, we see our first three mode axes:

  • Locality: local indicates a value which cannot escape its scope, whereas global values can do so (analogous to Rust’s lifetimes).
  • Visibility: read indicates a value which can read but not write to its mutable state, whereas read_write values can do either.2
  • Uniqueness: aliased represents a value which may have other references to it, whereas unique values act like Rust’s affine type system.

So far I’m relatively fond of these modes: they create clarity about what they each control rather than intermingling in a single concept of ownership, and don’t need to corrupt the type of x to represent this in the way that Rust’s version is also inextricably linked with whether we have a reference with indirection.3 However, you can probably see already the danger of a sort of combinatorial explosion in possible states as the number of axes that can be set independently increases.4 If each value can have a type, a locality, a visibility, and a uniqueness, that’s a lot more to hold in your head than usual.

Modes & Traits

OxCaml also has some modes that don’t map as directly onto Rust’s concepts of ownership, but instead look more like what Rust might put in a trait. In particular, I’d like to focus here on the portability mode:

  • Portability: portable indicates a value which can be shared with another thread5, whereas nonportable values cannot do so.

Portability can be thought of as analogous to Rust’s Send and Sync traits, which determine a type is safe to be shared across threads.6

Note here a crucial distinction. Our Rust concept was a property of types: if T: Send + Sync, then any value of type T could be shared between threads. Meanwhile, in OxCaml this is a property of values. It is possible to have values x1 : t @ portable and x2 : t @ nonportable with different portabilities despite having the same type.

This may be initially unintuitive to a Rust programmer, who is used to thinking about types as determining thread-safety. Canonically, atomically reference counted Arc<_> values could be shared, while non-atomic Rc<_> versions could not. However, the important thing to remember is that, in OCaml, functions are all values of the same arrow type:

val func : t1 -> t2

and the space of possible functions is very large, including both those which can clearly be shared across threads, as well as closures which capture mutable state, and therefore cannot be called from multiple threads without risking data races.

This means the arrow type cannot have one deterministic portability applied to all of its values.7 Furthermore, since any abstract type could potentially contain a function, this means just about every OCaml value must maintain some portability state (apart from those which are explicitly annotated as containing no functions and therefore “crossing” portability).

OxCaml divides its 9 currently documented modes into 4 “past” modes like the ones we saw earlier, and 5 “future” modes like portability, which are relevant only for types which contain functions for similar reasons to this.

Default Modes

Now, we’d said earlier that OxCaml promised backwards compatibility with any OCaml code. One immediate observation is that old OCaml code did not have mode annotations everywhere, and so OxCaml must be able to handle code which does not specify modes. To do so, it uses “default modes”.

Each mode axis has a default position, which is assumed when no mode is explicitly annotated. These defaults must be chosen carefully so that they do not break backwards compatibility:

  • Locality: In OCaml, we never had to worry about whether a value was allowed to escape its scope. Therefore, our default mode must be global.
  • Visibility: In OCaml, we never had to worry about whether we had permission to mutate a value of a type with mutability. Therefore, our default mode must be read_write.
  • Uniqueness: In OCaml, any value could be freely aliased, or have been so in the past. Therefore, our default mode must be aliased.

In general, so far we’ve set all of our modes to be maximally permissive, allowing you to do as you please with your values.8 Now let’s think about how we should extend this to the portability mode.

So, legacy OCaml code was always single-core9, and therefore there’s no legacy concept of whether values could be shared between them which we need to maintain backwards compatibility with. It would be nice if we could therefore continue to use the permissive modes, and by default allow values to be shared between threads.

However, this doesn’t mean we’re free to just set any default that we want, as we’d made another promise: that OxCaml would statically prevent data races. And as we discussed earlier, there exist some values which could cause data races if shared across threads. Therefore we must set

  • Portability: In OCaml, there exist functions that are not thread-safe. To be free from data races, we must conservatively assume all values are nonportable.

Unfortunately, this time we must choose the default mode that is more restrictive. Because the default mode applies to all legacy OCaml code, this means that all OCaml code, when used in OxCaml, must be presumed not to be thread-safe.10 That’s going to cause us some pain when writing code.

Portabilization

So what does it mean for us to say that all legacy OCaml code is going to be nonportable? Well definitionally, it means that those values cannot cross threads, but let’s think about the further implications.

One thing OxCaml likes to emphasize along with its backwards compatibility story is its “pay-as-you-go” complexity. That if you don’t need some particular new advanced feature, you can simply avoid it and continue to write plain old OCaml. Can that work out in practice?

Portability Interoperability

If we took that literally, we might want to write most of our code in OCaml, which OxCaml would interpret as nonportable. Then we would only adopt OxCaml’s concurrency features and their corresponding complications when we needed them for scaling performance-critical apps.

In this world, we might further hope that when we wrote a new app or moved some existing app to portable concurrent OxCaml, it could still take advantage of the larger OCaml ecosystem and preexisting libraries. Unfortunately, that’s going to prove difficult.

You see non-portability is infectious. We’d said earlier that modes are deep: applying recursively to all of a value’s contents. So a portable value cannot contain anything nonportable, and any function which calls a nonportable function must itself be nonportable.

Given that all values in the default mode are nonportable, asking you to write portable concurrent OxCaml that depends on nonportable legacy OCaml is sort of like asking you to write synchronous code that depends on a library in which every function is async. It just doesn’t work!

However, this does not mean that all hope of true backwards compatibility is lost. There remains a ray of hope in the opposite direction, if you want to write OCaml code depending on OxCaml. A nonportable function can call portable functions. This opens up a path for how our OCaml and OxCaml might coexist.11

Migration

If we want our portable OxCaml and nonportable OCaml to share dependencies, we’re going to need to start from the very bottom of the dependency stack, where no existing nonportable code is depended on. Converting that library to portable OxCaml is safe, because existing OCaml code can still use it, and now portable code can too.

Once we’ve done that, we’ve now opened up another few libraries which we might be able to migrate, because they now only depend on portable libraries, and so they themselves can become portable. Gradually, we can expand the scope of our migration out to the entire ecosystem in a process we’ll call portabilization.

As we’re performing this migration, we can effectively find ourselves in one of two states12:

  • Either our code is already thread-safe, frequently via use of immutable persistent data structures, and just needs to be annotated as such in its signature.
  • Or our code had the possibility of data races in the presence of multiple threads, and needs to be fundamentally rewritten or restructured to avoid this.

The former of course is simple, but the latter can create real work. Especially when porting the sort of performance-critical code we’d most like to move to OxCaml. That’s the same code most likely to use heavy mutation.13

However, we persevere because we can see the light at the end of the tunnel. If we can take all of our libraries and move them one by one from OCaml to OxCaml, then we will finally be able to inter-operate between the languages: choosing to write code in either portable or nonportable fashion and share the same dependencies. For an organization like Jane Street with a large quantity of existing OCaml code that cannot all be migrated at once, this is essential.14

Bifurcation

Still, does it feel like I pulled a bit of a sleight of hand on you there? I started this post talking about how the language was fully backwards compatible, and could compile all old code with no changes. Yet now I’m describing a world in which we must actively migrate all libraries from OCaml to OxCaml?

And even once we’ve done so, and all our libraries are in portable OxCaml, granting us the freedom to choose which way we want to write apps depending on them: which way do you think we’ll choose? The one where nobody else can depend on us and we might have to migrate in the future? No, we’ll probably stick with this new portable OxCaml.

In that case, incorporating OxCaml into an OCaml ecosystem risks feeling like migrating languages completely. You have to get used to new unfamiliar syntax for modes and layouts. You even have to substantially rewrite the logic of some of your code to fit the paradigms of the new language in which closures over mutable data are restrictive.

It’s not quite as bad as the Python 3 migration15, but the backwards compatibility promise wasn’t quite what we’re used to for regular language evolution where we can mostly avoid touching our old code. It’s like we’re migrating languages, but with a special property: that OCaml code can depend on OxCaml code, and the two can share the same toolchain.

To some extent that was unavoidable. There was genuinely a lot of OCaml code which was safe only because it was known that it could never cross threads. That couldn’t be magically transferred to a multicore language without modification. Still, if we acknowledge that kind of migration is all we can do, then perhaps we can widen the design space we consider.

Another Path

So what if we took that literally? If we decided that we’d never quite achieve true backwards compatibility between a language which required thread-safety and one that only ever used a single thread. If we concluded that the best we could hope for was the existence of a migration path from OCaml due to the ability to gradually swap out components and still use OCaml code?

This sort of change might feel like Rust’s concept of editions: where a project decides which edition of the language it will be written in, and the editions aren’t necessarily backwards compatible with each other. However, they still keep compatible interfaces, and even provide automatic migration tooling to convert from one to the other.16

How would we design our thread-safe successor edition? Would we choose to write OxCaml? I would argue not. As we saw in the setting of default modes, the constraint that this language be able to compile plain OCaml really placed a lot of restrictions on us. And those restrictions led to complexity: in the combinatorial explosion of modes as we’ve focused on here, but also in the ways we can use unboxed types, layouts, and other OxCaml features.

The space of possible languages is huge, and I won’t claim to come here with the perfect version. Still, let me talk through at least some of the ways we might improve upon the ailments we’ve discussed for a hypothetical better OxCaml.

Portability

In a language designed for multithreading, portable should be the default mode, and nonportable values should feel like an exception. This is especially true in a functional language where immutability is the default, but holds even outside that. Rust requires static mut to be unsafe, and this sort of thing should feel similarly discouraged in OCaml rather than appear as the default.

If you look through OxCaml branches of Jane Street’s core libraries, this is already effectively the case, with most files starting with @@ portable to indicate that the entire module’s contents should be taken as portable. Still, I don’t want to have to scroll up to the top to learn how to interpret a declaration. If this is what’s sensible, I’d argue it should apply in general, and nonportable values should have to explicitly inform us of their presence.

A migration tool here might mark every value as nonportable, or only those the compiler could not automatically infer to be portable. To legacy OCaml edition dependencies where modes are erased, this would of course have no impact whatsoever, as there would still be no concept of crossing threads.

Functions

Still, it is possible to imagine better: a world not just with a sensible default, but where we could get rid of the portability mode altogether. To do that, we would have to represent the difference between a thread-safe and non-thread-safe function in the type system, sort of like Rust’s Fn, FnMut, and FnOnce.

A function which is intrinsically nonportable is one which mutates values held in its closure, which could cause a data race. Another way to say that, is it is a function which, in order to be called, should have read_write17 access to the contents of its closure on the mutability mode. Ordinarily, our modes are deep and apply recursively to a value’s contents. A function’s closure is the one exception where we don’t track or enforce this.

You could imagine expressing that in the type of our functions. Where rather than having type (-in, +out) fn, OxCaml might have type (-in, +out, +mode) fn, where mode is the mode in which you must hold that function in order to be allowed to call it.18 Now data races on the contents of closures could be handled by the same mechanisms that stop data races elsewhere. Migrations could assume conservative requirements of their functions.

Suddenly, we could basically cut out our future modes. While this risks its own form of combinatorial explosion, it’s one much more limited in scope. These parameters apply only to functions, not to all values like our modes, and in most cases you probably don’t need to be generic over all of them, and might have some expectation about what manner of function should be allowed in a given use case.

Beyond

One could imagine even crazier proposals to eliminate further modes or simplify OxCaml’s other features. It could even adopt linear or affine types as the default rather than an as after-thought mode, since linear logic really does simplify many of our problems. I won’t try to iterate all of these possibilities.

It can sound here like perhaps I am saying that OCaml should cease to exist, and that if it is going to adopt some Rust-like features it should just go all the way. I’d like to make clear that I do not believe that. I find OCaml to be a particularly elegant language, and it is for exactly this reason that I am both eager to increase its capabilities and hesitant to add complexity.

Perhaps Jane Street could not have feasibly offered a proper edition without backwards compatibility when they don’t control the language and can’t always pull upstream with them. Still, if some of these changes make it back into upstream, that would offer an opportunity to consider such choices and add this power in the best possible way.

I believe OCaml should learn as much as it can from the experiences of other languages for how to offer safe concurrency. It is understandable that doing so may require new syntax, but when that is already the case, I don’t think making the syntax only nominally an exact superset of what exists is worth making it more complex than it needs to be.

Footnotes

  1. OxCaml contains other Rust-like features, such as its ability to handle types with multiple layouts beyond OCaml’s standard one-word option, but we won’t focus on those for this post.

  2. There also exists an immutable mode for values which can only access the immutable parts of their state, which has no real parallel in Rust where all fields are potentially mutable.

  3. Rust has had repeated proposals for &own references that can decouple responsibility for cleaning up memory from whether or not you have a pointer, among similar messiness.

  4. That explosion is even actualized by ppx_template we saw in our example, which really does expand out to all of the different mode combinations.

  5. Here the proper term would really be “domain”, which is OCaml’s version of a green-thread, but I’ll continue to use the term “thread” interchangeably in this piece, as does much OxCaml documentation.

  6. Send and Sync may feel distinct, but it is able to capture both simultaneously due to yet another mode tracking “contention” which roughly captures whether a value is exclusive to its thread.

  7. Whereas in Rust, each closure has its own anonymous type, so that not all closures need to share the same Send and Sync state.

  8. More formally, the mode axes include a notion of submoding, and we have always chosen the least mode which can freely move to any of the other modes.

  9. Here I’m considering “legacy OCaml” to be OCaml 4, with OxCaml as a potential upgrade from that. OCaml 5 does introduce multicore without these safety features, for which OxCaml does not make the same backwards compatibility promise.

  10. This is made even worse by OCaml’s use of explicit .mli interfaces separate from .ml implementations. This means that even when the compiler can infer some values are portable, that can’t be exposed without annotation.

  11. I realize there is an implicit assumption here that OxCaml code is necessarily portable. While not strictly required, I argue that if you were going to forgo multicore capabilities, you might as well write ordinary OCaml instead.

  12. Alright 3 states: it’s possible our values are portable but the compiler can’t prove it and so we’re sprinkling Basement.Stdlib_shim.Obj.magic_portable everywhere.

  13. It was in fact, a common practice in certain circles to pre-allocate a single global value and then mutate that to hold current data in non-recursive functions. This avoided allocations that might trigger a garbage collection cycle.

  14. Even their much smaller-scoped migrations need to be careful not to disrupt existing systems, e.g. https://signalsandthreads.com/swapping-the-engine-out-of-a-moving-race-car/.

  15. In which Guido Van Rossum famously made the opposite promise

    There is no requirement that Python 2.6 code will run unmodified on Python 3.0. Not even a subset. (Of course there will be a tiny subset, but it will be missing major functionality.)

  16. Admittedly Rust’s editions offer somewhat stronger guarantees in that they allow dependencies in either direction. This causes its own pain for things like Pin types.

  17. Strictly speaking it also needs to be uncontended on the contention mode, but the idea still applies.

  18. Although currying makes this somewhat more unwieldy, and you wouldn’t want to have to annotate every arrow of a t1 -> t2 -> ... -> t_n. So you’d probably have to keep the existing syntax to denote the case which does not make any requirements on the mode of its closure, which would be by far the most common case, and certainly apply to early curried arguments.

Comments

Loading comments…