code-for-a-living August 24, 2020

Motoko, a programming language for building directly on the internet

At Dfinity, we’re building the Internet Computer, a decentralized cloud computing platform that we conceive as a seamless software universe in which developers can deploy applications and services directly on the Internet. To realize this vision, we decided on WebAssembly as the lingua franca of the platform’s execution environment, so that developers can program it in any language that compiles to WebAssembly.

To offer a seamless developer experience, we also felt it was important to create a specialized programming language, called Motoko, that is designed to directly support the programming model of the Internet Computer, making it easier to efficiently build applications and take advantage of some of the more unusual features of this platform.

We’re excited to share with you a little about what makes this project so special. 

WebAssembly

First, we have to talk briefly about WebAssembly—a.k.a. Wasm (yes, correctly spelled without all caps). As you may be aware, Wasm is a newish low-level code format that aims to be portable, safe, and efficient. Its initial use case has been the web, but the name actually is a misnomer: when we designed Wasm in the W3C Working Group, we carefully did it as an open standard and a universal platform. That is, it is not aimed at any specific programming language, paradigm, computing environment, or platform, and we made sure that it is not at all tied to the web. So it is absolutely no accident that Wasm is seeing adoption in many other environments, such as cloud computing, edge computing, mobile, embedded systems, IoT, and blockchains.

There were many, many design considerations that went into Wasm, some obvious and some rather subtle. Too many to go into here. A fairly comprehensive discussion of Wasm’s technical goals, design choices, formal semantics, and implementation techniques can be found in a scientific article that we published in Communications of the ACM (an older and more technical version of this article is freely accessible here).

Wasm’s main difference compared to other virtual machines is that it is not optimized for any specific programming language but merely abstracts the underlying hardware, with a byte code directly corresponding to the instructions and memory model of modern CPUs. On top of that, Wasm supports sandboxing through strong modularity and a rigid mathematical specification that ensures that execution is safe, free of undefined behaviour, and (almost) entirely deterministic. Moreover, these properties actually have a machine-verified mathematical proof!

Altogether, these properties were intended to make Wasm an attractive choice for a wide range of environments and use cases that have high expectations for portability, safety, generality, and performance—such as Dfinity’s Internet Computer.

Creating the Internet Computer

The Internet Computer is a decentralized cloud computing platform that will host secure software and a new breed of open internet services. It uses a strong cryptographic consensus protocol to safely replicate computations over a peer-to-peer network of (potentially untrusted) compute nodes, possibly overlaid with many virtual subnetworks (sometimes called shards). Wasm’s advantageous properties made it an obvious choice for representing programs running on this platform. We also liked the idea of not limiting developers to just one dedicated platform language, but making it potentially open to “all of ’em.”

That’s the theory, anyway. In practice, porting an existing programming language to Wasm is not entirely trivial. Obviously, it requires implementing a new compiler backend. That’s fun, but the effort doesn’t end there: it also requires porting the language’s runtime system and library primitives. And there are still a few features, especially ones relevant to more high-level languages, that cannot currently be implemented in Wasm easily—for example: threads, coroutines, exceptions, and tail calls. While various proposals to enrich Wasm with respective functionality are on the horizon, they have not yet been finalized for standardization.

Although there are many experimental language implementations targeting Wasm already, most are not yet ready for prime time. The ones that are include low-level systems languages like C/C++ and Rust. These are certainly great for their use cases, but they are less-than-ideal tools for developing high-level applications for the Internet Computer, where accessibility, productivity, and high assurance tend to be more desirable than manual meddling with memory management.

On some platforms, including the Internet Computer, there are additional hurdles that need to be overcome to run Wasm, and they have to do with the limitations of the computing environments they provide. For example, Dfinity’s Internet Computer has very little similarity to a conventional operating system: there is no functionality like files, I/O, or other capabilities often taken for granted in language implementations and liberally used in runtimes or libraries. That means that porting an existing language is more than just a question of tweaking code: you may need to find new means to replace uses of missing platform functionality, remove them, or make different design choices altogether. Efforts like WASI try to address this problem to some extent, but are still in their infancy.

Unavoidably, these factors make a language port to Dfinity’s Internet Computer substantial work, even when adopting a language for which a generic Wasm port already exists.

At the same time, a language for the Internet Computer needs to provide access to the platform’s main concepts: a distributed programming model with asynchronous message passing, notions of resources like cycles (a.k.a. gas), and a few other idiosyncrasies. Sure, they could all be made available as libraries, but a language that natively includes appropriate constructs can deliver a much more seamless programming experience.

So if we have to do work anyway to get off the ground, why not apply ourselves to creating something that can deliver an optimal user experience and convey our vision for how to program the Internet Computer?

Motoko

That is why—despite all the risks of creating yet another language—we decided to create Motoko. We wanted a language that is safe, easy to use, and seamlessly exposes the concepts of the platform, as well as one that looks sufficiently friendly and accessible to most programmers. Currently, that latter goal makes it practically inevitable that it’s firmly in the semicolons-and-curly-braces camp of languages. And no suitable language existed in this camp.

But Motoko’s rather conventional skin is only superficial: its interior is that of a modern language. For example, every construct is an expression, it has closures, it has variant types and statically checked pattern matching, it has garbage collection, and of course it has a flexible type system that is actually sound, i.e., it really guarantees the absence of certain errors like crashes, undefined behaviour, misinterpreting data, or simply missing a case in a switch. No holes.

At the same time, we intentionally tried not to be fancy or reinvent the wheel, but rather built on a wealth of history, both practical and theoretical, and acknowledged the lessons that have been learned over decades in this field. Besides putting together a coherent mix of well-understood features, Motoko’s design incorporates many small decisions to minimize foot guns and err on the side of safety, e.g., numbers cannot overflow by default, locals are immutable by default, concurrent execution is atomic by default, null cannot occur by default, fields are private by default, and so on. Oh, and there is no inheritance, only subtyping.

Implementing these parts of Motoko and compiling them to Wasm is conventional compiler craft. The Motoko compiler, written in OCaml, uses a typed intermediate representation, a few transformation passes, and spits out Wasm byte code. The generated Wasm module includes a small runtime system, written in C and Rust, that mainly implements a simple garbage collector using the Wasm memory as its heap. That wasn’t hard, but surely there is much potential for improvement here.

Actors

The central feature of Motoko, however, is its direct support for actors, in both syntax and type system. The actor model is a well-known concept that is 40+ years old, but sadly, it has barely made it into mainstream languages. An actor is like an object (and in Motoko, even looks like one), in that it encapsulates private state along with a set of methods to process messages that can be sent to it. But all message sends are asynchronous. Consequently, unlike conventional methods in OO, actor methods do not have results. Moreover, all messages are received sequentially by an actor—that is, it has an implicit message queue and methods execute atomically, even when messages are sent concurrently.
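To make the mailbox idea concrete, here is a minimal sketch in Python (chosen purely for illustration; it is not Motoko, and the class and method names are hypothetical). It assumes a single-threaded event loop and models an actor as private state plus a queue that is drained one message at a time.

```python
import asyncio


class CounterActor:
    """Hypothetical actor: private state plus a mailbox drained one message at a time."""

    def __init__(self) -> None:
        self._count = 0                      # private state, never shared directly
        self._mailbox: asyncio.Queue = asyncio.Queue()

    def send(self, method: str, *args) -> None:
        """Asynchronous send: enqueue the message and return without a result."""
        self._mailbox.put_nowait((method, args))

    async def run(self) -> None:
        """Process messages sequentially; each handler finishes before the next starts."""
        while True:
            method, args = await self._mailbox.get()
            if method == "stop":
                break
            getattr(self, "_on_" + method)(*args)

    def _on_inc(self, amount: int) -> None:
        self._count += amount


async def demo() -> int:
    actor = CounterActor()
    worker = asyncio.create_task(actor.run())
    for _ in range(100):
        actor.send("inc", 1)                 # fire-and-forget, no return value
    actor.send("stop")
    await worker
    return actor._count                      # peeking at private state only for the demo


if __name__ == "__main__":
    print(asyncio.run(demo()))
```

Because `send` only enqueues, callers never block and never see a result, and because `run` handles one message at a time, the counter needs no lock.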

Actors are a great model for concurrent programming because they automatically prevent race conditions (thanks to atomicity and encapsulated state) and deadlocks (because execution never blocks), and hence rule out many concurrency bugs. All that without requiring programmers to ever define a lock. Actors are also a great model for distributed programming, because asynchrony naturally deals with the latency involved with sending a message to a potentially remote receiver. And finally, actors are a great fit for Dfinity’s Internet Computer, where applications are deployed in the form of so-called canisters—essentially, actors represented as Wasm modules that can communicate across subnetworks. It turns out that Wasm’s module concept is a nice fit for this because we can directly interpret module exports as actor methods. So a Motoko actor compiles to a Wasm module, where the methods become exported Wasm functions with special parameter conventions defined by the platform.

In short, an application in Motoko is an actor (or several), which in turn is a big asynchronous object compiled into a Wasm module. With Wasm’s notion of memory, such an actor can immediately manage up to 4 GiB of internal state, although this can be enlarged further by linking multiple Wasm modules that each have their own memory. We are curious to see how quickly the first users will run into this memory limit.
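The 4 GiB figure follows from how a Wasm memory with 32-bit addresses is sized: it grows in 64 KiB pages, up to 65,536 of them. A quick check of that arithmetic, written in Python purely for illustration (the constant names are ours):

```python
WASM_PAGE_SIZE = 64 * 1024        # bytes per Wasm page (64 KiB)
MAX_PAGES_32BIT = 2 ** 16         # page limit for a memory with 32-bit addresses

max_bytes = WASM_PAGE_SIZE * MAX_PAGES_32BIT
assert max_bytes == 2 ** 32       # every byte is reachable with a 32-bit index
print(max_bytes // 2 ** 30, "GiB")
```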

Futures

To make asynchronous programming more convenient and allow expressing it in sequential “direct style,” Motoko adopts another 40+-year-old idea from the annals of programming language research, though one that fortunately became a bit more popular recently: futures (also called promises in some communities). In Motoko, they materialize in the form of “async values,” values of type `async T` that are produced by expressions prefixed with the `async` keyword. In particular, a function body can be an async expression, thereby naturally replacing the more monolithic concept of an “async function” that exists in some other languages.

With that, actor methods are allowed to have results after all—as long as those are futures. Futures can be awaited to get their value, but only inside another async expression, akin to async/await monads as known from other modern languages.
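The same shape can be sketched with Python's `asyncio` futures. This is an analogy rather than Motoko code: the `Account` class and its `deposit` method are hypothetical, and the event loop stands in for message delivery.

```python
import asyncio


class Account:
    """Hypothetical actor whose methods hand back futures instead of plain results."""

    def __init__(self, balance: int) -> None:
        self._balance = balance

    def deposit(self, amount: int) -> "asyncio.Future[int]":
        """Return immediately with a future; the reply is delivered later."""
        loop = asyncio.get_running_loop()
        reply = loop.create_future()

        def handle() -> None:
            self._balance += amount
            reply.set_result(self._balance)   # plays the role of the reply message

        loop.call_soon(handle)                # the message is processed on a later turn
        return reply


async def client() -> int:
    account = Account(10)
    pending = account.deposit(5)              # a future, not yet a number
    assert not pending.done()
    return await pending                      # await is only legal inside async code


if __name__ == "__main__":
    print(asyncio.run(client()))
```

The caller gets a value to hold on to right away, but can only turn it into a number by awaiting it from inside other asynchronous code.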

The Motoko compiler implements this via a traditional CPS (continuation passing style) transformation, turning each await point into a separate Wasm function (plus some closure environment) representing the continuation of the await. In fact, it’s double-barrelled CPS, because every message can also have a failure reply with a respective failure continuation. By convention, a method with an async result is one that sends a reply message carrying the result values as arguments. This message is received by the created continuation function, which can then resume the execution it has captured. Waiting for a reply doesn’t block an actor—it can freely receive other messages in the meantime.
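To give a feel for the shape of code such a transformation produces, here is a hand-written sketch in Python. It is not output of the Motoko compiler, and every name in it is hypothetical: a `deque` stands in for the message queue, and each request takes both a success and a failure continuation, which is what makes the style double-barrelled.

```python
from collections import deque

# Hypothetical stand-in for the platform's message queue: pending reply callbacks.
pending = deque()


def send_get_price(item, on_reply, on_reject):
    """Send a request; exactly one of the two continuations runs later."""
    prices = {"apple": 3, "pear": 4}
    if item in prices:
        pending.append(lambda: on_reply(prices[item]))
    else:
        pending.append(lambda: on_reject("unknown item: " + item))


# Direct style (what the programmer writes, in pseudo-notation):
#     price = await get_price(item)
#     return price * quantity
#
# After the transformation, everything following the await becomes a function.
def total_cost(item, quantity, on_reply, on_reject):
    def after_await(price):                   # success continuation, closes over quantity
        on_reply(price * quantity)

    send_get_price(item, after_await, on_reject)   # failure continuation passed through
    # Returning here corresponds to the actor going back to its message loop.


def run_until_idle():
    """Deliver queued replies one at a time, like an actor's message loop."""
    while pending:
        pending.popleft()()


results = []
total_cost("apple", 5, results.append, results.append)
total_cost("kiwi", 1, results.append, results.append)
run_until_idle()
print(results)
```

Note that `total_cost` returns before any reply has arrived; the rest of its work lives in `after_await`, which runs only when the reply is delivered.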

Persistence

Another important consideration for Motoko was allowing developers to utilize blockchain technology without having to learn an entirely new type of computing. So we took out all of the special knowledge that you might need for the current breed of blockchain programming languages. For example, there is no observable notion of block or block height, no explicit constructs for updating state on the blockchain, nor is there any other API for writing data to persistent storage, like files or databases (although that could be emulated as a library). Instead, the Internet Computer implements orthogonal persistence, yet another old idea, in which a program has the illusion of running “forever” with its memory staying alive (at least until it is explicitly taken down). In Motoko, this means that developers do not have to worry about explicitly saving their data or bother with files or an external database: whatever values or data structures are stored in program variables will still be there when the next message arrives, even if that is months later.

The platform takes care of transparently saving and restoring the private state of a canister between method invocations. That was relatively easy to retrofit onto a Wasm engine, because the state of a Wasm module is clearly isolated in a module’s memory, globals, and tables. For the most part, it is sufficient to watch Wasm memories with the use of virtual memory techniques exposed by operating systems. This way, the platform knows when pages in such a memory have been modified and can take whatever measures are necessary to persist the dirty pages, as well as hashing them for the distributed consensus protocol.
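The bookkeeping behind this can be illustrated with a small Python sketch. It is a simplification under stated assumptions: a real engine would rely on operating-system page protection rather than an explicit `write` method, and the 4 KiB page size, the SHA-256 hash, and the `TrackedMemory` class are all choices made for this example only.

```python
import hashlib

PAGE_SIZE = 4096  # assumed page size for this sketch


class TrackedMemory:
    """Hypothetical linear memory that records which pages were written."""

    def __init__(self, num_pages: int) -> None:
        self.data = bytearray(num_pages * PAGE_SIZE)
        self.dirty = set()

    def write(self, offset: int, payload: bytes) -> None:
        if not payload:
            return
        if offset < 0 or offset + len(payload) > len(self.data):
            raise IndexError("write outside memory")
        self.data[offset:offset + len(payload)] = payload
        first_page = offset // PAGE_SIZE
        last_page = (offset + len(payload) - 1) // PAGE_SIZE
        self.dirty.update(range(first_page, last_page + 1))

    def checkpoint(self) -> dict:
        """Hash every dirty page, then clear the dirty set."""
        hashes = {}
        for page in sorted(self.dirty):
            start = page * PAGE_SIZE
            hashes[page] = hashlib.sha256(self.data[start:start + PAGE_SIZE]).hexdigest()
        self.dirty.clear()
        return hashes


memory = TrackedMemory(num_pages=4)
memory.write(4090, b"0123456789")          # straddles pages 0 and 1
first_checkpoint = memory.checkpoint()
second_checkpoint = memory.checkpoint()    # nothing written since, so nothing to persist
print(sorted(first_checkpoint), second_checkpoint)
```

Only the pages touched since the last checkpoint need to be saved and hashed, which keeps the cost proportional to what a message actually changed.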

Beyond Motoko: Interface definitions

Because the Internet Computer runs Wasm, Motoko is just one option for creating an application—and intentionally so. We are looking forward to making other language choices available. Even then, because each language will uniformly compile to canisters represented in Wasm, these canisters can freely communicate with each other through message sends regardless of their source language.

To make such interoperability well-defined, we have also introduced a generic interface definition language (IDL) named Candid that is independent from Motoko. It describes the set of messages understood by a canister and what type of data is sent along. Data is described in Candid by a combination of canonical data types (numbers, text, arrays, records, variants, functions, references to other canisters) that are independent from the Motoko type system or that of any other programming language.

Phew, yet another type system? Well, programmers will probably be pleased that the Motoko compiler can automatically consume and produce such interface descriptions for actor exports and imports and map them from and to corresponding Motoko types. It also automatically generates the right Wasm code to serialize and deserialize the argument data for each message, transparently interconverting Motoko’s internal representation with the binary format that Candid specifies.
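As a small taste of what such serialization code has to do, the sketch below encodes two kinds of values in Python. It rests on our understanding that Candid's binary format writes natural numbers in unsigned LEB128 (the variable-length scheme Wasm itself uses for integers) and text as a LEB128 length followed by UTF-8 bytes. It covers only these two value encodings, not the header or type table of a real message, and the function names are hypothetical.

```python
def encode_nat(value: int) -> bytes:
    """Unsigned LEB128: 7 bits per byte, high bit set on all but the last byte."""
    if value < 0:
        raise ValueError("natural numbers only")
    out = bytearray()
    while True:
        byte = value & 0x7F
        value >>= 7
        if value:
            out.append(byte | 0x80)
        else:
            out.append(byte)
            return bytes(out)


def decode_nat(data: bytes, pos: int = 0):
    """Return (value, next position)."""
    result = 0
    shift = 0
    while True:
        byte = data[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return result, pos
        shift += 7


def encode_text(text: str) -> bytes:
    raw = text.encode("utf-8")
    return encode_nat(len(raw)) + raw


def decode_text(data: bytes, pos: int = 0):
    length, pos = decode_nat(data, pos)
    return data[pos:pos + length].decode("utf-8"), pos + length


message = encode_text("hello") + encode_nat(624485)
text, pos = decode_text(message)
number, _ = decode_nat(message, pos)
print(text, number)
```

Because both sides agree on the byte layout rather than on any one language's in-memory representation, a sender and receiver written in different languages can exchange such values.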

This way, Motoko programs can communicate with external canisters in a typeful manner and express remote invocations as if they were local objects in the program. And that is regardless of whether the remote canisters are written in Motoko or, say, Rust; the interface description of a canister is enough as type information. Besides mere convenience, interfaces also provide a strong form of modularity, where programs can be type-checked against other actors/canisters without having access to their implementation.

Conclusion

Our goal is for the Internet Computer to become a multi-linguistic platform where all languages have equal rights and can interact seamlessly across canister boundaries, and where Motoko is just one choice among many. This is important to make the Internet Computer platform an open one.

Wasm has so far proved to be a versatile code format to achieve this goal. We especially benefit from its simplicity, modularity, and safe and deterministic semantics. Despite these nice properties, porting compilers and libraries, let alone applications, between different Wasm ecosystems is not as straightforward as one might hope, since it involves so much more than just bare code. Then again, Wasm is still young, and certain barriers are to be expected.

The biggest upcoming Wasm feature we are eagerly awaiting is the advent of first-class reference types and function references. That will make for a much cleaner system API, through which Wasm modules (and hence Motoko programs) talk to the Internet Computer platform. Interested programmers can find more details about the SDK here and contribute to the Motoko base library via GitHub here.

Andreas Rossberg is a researcher and engineer at DFINITY who leads the development of Motoko, a new programming language for DFINITY’s Internet Computer. Before joining DFINITY, he was an engineer at Google working on the V8 JavaScript engine that runs in Chrome, and also worked as an academic researcher in programming language theory, design, and implementation at the Max Planck Institute for Software Systems. He is one of the co-designers of WebAssembly and authored its specification. In this article, he discusses DFINITY’s use of WebAssembly for the Internet Computer and the experience of designing Motoko.
