Handling Rust Panics in WASM

By Raj Rajhans

July 6th, 2023

The Problem

At work today, I encountered an interesting issue. For some background, we have some parts of our codebase written in Rust, which are used in both native and web environments. We use wasm-bindgen to interface with the Rust code from JavaScript in the browser via WebAssembly.

One of the cool things wasm-bindgen allows is calling async Rust functions from JavaScript. It does this by turning Rust’s Future into a JavaScript Promise. This is particularly useful when you need to perform asynchronous operations in Rust and want to consume the results in JavaScript. This is where our story begins.

My JavaScript code was interfacing with a Rust function through WASM, using wasm-bindgen. The Rust function, marked with the async keyword, implicitly returns a Future. wasm-bindgen exposes that Future to JavaScript as a Promise, which the JavaScript side then consumes with the await syntax.

Here’s the JavaScript code in question:

setEngineState(EngineStatus.Starting);
try {
  await engine.start();
  setEngineState(EngineStatus.Started);
} catch (error) {
  console.error("Failed to start engine", error);
  setEngineState(EngineStatus.Failed);
}

The problem? If there was a panic in the Rust code of the engine.start() function, I expected the catch block to get triggered so that we could set the correct state and show a failure UI. Instead, the UI stayed in its loading state indefinitely. It seemed like the code was getting “stuck” on await engine.start().

After some investigation, I found the culprit: a limitation in the wasm-bindgen library.

The future_to_promise function in wasm-bindgen has a clear note in the documentation:

/// If the `future` provided panics then the returned `Promise` **will not
/// resolve**. Instead it will be a leaked promise. This is an unfortunate
/// limitation of wasm currently that's hoped to be fixed one day!

This limitation arises because, on this WASM target, a Rust panic does not unwind the stack; it translates into an abort. Nothing that would resolve or reject the JavaScript promise runs after that point, so the promise stays pending forever. This is the “leaked” promise the documentation mentions. It can create a confusing scenario where it appears as though the WASM module is still operational, when in reality, it’s in an inconsistent state due to the panic.

This behavior can lead to some nasty bugs, as it did in my case. It can leave your system in an inconsistent state and create a lot of confusion while trying to debug the issue.
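To make the failure mode concrete, here is a minimal sketch that simulates it in plain JavaScript. It is not wasm-bindgen code: leakedStart is a hypothetical stand-in for engine.start() after a panic, and the states array is a hypothetical stand-in for setEngineState.

```javascript
// Hypothetical stand-in for engine.start() after a Rust panic: the executor
// never calls resolve or reject, so the returned promise stays pending forever.
function leakedStart() {
  return new Promise(() => {});
}

// Mirrors the try/catch from the post, recording each state transition in
// `states` (a hypothetical replacement for setEngineState).
async function startEngine(start, states) {
  states.push("Starting");
  try {
    await start();
    states.push("Started");
  } catch (error) {
    states.push("Failed");
  }
}
```

Calling startEngine(leakedStart, states) leaves states at ["Starting"] no matter how long you wait, because neither the success path nor the catch block ever runs. A start function that rejects, on the other hand, reaches "Failed" as expected.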

The Workaround

To solve this problem, I decided to use JavaScript’s Promise.race function. You pass it an array (or any iterable) of promises, and the promise it returns settles as soon as the first of them settles, resolving or rejecting with that promise’s value or reason.
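As a quick illustration of that behavior, here is a small sketch. The names settleAfter, never and raceDemo are made up for this example and are not part of any library.

```javascript
// Settles after `ms` milliseconds: resolves with `value`, or rejects with it
// when `shouldReject` is true. (Hypothetical helper for this example.)
function settleAfter(ms, value, shouldReject = false) {
  return new Promise((resolve, reject) => {
    setTimeout(() => (shouldReject ? reject(value) : resolve(value)), ms);
  });
}

// A promise that never settles, like the leaked promise described earlier.
const never = new Promise(() => {});

async function raceDemo() {
  // The 10 ms promise settles first, so the race resolves with its value.
  const fastWins = await Promise.race([
    settleAfter(10, "fast"),
    settleAfter(50, "slow"),
  ]);

  // A never-settling promise cannot win, so the rejection decides the race.
  let rejection = null;
  try {
    await Promise.race([never, settleAfter(10, new Error("timeout"), true)]);
  } catch (error) {
    rejection = error.message;
  }

  return { fastWins, rejection };
}
```

Note that Promise.race does not cancel the promises that lose. They keep running (or stay pending); their outcome is simply ignored.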

Here is the updated code:

setEngineState(EngineStatus.Starting);
try {
  await Promise.race([
    engine.start(),
    // Rejects after 5 seconds if engine.start() has not settled by then.
    new Promise((_, reject) => {
      setTimeout(() => {
        reject(new Error("engine-start-timeout"));
      }, 5000);
    }),
  ]);
  setEngineState(EngineStatus.Started);
} catch (error) {
  console.error("Failed to start engine", error);
  setEngineState(EngineStatus.Failed);
}

In this code, if the engine.start() function does not resolve or reject within 5 seconds, the timeout promise rejects, triggering the catch block and allowing us to handle the error.

This workaround isn’t ideal, as it depends on a hardcoded timeout. But it does prevent the system from getting stuck in an inconsistent state and allows us to handle the error gracefully.
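If you need this in more than one place, the pattern can be pulled into a small helper so the timeout is at least a parameter rather than a constant buried in the call site. This is a minimal sketch, not something from wasm-bindgen: withTimeout and its default message are names I made up for illustration.

```javascript
// Races `promise` against a timer and rejects with an Error if the timer
// fires first. The timer is cleared once the race settles, so it does not
// linger after a fast success or failure.
// (Hypothetical helper; the name and signature are illustrative.)
function withTimeout(promise, ms, message = "timeout") {
  let timeoutId;
  const timeout = new Promise((_, reject) => {
    timeoutId = setTimeout(() => reject(new Error(message)), ms);
  });
  return Promise.race([promise, timeout]).finally(() => clearTimeout(timeoutId));
}
```

With it, the call in the try block would become await withTimeout(engine.start(), 5000, "engine-start-timeout"). The right timeout value still depends on how long a healthy start can take in your application.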

Other Potential Solutions

You might be thinking, “Why not simply catch the panic on the Rust side?” While this seems like a good solution, it’s unfortunately not possible when targeting WASM. The wasm32-unknown-unknown target is panic="abort" by default, and std::panic::catch_unwind has no effect in this case. This means that panics “abort” the process, which in this case is a native wasm trap.

It’s important to note that once an exception is caught on the JavaScript side, you should consider that specific WASM instance to be in an unsafe state. This is again due to the panic=abort behavior. When a panic occurs, no cleanup code gets executed, potentially leaving values in an inconsistent state. If you try to use these values again, it could result in memory unsafety. Therefore, the safest course of action after catching an exception is to discard the WASM instance and create a new one from scratch.
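One way to structure the “discard and recreate” advice is to keep the engine behind a small owner object that never reuses an instance after a failed start. The sketch below rests on assumptions: EngineHost is a made-up name, and createEngine is a hypothetical async factory that loads and instantiates a fresh WASM module and returns an object with a start() method. How you actually create a fresh instance depends on how your wasm-bindgen output is loaded.

```javascript
// Owns at most one engine and drops it after any failed start.
// `createEngine` is a hypothetical async factory returning a fresh engine
// backed by a newly created WASM instance.
class EngineHost {
  constructor(createEngine, timeoutMs = 5000) {
    this.createEngine = createEngine;
    this.timeoutMs = timeoutMs;
    this.engine = null;
  }

  // Creates a fresh engine and starts it. On failure or timeout the instance
  // is dropped, so the next call to start() begins from a new one.
  async start() {
    const engine = await this.createEngine();
    let timeoutId;
    const timeout = new Promise((_, reject) => {
      timeoutId = setTimeout(
        () => reject(new Error("engine-start-timeout")),
        this.timeoutMs,
      );
    });
    try {
      await Promise.race([engine.start(), timeout]);
      this.engine = engine;
      return engine;
    } catch (error) {
      this.engine = null; // never reuse an instance that may have panicked
      throw error;
    } finally {
      clearTimeout(timeoutId);
    }
  }
}
```

A caller can then retry by calling host.start() again, knowing that each attempt works against a new instance rather than one left in an unknown state.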

You can read more about this at this relevant discussion on rust-lang.org.

That’s it for this blog post ✌️

Raj Rajhans

Product Engineer @ invideo