Multithreading in Node.js: Using Atomics for Safe Shared Memory Operations
Node.js developers have gotten comfortable with a single thread where all JavaScript is executed. Even with the introduction of multiple threads via worker_threads, you can still feel pretty safe.
However, things change when you add shared resources to multiple threads. Multithreaded programming with shared state is one of the most challenging topics in all of software engineering.
Thankfully, JavaScript provides a built-in abstraction to mitigate the problem of shared resources across multiple threads. This mechanism is called Atomics.
In this article, you'll learn what shared resources look like in Node.js and how the Atomics API helps us prevent wild race conditions.
Shared memory between multiple threads
Let's start with understanding what transferable objects are.
Transferable objects are objects that can be transferred from one execution context to another without retaining access to the resources of the original context.
An execution context is a place where JavaScript code can be executed. To make it easier to understand, let's assume that an execution context is equal to a worker thread because each thread is indeed a separate execution context.
For example, ArrayBuffer is a transferable object. It consists of 2 parts: the raw allocated memory and a JavaScript handle to this memory. You can read the article about Buffers in JavaScript to learn more about this topic.
Whenever we pass an ArrayBuffer from the main thread to a worker thread, both parts, the raw memory and the JavaScript object, are recreated in the worker thread. There is no way to access the same object reference or the underlying memory of the ArrayBuffer inside of the worker thread.
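To make this concrete, here is a minimal sketch of my own (it is not from the original article): we pass a plain ArrayBuffer to a worker as workerData, the worker mutates its copy, and the main thread's memory stays untouched.

import { Worker, isMainThread, workerData } from 'node:worker_threads';

if (isMainThread) {
  const buffer = new ArrayBuffer(1);
  const view = new Int8Array(buffer);
  const worker = new Worker(import.meta.filename, { workerData: buffer });
  worker.on('exit', () => {
    // The worker wrote into its own recreated copy of the memory,
    // so the main thread still sees the original value.
    console.log(view[0]); // 0
  });
} else {
  new Int8Array(workerData)[0] = 42; // only affects the worker's copy
}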
The only way to share memory between different threads is to use SharedArrayBuffer.
As the name suggests, it is designed to be shared. We consider this buffer a non-transferable object. If you pass a SharedArrayBuffer from the main thread to a worker thread, only the JavaScript object gets recreated, but the memory region it refers to stays the same.
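Here is the same sketch as above, only with SharedArrayBuffer (again, my own illustration rather than the article's code): the worker's write lands in the very memory region the main thread is looking at.

import { Worker, isMainThread, workerData } from 'node:worker_threads';

if (isMainThread) {
  const shared = new SharedArrayBuffer(1);
  const view = new Int8Array(shared);
  const worker = new Worker(import.meta.filename, { workerData: shared });
  worker.on('exit', () => {
    // Both threads refer to the same memory region,
    // so the worker's write is visible here.
    console.log(view[0]); // 42
  });
} else {
  new Int8Array(workerData)[0] = 42; // writes into the shared memory itself
}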
While SharedArrayBuffer is a unique and powerful API, it comes with a cost. As Uncle Ben told us: with great power comes great responsibility.
When we share resources between multiple threads, we expose ourselves to a whole new world of nasty race conditions.
Race conditions for shared resources
It's easier to understand what I'm talking about with a concrete example.
import { Worker, isMainThread } from 'node:worker_threads';

if (isMainThread) {
  new Worker(import.meta.filename);
  new Worker(import.meta.filename);
} else {
  // worker code
}
We're using the same file to run both the main thread and the worker threads. The block under the isMainThread condition is executed only in the main thread. You might also notice import.meta.filename: it is the ESM alternative to the CommonJS __filename variable, available since Node.js 20.11.0. Next, we introduce a shared resource and an operation over it.
import { Worker, isMainThread, workerData, threadId } from 'node:worker_threads';

if (isMainThread) {
  const buffer = new SharedArrayBuffer(1);
  new Worker(import.meta.filename, { workerData: buffer });
  new Worker(import.meta.filename, { workerData: buffer });
} else {
  const typedArray = new Int8Array(workerData);
  typedArray[0] = threadId;
  console.dir({ threadId, value: typedArray[0] });
}
We pass the SharedArrayBuffer to each of the workers as workerData. Each worker sets the first element of the buffer to its own thread ID and then logs that element. One of the workers will have an ID equal to 1 and the other one equal to 2. Without reading any further, what do you expect to see in the output when this code runs?
Here are the possible results.
# First type of result
{ threadId: 1, value: 2 }
{ threadId: 2, value: 2 }
# Second type of result
{ threadId: 1, value: 1 }
{ threadId: 2, value: 1 }
# Third type of result
{ threadId: 1, value: 1 }
{ threadId: 2, value: 2 }
Did you notice it? Why on earth do we have cases where the value is the same for both threads? If you think about it from the standpoint of a single-threaded program, we should see different values printed every time.
Even if we ran this code asynchronously in a single thread, the only thing that could possibly differ is the order in which the results are printed, not such a drastic difference in the final values.
What happens here is that one of the threads assigns its value right between these two lines:
typedArray[0] = threadId;
// the other thread sneaks right in here and assigns its value
console.dir({ threadId, value: typedArray[0] });
It goes like this:
The first thread assigns a value to the shared buffer.
The second thread assigns a value to the shared buffer.
The first thread prints the result to the console.
The second thread prints the result to the console.
As you can see, it is easy to run into a race condition with as few as 10 lines of code once we have shared resources and multiple threads. That's why we need a mechanism that makes sure one worker doesn't interrupt the work of another. The Atomics API was created exactly for this purpose.
Atomics
I want to emphasize that using Atomics is the only possible way to be 100% sure that you're not running into race conditions when dealing with multiple threads and shared resources between them.
The main purpose of Atomics is to make sure that an operation is performed as a single, uninterruptible unit. In other words, it ensures that no other worker can get in the middle of the currently executing operation and do its own work, like we've seen above.
Let's rewrite the example with race conditions using Atomics.
import { Worker, isMainThread, workerData, threadId } from 'node:worker_threads';

if (isMainThread) {
  const buffer = new SharedArrayBuffer(1);
  new Worker(import.meta.filename, { workerData: buffer });
  new Worker(import.meta.filename, { workerData: buffer });
} else {
  const typedArray = new Int8Array(workerData);
  const value = Atomics.store(typedArray, 0, threadId);
  console.dir({ threadId, value });
}
We changed two things: how we save the value and how we read it back. Using Atomics, we can do both operations at once with the store function, which writes the value and also returns the value it stored.
When you run this code, you won't see a case where both threads have the same value. They are always different.
{ threadId: 1, value: 1 }
{ threadId: 2, value: 2 }
{ threadId: 2, value: 2 }
{ threadId: 1, value: 1 }
We could use 2 operations instead of 1: store and load.
const typedArray = new Int8Array(workerData);
Atomics.store(typedArray, 0, threadId);
const value = Atomics.load(typedArray, 0);
console.dir({ threadId, value });
However, this approach is still prone to race conditions. The whole point of using Atomics is to make our operations atomic.
In this case, we want 2 operations to be executed as a single atomic one: saving a value and reading it back. When we use the store and load functions, we're actually performing 2 separate atomic operations, not 1.
That's why it is still possible to run into a race condition where code from another worker gets in between the store and load calls.
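One way around this, shown here only as my own sketch rather than the article's approach, is a read-modify-write function such as Atomics.add, which performs the read and the write as one uninterruptible step:

import { Worker, isMainThread, workerData, threadId } from 'node:worker_threads';

if (isMainThread) {
  const buffer = new SharedArrayBuffer(4);
  new Worker(import.meta.filename, { workerData: buffer });
  new Worker(import.meta.filename, { workerData: buffer });
} else {
  const counter = new Int32Array(workerData);
  // Atomics.add reads the current value, adds 1, and writes the result back
  // as a single atomic operation; it returns the value that was there before.
  const previous = Atomics.add(counter, 0, 1);
  // Exactly one worker sees 0 here and the other sees 1,
  // no matter how the threads interleave.
  console.dir({ threadId, previous });
}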
Atomics has more than just these 2 functions. In the following article, we'll cover how to use more of them to build our own semaphore and mutex to make working with shared resources even more convenient.
Conclusion
Node.js is all fun and games when there is only a single thread. If you introduce multiple threads and shared resources on top of it, you get an environment where race conditions are inevitable.
There is only one mechanism in JavaScript that allows you to mitigate these problems and avoid race conditions: it is called Atomics.
The idea of Atomics is to have operations that execute as a single unit that cannot be interrupted from the outside.
Thanks to such a design, we can be sure that whenever we use Atomics functions, there is no way for other threads to get in the middle of such an operation.