Understanding Node.js Buffer

So far, we've become familiar with buffers, typed arrays, data views, and how they all work together. If you missed the previous articles, I highly recommend reading the one dedicated to buffers and the other one on views.

Node.js provides a dedicated buffer abstraction called Buffer. Why do we need more buffers when we already have ArrayBuffer and the different views that come with it?

In this article, we'll answer this question and look at the difference between the Node.js buffer and all the others. By the end, you'll also learn what problems the Node.js buffer has and why some people avoid using it at all costs.

Difference between Node.js buffer and typed arrays

In short, the Node.js buffer is basically a Uint8Array spiced up with some extra logic. This implies two things:

  1. A Node.js buffer is not "actually" a buffer but a view into an underlying buffer.

  2. Wherever you use a Uint8Array, you can also use a Node.js buffer, with minor exceptions that we'll discuss later in this article.
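
Both points can be checked directly. The snippet below is a small sketch rather than anything taken from the Node.js sources; it assumes it runs in Node.js, where Buffer and TextDecoder are available as globals.

```javascript
// Buffer is a Node.js global, so no import is needed here.
const buffer = Buffer.from('hello');

// 1. A Buffer is a Uint8Array view over an ArrayBuffer.
const isTypedArray = buffer instanceof Uint8Array;
const isView = ArrayBuffer.isView(buffer);
const hasArrayBuffer = buffer.buffer instanceof ArrayBuffer;
console.log(isTypedArray, isView, hasArrayBuffer); // Prints true true true

// 2. An API that expects a Uint8Array accepts a Buffer as well.
const decoded = new TextDecoder().decode(buffer);
console.log(decoded); // Prints hello
```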

One of the core abstractions responsible for buffers in Node.js is called FastBuffer. It is a class that extends Uint8Array. You can see it for yourself.

class FastBuffer extends Uint8Array {
  // Using an explicit constructor here is necessary to avoid relying on
  // `Array.prototype[Symbol.iterator]`, which can be mutated by users.
  // eslint-disable-next-line no-useless-constructor
  constructor(bufferOrLength, byteOffset, length) {
    super(bufferOrLength, byteOffset, length);
  }
}

Functions that create a Node.js buffer, such as Buffer.from, always return a FastBuffer instance. But what is the point of having such a dumb class without any logic? Moreover, when we log the result of Buffer.from, it is reported as a Buffer, not a FastBuffer.

import { Buffer } from 'node:buffer';

const buffer = Buffer.from('hello');
console.log(buffer); // Prints <Buffer 68 65 6c 6c 6f>

Am I misleading you? Not really. The reason you see Buffer instead of FastBuffer in the console, and the reason FastBuffer doesn't contain any additional logic itself, is the prototype manipulation below.

FastBuffer.prototype.constructor = Buffer;
Buffer.prototype = FastBuffer.prototype;
addBufferPrototypeMethods(Buffer.prototype);

What is the point of such a manipulation? One of the reasons is backward compatibility. Many Node.js APIs have been using Buffer for a long time, and changing code to FastBuffer might result in big problems. That's why it is easier just to swap prototypes and keep the same interfaces for the existing code.

The shared prototype is where instance methods like toString, write, and equals reside. Static factory functions like from and alloc are attached to the Buffer function itself.
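
The effect of the prototype swap is visible from user code. The following is a minimal sketch that only inspects public objects; FastBuffer itself is internal, so it is observed here indirectly through the prototype chain.

```javascript
// Buffer is a Node.js global, so no import is needed here.
const buffer = Buffer.from('hello');

// The instance uses the prototype that Buffer and FastBuffer share.
const sharesPrototype = Object.getPrototypeOf(buffer) === Buffer.prototype;

// The constructor property was pointed at Buffer, hence the name in the console.
const constructorName = buffer.constructor.name;

// One step up the chain we reach the typed array that FastBuffer extends.
const parentIsUint8Array =
  Object.getPrototypeOf(Buffer.prototype) === Uint8Array.prototype;

console.log(sharesPrototype, constructorName, parentIsUint8Array);
// Prints true Buffer true
```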

Node.js buffer memory allocation

I want to bring your attention to how Node.js allocates the memory for its buffer.

That's one of the most important differences between Buffer and Uint8Array, which you should be aware of.

The first thing you have to understand is that typed arrays have separate memory pools.

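// A typed array cannot be built from a string directly, so we pass character codes
const array1 = new Uint8Array([104, 101, 108, 108, 111]); // "hello"
const array2 = new Uint8Array([119, 111, 114, 108, 100]); // "world"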

console.log(array1.byteOffset); // Prints 0
console.log(array2.byteOffset); // Prints 0

By memory pool, I mean the buffer where data is stored. Every time you create a new typed array from a length or a list of values, you create a dedicated instance of ArrayBuffer with it. Small Node.js buffers, on the other hand, share the same memory pool.

import { Buffer } from 'node:buffer';

const buffer1 = Buffer.from('hello');
const buffer2 = Buffer.from('world');

console.log(buffer1.byteOffset); // Prints 16
console.log(buffer2.byteOffset); // Prints 24

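If you use Buffer.from or Buffer.allocUnsafe to store anything smaller than half of the pool (4 kilobytes with the default Buffer.poolSize of 8 kilobytes), then you end up with multiple data chunks in the same shared memory pool.

You see 16 in the console because something had already been allocated from the buffer pool before our code ran, so the exact starting offset can differ between Node.js versions. The byte offset of the second buffer directly depends on how much space the first buffer takes.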

import { Buffer } from 'node:buffer';

const buffer1 = Buffer.from('controversial');
const buffer2 = Buffer.from('opinion');

console.log(buffer1.byteOffset); // Prints 16
console.log(buffer2.byteOffset); // Prints 32

The word hello is 5 bytes long, and the word controversial is 13 bytes long. Node.js aligns every pooled allocation to an 8-byte boundary, so hello reserves 8 bytes and controversial reserves 16. That is why the second buffer starts at 16 + 8 = 24 in the first example and at 16 + 16 = 32 in the second one.
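
The offsets can be explored with a short sketch. It assumes the default Buffer.poolSize and the current pooling behavior of Node.js, where pooled allocations start on 8-byte boundaries. Exact offsets depend on what was allocated earlier, so the difference is described as typical rather than guaranteed.

```javascript
// Buffer is a Node.js global, so no import is needed here.
const first = Buffer.from('controversial'); // 13 bytes
const second = Buffer.from('opinion'); // 7 bytes

// Small buffers are views into a pool of Buffer.poolSize bytes (8192 by default).
console.log(first.buffer.byteLength === Buffer.poolSize); // Prints true

// Pooled offsets are multiples of 8.
console.log(first.byteOffset % 8, second.byteOffset % 8); // Prints 0 0

// Typically 16: the 13 bytes of the first buffer rounded up to a multiple of 8.
console.log(second.byteOffset - first.byteOffset);

// A string of 4096 bytes or more skips the pool and gets its own ArrayBuffer.
const large = Buffer.from('a'.repeat(5000));
console.log(large.byteOffset, large.buffer === first.buffer); // Prints 0 false
```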

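I must say that there is a safer way to create a buffer. You can do so by using the alloc function, which returns zero-filled memory backed by its own ArrayBuffer and never uses the pool. However, it doesn't change the fact that buffers created with Buffer.from or Buffer.allocUnsafe still come from a shared memory pool.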
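
Whether a particular buffer lives in the pool can be checked by looking at its underlying ArrayBuffer. The sketch below assumes the default Buffer.poolSize and compares Buffer.allocUnsafe, which draws small allocations from the pool, with Buffer.alloc, which returns zero-filled memory of its own.

```javascript
// Buffer is a Node.js global, so no import is needed here.
const pooled = Buffer.allocUnsafe(8); // taken from the pool, contents are arbitrary
const dedicated = Buffer.alloc(8); // zero-filled, backed by its own ArrayBuffer

console.log(pooled.buffer.byteLength === Buffer.poolSize); // Prints true
console.log(dedicated.buffer.byteLength, dedicated.byteOffset); // Prints 8 0
console.log(Array.from(dedicated)); // Prints [ 0, 0, 0, 0, 0, 0, 0, 0 ]
```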

Node.js Buffer problems

Now that you understand what the Node.js Buffer looks like, it is time to discuss some of its problems. The discussion has been ongoing since typed arrays became available. The main question is, "Why do we need Buffer at all?"

It is not compatible with other platforms

Imagine you are developing a library that you want to be available anywhere: browsers, Node.js, or Deno. If you write code using Buffer, it will break on any platform that doesn't provide this API, such as browsers. The reason is simple: it is a Node.js-specific API.

Instead, you can use typed arrays like Uint8Array or Int8Array. They are part of the ECMAScript standard, and all major platforms that run JavaScript support them.
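
As an illustration, here is a sketch of a library function written only against Uint8Array. The xorChecksum helper is hypothetical and exists only for this example; because Buffer extends Uint8Array, Node.js callers can still pass a Buffer to it.

```javascript
// Hypothetical helper: XOR of all bytes, written against Uint8Array only.
function xorChecksum(bytes) {
  let checksum = 0;
  for (const byte of bytes) {
    checksum ^= byte;
  }
  return checksum;
}

// TextEncoder is available in browsers, Node.js, and Deno.
const portableBytes = new TextEncoder().encode('hello');
console.log(xorChecksum(portableBytes)); // Prints 98

// On Node.js, the same function accepts a Buffer without any changes.
if (typeof Buffer !== 'undefined') {
  console.log(xorChecksum(Buffer.from('hello'))); // Prints 98
}
```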

It violates the Liskov substitution principle

The Liskov substitution principle (LSP) is one of the SOLID principles. It states that a subclass should be able to substitute for its parent class without causing unexpected behavior.

While the Node.js Buffer extends Uint8Array, it overrides the slice method with different behavior, which violates LSP.

When you use slice on a typed array like a Uint8Array instance, a new typed array with a new underlying buffer is created.

const array1 = new Uint8Array([1, 2, 3]);
const array2 = array1.slice(0, 2);

array2[0] = 4;

// Prints Uint8Array {0: 4, 1: 2}
console.log(array2);

// Prints Uint8Array {0: 1, 1: 2, 2: 3}
console.log(array1);

After we mutate the first element of array2, the change only affects the underlying buffer of array2. We don't see any changes to the first element in array1.

The slice method of the Node.js Buffer is different. Instead of creating a copy, it creates a view into the same buffer.

import { Buffer } from 'node:buffer';

const buffer1 = Buffer.from([1, 2, 3]);
const buffer2 = buffer1.slice(0, 2);

buffer2[0] = 4;

// Prints <Buffer 04 02>
console.log(buffer2);

// Prints <Buffer 04 02 03>
console.log(buffer1);

You can see that buffer2 has 4 as its first element, which is expected because we directly mutated this buffer. What is not expected is that the first element of buffer1 has also changed to 4.

The slice method is officially marked as deprecated in favor of subarray. However, as long as it is available as a Buffer method, it violates LSP.
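
To make the intent explicit in your own code, you can pick the operation by name. This is a minimal sketch: subarray shares memory on both Uint8Array and Buffer, while calling the typed array's slice on a Buffer produces an independent copy.

```javascript
// Buffer is a Node.js global, so no import is needed here.
const original = Buffer.from([1, 2, 3]);

const view = original.subarray(0, 2); // shares memory with original
const copy = Uint8Array.prototype.slice.call(original, 0, 2); // independent copy

view[0] = 4; // visible through original
copy[1] = 9; // not visible through original

console.log(Array.from(original)); // Prints [ 4, 2, 3 ]
console.log(Array.from(view)); // Prints [ 4, 2 ]
console.log(Array.from(copy)); // Prints [ 1, 9 ]
```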

Buffer has more security implications

Last but not least is Buffer security. As we've mentioned before, the Node.js buffer has a shared memory pool.

If we have some data stored in this shared memory pool, any code that runs in your application can access this memory. Imagine a scenario where you store sensitive data in the buffer, like a person's address, and I, as a bad actor, can access this data.

import { Buffer } from 'node:buffer';

const addressBuffer = Buffer.from('Person personal addres');

Here we have a buffer with personal information that we don't want to leak anywhere. We can gain access to it through the shared memory pool.

const hackyBuffer = Buffer.from('1');

// Prints the full underlying buffer
console.log(hackyBuffer.buffer);

Assuming we don't have access to the initial buffer object itself, we create hackyBuffer. The underlying buffer of hackyBuffer contains the person's private data. The only thing left is to turn the raw bytes stored in the buffer into a human-readable format.

const hackyBuffer = Buffer.from('1');

const typedArray = new Uint8Array(hackyBuffer.buffer)
  .filter(value => value !== 0);
const charCodes = Array.from(typedArray);

// Prints "//Person personal addres"
console.log(String.fromCodePoint(...charCodes));

We created a view into the underlying buffer to be able to work with its content. Then, we filtered out all empty values and created an array of character codes. No numeric conversion is needed at this point: the bytes are plain numbers, and hexadecimal is only how the console displays them.

The only thing left is to convert those numbers into a string. Since every byte of this ASCII text is also the Unicode code point of a character, we can use fromCodePoint to convert the bytes back to characters.
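
The fromCodePoint approach works here because the text is plain ASCII, where one byte is one character. For text that may contain other characters, decoding the bytes as UTF-8 is more robust. The sketch below uses a made-up sample string to show the difference.

```javascript
// "Zo\u00eb" is a made-up sample; the last letter takes two bytes in UTF-8.
const bytes = new TextEncoder().encode('Zo\u00eb');
console.log(bytes.length); // Prints 4

// Treating every byte as a code point garbles the two-byte character.
const perByte = String.fromCodePoint(...bytes);
console.log(perByte); // Prints ZoÃ«

// TextDecoder interprets the bytes as UTF-8 and restores the original text.
const decoded = new TextDecoder().decode(bytes);
console.log(decoded); // Prints Zoë
```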

If you're not comfortable with all these manipulations with character and numeric systems, I highly recommend reading the article about different encoding schemes in JavaScript.

And that's how you "hack" Node.js buffer.
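
One way to reduce this exposure, sketched here under the assumption that the sensitive value is under your control, is to keep it out of the pool with Buffer.alloc and to overwrite the memory once it is no longer needed. This is an illustration, not a complete security measure.

```javascript
// Buffer is a Node.js global, so no import is needed here.
const secret = 'Person personal addres';

// Buffer.alloc gives the value its own zero-filled ArrayBuffer.
const isolated = Buffer.alloc(Buffer.byteLength(secret));
isolated.write(secret);

// An unrelated small buffer lives in the pool, not in the same memory.
const probe = Buffer.from('1');
console.log(probe.buffer === isolated.buffer); // Prints false
console.log(isolated.buffer.byteLength); // Prints 22

// Overwrite the bytes once the value is no longer needed.
isolated.fill(0);
const wiped = isolated.every((byte) => byte === 0);
console.log(wiped); // Prints true
```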

Why do we need to use Node.js buffer at all

After seeing all of these problems, it is reasonable to ask what the purpose of the Node.js buffer is.

It is a valid question, and it has been raised in the Node.js community before. The idea of preferring Uint8Array over Buffer is getting more and more attention, which is a good sign. Despite all the problems, Buffer has useful functions that typed arrays have traditionally lacked. For example, we can convert a buffer to a hex or base64 string.

import { Buffer } from 'node:buffer';

const buffer = Buffer.from('hello');

// Prints 68656c6c6f
console.log(buffer.toString('hex'));

There are libraries that provide the same conversions for Uint8Array, but each one is another dependency in your project, which is not always worth it.
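
For simple cases, a few lines of code can stand in for such a dependency. The toHex helper below is hypothetical and written for this example only. The second part shows a Node.js-only alternative: wrapping the memory of an existing Uint8Array in a Buffer, without copying it, to borrow the Buffer methods.

```javascript
// Hypothetical helper: hex-encode any Uint8Array without Buffer or a library.
function toHex(bytes) {
  return Array.from(bytes, (byte) => byte.toString(16).padStart(2, '0')).join('');
}

const bytes = new TextEncoder().encode('hello');
console.log(toHex(bytes)); // Prints 68656c6c6f

// Node.js only: view the same memory as a Buffer (no copy) to use its methods.
const wrapped = Buffer.from(bytes.buffer, bytes.byteOffset, bytes.byteLength);
console.log(wrapped.toString('base64')); // Prints aGVsbG8=
```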

Conclusion

Buffer is a Node.js abstraction that enables you to create and manage buffers through a common interface. Buffer extends Uint8Array, so it is essentially not a buffer but a view into an underlying ArrayBuffer.

While Buffer extends Uint8Array, it provides extra methods on top of the typed array, which makes it unique.

Unlike regular typed arrays, Node.js buffer has a shared memory pool. It means that if you create two small buffers, they share the same memory space. This design with a shared memory pool leads to potential vulnerabilities because it is possible to get data from one buffer through a completely unrelated buffer.

Other problems of Buffer include limited portability to other platforms, because it is a Node.js-specific API, and a violation of the Liskov substitution principle from SOLID, because the slice method behaves differently in the superclass and the subclass.

Despite all of these problems, the Node.js Buffer still might be useful because it provides the functionality that typed arrays simply don't have.