Node.js Streams Tutorial: 3 Patterns That Never Blow Your Heap

You wrote fs.readFile('access.log'). It worked on the 50MB sample. Then your prod server got OOM-killed processing a 4GB rotation and you spent the next forty minutes explaining the crash on Slack.

Every Node.js streams tutorial you’ve opened since starts with createReadStream('hello.txt').pipe(process.stdout). Cute. Doesn’t fix anything. You don’t have a hello.txt problem. You have a “process large files in Node.js without dying” problem, and the gap between the toy example and the production fix is exactly the part nobody writes about.

Here are three patterns I actually ship — disk, network, structured data — with the heap numbers that prove they hold.

The mental model: chunks flow, the file never sits

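readFile is one move: allocate a Buffer the size of the entire file, hand it to your code, hope the process has the headroom. A 1.5GB file means a 1.5GB allocation on top of everything else you hold. Past 2 GiB, Node doesn’t even try: readFile fails with ERR_FS_FILE_TOO_LARGE, which is a crash of a different flavor. There is no smaller version of this.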

A stream is a different shape: read 64KB, hand it down the pipeline, write 64KB out, discard, repeat. The whole file never exists in memory at once. The ceiling is the buffer size, not the file size.
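To see that ceiling for yourself, here is a minimal sketch that writes a throwaway 1MB file and records the size of every chunk createReadStream hands over. The temp directory and the sample.bin name are made up for the demo, and it assumes you leave the fs stream's highWaterMark at its default.

```javascript
import { createReadStream } from 'node:fs';
import { mkdtemp, writeFile, rm } from 'node:fs/promises';
import { tmpdir } from 'node:os';
import { join } from 'node:path';

// Hypothetical scratch file: 1MB of filler so the demo is self-contained.
const dir = await mkdtemp(join(tmpdir(), 'chunks-'));
const file = join(dir, 'sample.bin');
await writeFile(file, Buffer.alloc(1024 * 1024, 'a'));

let chunkCount = 0;
let largestChunk = 0;
let totalBytes = 0;

// Each iteration receives one Buffer; the previous one is free to be collected.
for await (const chunk of createReadStream(file)) {
  chunkCount += 1;
  totalBytes += chunk.length;
  largestChunk = Math.max(largestChunk, chunk.length);
}

console.log({ chunkCount, largestChunk, totalBytes });
await rm(dir, { recursive: true, force: true });
```

The largest chunk should never exceed the stream's highWaterMark, no matter how big the file gets.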

That 64KB number is the default highWaterMark for fs.createReadStream. Generic byte streams default to 16KB on older Node versions and 64KB from Node 22 on, and object-mode streams count objects rather than bytes, with a default of 16. It’s the internal buffer ceiling each stream stage holds before it tells the upstream to pause. That pause, that “I’m full, wait a beat” signal, is backpressure. The writable side pushes back when it’s overwhelmed and the readable side waits. That’s the entire trick. With pipeline() you don’t have to do anything to enable it; you just have to not break it.
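Here is roughly what that signal looks like when you handle it by hand, as a sketch rather than production code. The slow sink and its 1KB highWaterMark are assumptions chosen so the buffer fills quickly; the mechanism is the real one: write() returns false when the buffer is full, and 'drain' fires when it has emptied.

```javascript
import { Writable } from 'node:stream';
import { once } from 'node:events';

// A deliberately slow sink with a tiny buffer so backpressure kicks in fast.
let bytesWritten = 0;
const slowSink = new Writable({
  highWaterMark: 1024, // demo assumption: 1KB instead of the default
  write(chunk, _enc, cb) {
    bytesWritten += chunk.length;
    setTimeout(cb, 1); // pretend every write takes a millisecond
  }
});

let pauses = 0;
for (let i = 0; i < 50; i++) {
  const ok = slowSink.write(Buffer.alloc(512));
  if (!ok) {
    // Buffer is full: stop producing until the sink has caught up.
    pauses += 1;
    await once(slowSink, 'drain');
  }
}

slowSink.end();
await once(slowSink, 'finish');
console.log({ bytesWritten, pauses });
```

pipeline() does this bookkeeping for you between every pair of stages, which is why the rest of the examples lean on it.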

Run this on a 2GB file and watch what happens:

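// Approach A: readFile
import { readFile } from 'node:fs/promises';
const data = await readFile('big.log');
// The Buffer lives outside the V8 heap, so heapUsed barely moves. Watch rss instead.
console.log(process.memoryUsage().rss); // grows by about the file size, which is where the OOM kill comes from

// Approach B: pipeline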
import { pipeline } from 'node:stream/promises';
import { createReadStream, createWriteStream } from 'node:fs';

await pipeline(createReadStream('big.log'), createWriteStream('copy.log'));
console.log(process.memoryUsage().heapUsed); // ~32MB. Start to finish. Flat.

That’s why I’m leading with pipeline() from node:stream/promises and not .pipe(). Pipeline propagates errors through the chain and destroys every stream on failure. .pipe() does neither — errors on the source don’t reach the destination, and a failed write leaves the readable dangling with a file descriptor open. In production that’s a memory leak with a sticky note on it.
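A small sketch of that failure behavior, using in-memory stand-ins instead of real files. The failing sink and its simulated ENOSPC error are invented for the demo; the point is only that one awaited call surfaces the error and tears down both ends.

```javascript
import { pipeline } from 'node:stream/promises';
import { Readable, Writable } from 'node:stream';

// Stand-in for a file stream: three small chunks.
const source = Readable.from(['one\n', 'two\n', 'three\n']);

// Sink that fails on the second chunk, simulating a full disk.
let seen = 0;
const failingSink = new Writable({
  write(chunk, _enc, cb) {
    seen += 1;
    cb(seen === 2 ? new Error('ENOSPC (simulated)') : null);
  }
});

let caught = null;
try {
  await pipeline(source, failingSink);
} catch (err) {
  caught = err; // the sink's failure arrives here as a normal rejection
}

console.log('rejected:', caught !== null);
console.log('source destroyed:', source.destroyed);
console.log('sink destroyed:', failingSink.destroyed);
```

Swap pipeline() for source.pipe(failingSink) and you would have to attach 'error' handlers to each stream and destroy the other side yourself.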

So the model is real and the numbers check out. What does it look like when the task isn’t “copy a file”?

Pattern 1: Transform a multi-GB log file line by line

The most common production task: read a huge file, change every line, write the result. PII redaction is the canonical version — strip IP addresses from a 4GB nginx log before handing it to analytics.

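import { pipeline } from 'node:stream/promises';
import { createReadStream, createWriteStream } from 'node:fs';
import { Readable, Transform } from 'node:stream';
import { createInterface } from 'node:readline';

const input = createReadStream('access.log');
const output = createWriteStream('access.redacted.log');
// crlfDelay: Infinity treats \r\n as a single line break.
const rl = createInterface({ input, crlfDelay: Infinity });

// Receives one line (a string) at a time and emits the redacted line as bytes.
const redact = new Transform({
  writableObjectMode: true,
  transform(line, _enc, cb) {
    cb(null, line.replace(/\b\d{1,3}(\.\d{1,3}){3}\b/g, 'x.x.x.x') + '\n');
  }
});

try {
  // The readline interface is an async iterable, not a stream, so wrap it for pipeline().
  await pipeline(Readable.from(rl), redact, output);
} finally {
  rl.close();
  input.destroy(); // no-op after a clean finish; releases the fd after a failure
}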

readline handles line boundaries — you don’t get partial lines straddling a chunk boundary, which is the bug you’d otherwise spend an afternoon finding. The Transform does one job: redact and re-add the newline readline stripped. pipeline() wires it together and gives you a single promise that rejects on any error in any stage.
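To make the chunk-boundary point concrete, here is a sketch that feeds readline three hand-made chunks that split lines in awkward places. The log lines are invented; Readable.from stands in for the file stream.

```javascript
import { Readable } from 'node:stream';
import { createInterface } from 'node:readline';

// Three chunks that cut lines mid-way, the way a real file read can.
const chunks = [
  Buffer.from('10.0.0.1 GET /a\n10.0'),
  Buffer.from('.0.2 GET /b\n10.0.0.3 GE'),
  Buffer.from('T /c\n')
];

const rl = createInterface({ input: Readable.from(chunks), crlfDelay: Infinity });

// readline buffers the partial tail of each chunk and only emits complete lines.
const lines = [];
for await (const line of rl) lines.push(line);

console.log(lines);
```

Run a regex over the raw chunks instead and the address split across the first two would slip through unredacted.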

If disk fills up halfway through writing access.redacted.log, pipeline rejects, the readable is destroyed, the file descriptor closes, and you get a real stack trace. With .pipe() you’d get a silently-truncated output file and a process that thinks it succeeded.

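Prefer async iteration to a Transform class when the logic is naturally imperative:

for await (const line of rl) {
  const ok = output.write(line.replace(/\b\d{1,3}(\.\d{1,3}){3}\b/g, 'x.x.x.x') + '\n');
  if (!ok) await once(output, 'drain'); // once() comes from node:events
}
output.end();

Backpressure is manual in this shape. output.write() returns false once the writable’s buffer passes its highWaterMark, and the loop has to wait for 'drain' before pulling the next line. Drop that check and the lines pile up in the writable’s queue, which is exactly the heap blowup you came here to avoid. Different shape, same guarantee. Pick whichever reads better to the person reviewing your PR.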

Heap profile on a 4GB input: ~38MB from start to finish. Doesn’t matter if it’s 4GB or 400GB. The line is flat.

Files on disk are the easy case. What about a stream you don’t own — one coming over the network?

Pattern 2: HTTP streaming proxy that transforms the body in-flight

Your service proxies a 1.2GB JSON-lines export from an internal API to a client (the fetch streaming patterns and AbortController piece covers the client side). You need to strip internal fields on the way through. You cannot buffer the response — the upstream finishes in two minutes and your client will time out long before that.

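import { pipeline } from 'node:stream/promises';
import { Readable, Transform } from 'node:stream';
import { createInterface } from 'node:readline';

app.get('/export', async (req, res) => {
  // Abort the upstream request as soon as the client connection closes.
  const controller = new AbortController();
  res.on('close', () => controller.abort());

  const stripInternal = new Transform({
    writableObjectMode: true,
    transform(line, _enc, cb) {
      try {
        const row = JSON.parse(line);
        delete row.internal_notes;
        delete row.cost_basis;
        cb(null, JSON.stringify(row) + '\n');
      } catch (err) {
        cb(err); // report a malformed line through the stream instead of throwing
      }
    }
  });

  try {
    const upstream = await fetch('https://internal/api/export', { signal: controller.signal });
    const body = Readable.fromWeb(upstream.body);
    // readline's interface is an async iterable, so wrap it before handing it to pipeline().
    await pipeline(Readable.from(createInterface({ input: body })), stripInternal, res);
  } catch (err) {
    res.destroy(err); // client disconnects and upstream failures both land here
  }
});

Two production landmines worth knowing. First, don’t set Content-Length. Node falls back to chunked transfer encoding when the header is absent, which is what you want when you don’t know the size up front. Set a length that doesn’t match what you actually send and the client sees a truncated or broken response. Second, cancel the upstream when the client disconnects. Listen for 'close' on res and abort the fetch through an AbortController; calling upstream.body.cancel() directly fails once Readable.fromWeb has locked the stream. Without the abort you can keep pulling bytes from the upstream API after the client is long gone, burning bandwidth and quota.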

Readable.fromWeb is the bridge between Web Streams (what fetch gives you) and Node Streams (what pipeline expects). In 2026 you’ll see both APIs in the same codebase. Don’t rewrite working Node streams to chase the Web Streams API — use the bridge. There’s a Node.js 26 features piece if you want the longer take on which API is going where.
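A minimal sketch of the bridge on its own, with a hand-built Web ReadableStream standing in for fetch's response.body (the two JSON lines are made up). It also shows a detail that is easy to miss: the bridge takes a lock on the Web stream, so from that point on you work with the Node side.

```javascript
import { Readable } from 'node:stream';

// Stand-in for fetch()'s response.body: a Web ReadableStream of bytes.
const encoder = new TextEncoder();
const webStream = new ReadableStream({
  start(controller) {
    controller.enqueue(encoder.encode('{"id":1}\n'));
    controller.enqueue(encoder.encode('{"id":2}\n'));
    controller.close();
  }
});

const nodeStream = Readable.fromWeb(webStream);
// fromWeb acquires a reader, so the Web stream is locked from here on.
const lockedAfterBridge = webStream.locked;

let text = '';
for await (const chunk of nodeStream) text += Buffer.from(chunk).toString('utf8');

console.log({ lockedAfterBridge });
console.log(text);
```

Once it is a Node Readable, it slots into pipeline() like any file stream.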

Heap stays around ~45MB across the full 1.2GB transfer. Two concurrent clients streaming at the same time: ~80MB. Still flat. Still boring.

Line-by-line transforms are the easy network case. The hard one is when a “record” isn’t a line — when you need to parse structure and aggregate as you go.

Pattern 3: Parse and aggregate a multi-GB CSV without loading it

Six gigabytes of orders.csv. Group by customer, sum the totals, write a summary JSON of about 50,000 customers.

import { createReadStream } from 'node:fs';
import { writeFile } from 'node:fs/promises';
import { pipeline } from 'node:stream/promises';
import csv from 'csv-parser';

const totals = new Map();

// pipeline() accepts an async function as the final stage and hands it the parsed rows.
await pipeline(createReadStream('orders.csv'), csv(), async (rows) => {
  for await (const row of rows) {
    totals.set(row.customer_id, (totals.get(row.customer_id) ?? 0) + Number(row.order_total));
  }
});

await writeFile('summary.json', JSON.stringify(Object.fromEntries(totals)));

That’s it. Async iteration over the parsed stream, accumulate into a Map, write the summary. Backpressure still works — for await waits for the loop body to finish before pulling the next row, and the parser waits for the file stream to feed it. The whole chain stays balanced.

Why a for await loop instead of a Transform stream here? Aggregation is naturally imperative. Constructing a stateful Transform that holds a Map and emits the summary on flush is technically equivalent and harder to read. The for-loop is what you want.
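For comparison, this is roughly what the stateful Transform version looks like. It is a sketch under assumptions: the three rows are invented and stand in for the CSV parser's output, and the summary is collected into a string instead of written to disk.

```javascript
import { pipeline } from 'node:stream/promises';
import { Readable, Transform } from 'node:stream';

// Hypothetical pre-parsed rows standing in for the CSV parser's output.
const rows = Readable.from([
  { customer_id: 'c1', order_total: '10.50' },
  { customer_id: 'c2', order_total: '4.00' },
  { customer_id: 'c1', order_total: '2.25' }
]);

const totals = new Map();
const aggregate = new Transform({
  objectMode: true,
  transform(row, _enc, cb) {
    totals.set(row.customer_id, (totals.get(row.customer_id) ?? 0) + Number(row.order_total));
    cb(); // emit nothing per row; the state lives in the Map
  },
  flush(cb) {
    // Runs once after the last row: emit the whole summary as a single chunk.
    cb(null, JSON.stringify(Object.fromEntries(totals)));
  }
});

let summary = '';
await pipeline(rows, aggregate, async (source) => {
  for await (const chunk of source) summary += chunk;
});

console.log(summary);
```

It works, but the aggregation logic is now split across transform and flush, and the result only exists after a second consumer collects it.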

Be honest about the memory math: the file isn’t in memory, but the aggregate is. 50,000 customers at ~50 bytes per entry is ~2.5MB. That’s fine. If your aggregate has 50 million unique keys, streams don’t save you — you’re back to needing SQLite, DuckDB, or a real database to spill state to disk. Streams handle unbounded input. They don’t fix unbounded state.

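Heap on a 6GB input with the 50K-key aggregate: ~40MB. Compare to the buffered alternative. Reading the whole CSV with await fs.readFile(...) and parsing it in one go never gets off the ground: readFile refuses anything over 2 GiB, and even if it didn’t, the raw bytes plus the parsed rows would need roughly 12GB.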

Three patterns, three flat memory graphs. So when is the answer not “use a stream”?

When not to use streams (and the .pipe() trap)

Streams have a tax. They add a layer of indirection, the error handling shape is different, and the chunk-boundary thinking is real cognitive load. Pay the tax when it buys you something. Don’t pay it for files that fit in memory.

My rule: under ~10MB, await fs.readFile() is faster to write, faster to read, faster to debug, and the OOM risk is zero. Streams are not a virtue. They’re a tool for a specific shape of problem.

Don’t use streams when you need random access either. Streams are sequential — once a chunk passes, it’s gone. If you need to seek (read bytes 1024-2048, then bytes 50-100, then bytes 9000-9500), use a file descriptor with fs.read(fd, buffer, offset, length, position). Streams aren’t built for that motion.
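A small sketch of that seeking motion. It uses the promise-based FileHandle.read from node:fs/promises rather than the callback fs.read named above; the argument order after the descriptor is the same. The scratch file and its contents are made up for the demo.

```javascript
import { open, mkdtemp, writeFile, rm } from 'node:fs/promises';
import { tmpdir } from 'node:os';
import { join } from 'node:path';

// Hypothetical scratch file with 20 known bytes.
const dir = await mkdtemp(join(tmpdir(), 'seek-'));
const file = join(dir, 'sample.txt');
await writeFile(file, '0123456789abcdefghij');

const handle = await open(file, 'r');
const buf = Buffer.alloc(4);

// read(buffer, offset, length, position): position is the byte offset in the file.
await handle.read(buf, 0, 4, 10);
const later = buf.toString('latin1'); // bytes 10-13

// Jump backwards: something a stream cannot do once the chunk has passed.
await handle.read(buf, 0, 4, 2);
const earlier = buf.toString('latin1'); // bytes 2-5

await handle.close();
await rm(dir, { recursive: true, force: true });
console.log({ later, earlier });
```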

And don’t use .pipe() in production. Ever. Three reasons: errors on the source don’t propagate to the destination, the destination isn’t destroyed when the source fails, and a failed write leaves you with a silently partial output file. That last one belongs to a broader class of failures that try/catch never sees, which is worth understanding in its own right. Always await pipeline(src, ...transforms, dst) from node:stream/promises. The legacy .pipe() shape exists because the ecosystem predates promises. It’s not a recommendation, it’s history.

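One bonus: readable.map(), .filter(), and .reduce() let you skip writing a Transform for the one-liner cases, the same lazy-evaluation shape as lazy iterator helpers at the language level. They have been around since the Node 17 line, but they were still flagged experimental in the docs last time I checked, so look at the stability note for the version you run. Nice for lines.filter(line => line.includes('ERROR')) on a readable that already yields lines. Stateful work still wants a real Transform or a for await loop.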
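A quick sketch of the one-liner case, assuming a Node version that ships these helpers. The log lines are invented, and Readable.from stands in for a stream that already yields one line per chunk.

```javascript
import { Readable } from 'node:stream';

// Stand-in for a line-oriented stream: one string per chunk.
const lines = Readable.from(['INFO boot', 'ERROR disk full', 'INFO ok', 'ERROR timeout']);

// filter() returns another Readable; toArray() drains it into memory,
// so only use it when the filtered result is small.
const errors = await lines.filter((line) => line.includes('ERROR')).toArray();

console.log(errors);
```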

The bottom line

You started with a readFile that crashed prod. The fix is one await pipeline(...) call from node:stream/promises, one Transform (or for await) per stage, and trust that the highWaterMark is doing its job. That’s it. That’s the whole shape.

Do this Monday: pick whichever of the three patterns matches your problem, copy the snippet, and log process.memoryUsage().heapUsed once per second during a real run. Watch the line. It goes flat — flat across a 4GB input, flat across a 40GB input, flat across whatever you throw at it. That moment, when streams stop feeling magical and start feeling boring, is exactly what you want. Boring code stays up.
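If you want a starting point for that once-per-second log, here is a small helper. startMemoryLog is a name made up for this sketch, not a Node API. It prints rss next to heapUsed because Buffers live outside the V8 heap.

```javascript
// Log heap and RSS on an interval; returns a function that stops the logging.
function startMemoryLog(intervalMs = 1000, log = console.log) {
  const toMB = (bytes) => Math.round(bytes / 1024 / 1024);
  const timer = setInterval(() => {
    const { heapUsed, rss } = process.memoryUsage();
    log(`heapUsed=${toMB(heapUsed)}MB rss=${toMB(rss)}MB`);
  }, intervalMs);
  timer.unref(); // the timer alone never keeps the process alive
  return () => clearInterval(timer);
}

// Usage: const stop = startMemoryLog(); await pipeline(...); stop();
```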

Streams have a reputation for being the confusing corner of Node. They aren’t. They’re three lines once you know the shape. The confusion comes from tutorials that lead with createReadStream('hello.txt').pipe(process.stdout). You’re past that now.