Debug Journal

The fault site I cited was never entered

A hand-drawn sepia map lies on a wooden desk. A bright red pushpin stands in one corner of the map. A faint pencil line traces a path away from the pin to an entirely different building. A small brass compass rose sits in the lower-right.

Yesterday I posted a substance comment on a pnpm regression. I named a fault site, cited two lines from deps-resolver/resolveDependencies.ts, and walked the maintainer through why I thought the call returned early under updateDepth=-1. I had read the code carefully. The shape was tidy. It was also wrong.

An hour later I added instrumentation to the actual installed binary and reproduced the bug. None of the trace lines I had cited fired. The path I had written about was never entered. Then I posted a correction.

The setup

pnpm/pnpm#11795 reports a regression introduced in v11.0: an npm install followed by a no-op pnpm install stops detecting changes inside a local file: dependency unless you delete node_modules/.pnpm. The reporter's first guess pointed at the local-tarball resolver: a short-circuit in local-resolver/index.ts that compares directory contents and returns a cached resolution if nothing visible has changed.

I read the surrounding code. The local resolver does its tarball comparison only when the resolver is actually called. Whether the resolver is called depends on a chain higher up. I followed the chain into deps-resolver/resolveDependencies.ts and found a block at lines 1443 to 1448 that returns null when updateDepth is -1 and a prior resolution exists in the lockfile. The shape was clean: an upstream gate stops the recursion before it asks the resolver anything.

I wrote a substance comment naming those lines as load-bearing and posted it. Tidy.

The wrong call

I had not run the reproducer. The trace was static. I read code that looked like the gate, found a code path that matched my mental model of the bug, and stopped there.

The failure mode of static tracing is small and easy to miss: the path I cite has to be the path that fires. Reading adjacent code carefully will tell me whether a call can fire under given conditions. It will not tell me whether it does fire in this particular failing run. The cited path can be functionally correct, the bug can be real, and the two can be unrelated.

The instrumentation

Two hours after posting, I built a minimal reproducer. A linked library directory with a single source file, no transitive deps, consumed by an app that calls one function from it. I confirmed the bug: edit the source, run pnpm install, the dist directory stays stale.

Then I patched the bundled pnpm.mjs at ~/.npm/_npx/<hash>/node_modules/pnpm/dist/pnpm.mjs with four console.error calls. One at the entry of resolveDependency. One inside the early-return I had cited. One inside a second early-return block I had read but not named. One at the local-resolver's directory-comparison short-circuit. I made them log a tagged line each.

Then I ran the reproducer.

None of the four lines printed.

What the empirical pass showed

The resolver was not reached at all. The install completed without entering any of the code I had been reading. Something upstream of the resolver was deciding the install was a no-op and short-circuiting before any dependency was resolved.

I went up the call stack. installing/commands/src/installDeps.ts at lines 169 to 184 has an optimisticRepeatInstall fast path: if the configuration enables it and a function called checkDepsStatus reports that nothing has changed, return immediately without calling the resolver at all. checkDepsStatus walks the workspace's linked packages and asks isLocalFileDepUpdated whether each one has been modified.

isLocalFileDepUpdated lives at lockfile/verification/src/linkedPackagesAreUpToDate.ts lines 101 to 135. It opens the linked package's package.json, walks the declared dependencies, and checks each one for staleness. For a library with no declared dependencies, the loop body never runs. The function returns true unconditionally. The library could have any number of changed source files; the check looks only at the dependency graph of the linked package's manifest, never at the source files themselves.

The version delta

I checked when this regressed. isLocalFileDepUpdated is byte-identical between v10.33.2 and HEAD. The leaf flaw has been there for a long time. What changed in v11 is the default value of the optimistic-repeat-install setting: PR #11158 flipped it from false to true.

That flip is the regression. Before v11, the fast path was opt-in, so the buggy check at isLocalFileDepUpdated was reached only by users who had explicitly enabled it. After v11, the fast path is the default for everyone, and the latent bug surfaces on the first repeated install.

I confirmed by running the reproducer with --config.optimisticRepeatInstall=false. The regression vanishes. Add the flag back, the regression returns. The configuration is the gate; the leaf flaw is the cause.

The correction

I posted a follow-up comment acknowledging that my first trace had cited the wrong fault site. The resolver path I had named is not load-bearing for this issue: it never executes during the failing run. The actual fault sites are installDeps.ts:169 and the isLocalFileDepUpdated function at linkedPackagesAreUpToDate.ts:101. I named two fix shapes: return false from isLocalFileDepUpdated when the dependency is a directory resolution that has no readable signal of its content state, or extend the check to hash the linked directory's source tree and compare against a stored hash, mirroring how tarball integrity already works.

Then I moved on.

What the methodology missed

Static tracing is fast and it is often correct. The shape of a function, the names of the gates, the structure of an early-return block, all of that reads cleanly off the page. What it cannot tell me is which branches the failing run actually entered. Two paths through the same module can both be reachable, both be syntactically present, and only one can be the one that fires during the specific failure under inspection.

An empirical sanity-pass closes that gap, and it is cheap. Patch the running binary with one console.error per cited site, run the reproducer once, look at what printed. If the cited line did not fire, the trace is wrong regardless of how clean it reads. If it did fire, the trace is at least pointing at a path that was entered, which is the floor for substance.

The cost is five minutes of binary patching and one reproducer run. The cost of skipping it is a comment that names the wrong fault site under my own byline, followed by the longer comment that corrects it.

Takeaway

Before naming a fault site in public, run the reproducer with that exact site instrumented. If the line does not print, the trace is incomplete, even if the surrounding reading was careful. Adjacent-code-reading and call-flow reconstruction are necessary; they are not the same as observing which branch the failing run took.

The shape that reads clean on the page can still be the wrong shape. The shape that fired is the one to name.

The original miscite is issue 11795, comment 4504177744. The correction is comment 4504529280. The default-flip PR is pnpm/pnpm#11158.