Field Note
When %25%3F is not double-encoded
An Astro upgrade from 6.3.1 to 6.3.5 broke every dynamic route that received an encodeURIComponent-produced filename. The handler at pages/uploads/[name].ts stopped seeing the request. The user got a 400 instead.
The breaking change was PR #16556, a security fix that landed in 6.3.2. It added a multi-level encoding check to validateAndDecodePathname. The check rejected anything that decoded into a different string and still contained %XX sequences. That second condition is the trap. It catches a real attack and a perfectly normal request with the same net.
Two shapes that look the same after decoding
Consider two URL paths.
The attack. /api/%2561dmin/users. The bytes %25 are a percent-encoded percent sign. 61 is a hex pair that, when prefixed by a single %, encodes the lowercase letter a. So this path decodes to /api/%61dmin/users, which still contains %XX, and a subsequent decoder turns it into /api/admin/users. If your auth middleware ran against the literal /api/%2561dmin path and didn't recognize it as admin, you have an authorization bypass. This is the documented case the patch was written to catch.
The false positive. /uploads/%25%3F.pdf. The bytes %25 are again a percent-encoded percent sign. %3F is an encoded question mark, a reserved character. This is what encodeURIComponent('%?.pdf') emits. It decodes to /uploads/%%3F.pdf, which still contains %XX, and a subsequent decoder would turn it into /uploads/%?.pdf. There is no second layer here. There is a literal percent sign next to other encoded content.
The two requests are not the same. One is an attempt to hide a payload behind two layers of encoding. The other is a normal filename that happens to contain a percent sign and a question mark, both of which need to be encoded once to travel in a URL. After a single decode pass, though, the two are byte-for-byte similar: both produce strings that contain %XX. The check that fired on the second condition couldn't tell them apart.
Why decode-and-compare fails to discriminate
The original logic was, in effect:
const decoded = decodeURI(pathname);
if (decoded !== pathname && /%[0-9a-fA-F]{2}/.test(decoded)) {
throw new MultiLevelEncodingError();
}
Two things are checked. The first, that decoding produced a different string, just means the path contains at least one %XX sequence. The second, that %XX survives into the post-decode string, is the multi-level-encoding signal. Or it is supposed to be.
The post-decode string is downstream of where the discriminator lives. By the time you reach it, %2561 and %25%3F have both produced strings of the form %(something), and there is no surviving record of whether that % sat next to two hex digits in the input or next to another %. The discriminator was in the input. The check looked at the output.
The signature is in the source
Look at the input directly. A multi-level encoded sequence has a tight signature: a literal %25 followed by two hex digits. That is the shape that, after one decode, becomes %XX that a second decoder can resolve to a byte. Anything else after the %25 is a literal percent. A %, an X, a colon, end-of-string. Any byte that is not [0-9a-fA-F].
if (/%25[0-9a-fA-F]{2}/.test(pathname)) {
throw new MultiLevelEncodingError();
}
One regex. No decode pass. The check fires on the attack shape and stays silent on the false positive. %2561 matches. %25%3F doesn't, because % is not a hex digit. %25dmin doesn't, because d reads as a single hex digit but m doesn't, so the two-digit window doesn't close. The signature is the thing.
The two-step check the regression was sitting in is gone. So is the dependency on what decodeURI decided about the input. The discriminator looks at the bytes that were sent.
The general shape
Detection-by-decoding works only when the encoded form maps to a unique decoded form. URL percent-encoding is not in that family. %25%3F and %2561 are different encodings of different intents that, after one decode pass, occupy adjacent regions of the same shape. If you ask the post-decode string what its source was, it cannot say.
The fix pattern is general. When you are trying to recognize a class of input (multi-level encoded, polyglot, normalization-variant, escape-bypass), the check belongs upstream of the operation that erases the discriminator. Either before any decode, or against the original bytes preserved alongside the canonicalized form. If you find yourself decoding to detect what should have been detectable before the decode, the input signature was the right place.
The other lesson is about the second condition in a compound check. The Astro check was different-after-decode AND contains-%XX-after-decode. Both conditions are true for the attack and for the false positive. A condition that fires on both classes is not a discriminator; it is a filter that lets through everything below it. Two filters that catch the same set are one filter. The fix replaces both with a single condition that fires on the attack and not on the false positive.
Takeaway
When detecting multi-level encoding, look at the pre-decode signature. %25 followed by two hex digits is the only shape that produces another decodable %XX after one pass. Everything else with a %25 in it is a literal percent next to other encoded content, and those are routine in user input.
Compound checks where every condition fires on every class you care about are not discriminating. They are accepting. When a security check produces false positives, look first at whether the conditions actually distinguish the classes the check is trying to separate.