I have a question about (logical) URI path canonicalization, specifically the near-final step of joining the segments back together.
Section 3.5.2 item 8 of the Servlet 6.1 specification says, in part:
"If a segment contains the "/" or "%" characters, and the container is configured to not reject the request for containing an encoded "/", then the container should re-encode those characters to the %nn form. If any characters are re-encoded, then the "%" must also be re-encoded."
I had trouble reading this, particularly the very dense "configured to not reject the request for containing an encoded '/'" part. I wanted to make sure I understood it properly.
Does this excerpt actually mean the following, for a given (decoded) segment S at this step of the logical canonicalization process described by the specification?
- If S contains no occurrences of either "/" or "%" characters, the container MUST take no further action.
- (S contains occurrences of "/" and/or "%" characters.) If the container permits any (decoded) segment to contain "/" characters, whether S does or not, the container MUST take no further action.
- (The container forbids "/" characters in any (decoded) segment.) For all occurrences in S of a character, c, that is either "/" or "%":
  - If c is "/", the request will be ultimately rejected. Nevertheless, processing continues.
  - The container SHOULD encode c to its %nn form and replace c in S with its %nn form (%2F or %25), resulting in S'.
  - If the container performs the prior re-encoding step, then the replacement of c in S' will start with "%". The container MUST encode this "%" to its %nn form (%25) and replace this "%" in S' with its %nn form.
Section 3.5.3 (a table of valid and rejected decoded paths) indicates that (in the 12th row) "/foo/b%25r" will (always?) be decoded to "/foo/b%r".
However, walking through it, if I have understood it properly:
- S is "b%r" (the decoded form of the request's segment "b%25r").
- Step 1 does not apply (S contains a "%" character).
- Step 2 might apply, or might not. If and only if it does, the result will be "b%r", the example's decoded path.
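- If Step 2 does not apply, the SHOULD re-encoding under Step 3 might or might not be performed, depending on what the container wants to do.
- If it is performed, "%" first becomes "%25", the MUST re-encoding of that new "%" follows, and the result will be "b%2525r".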
The example seems to imply that the result will always be "b%r". But depending on the container's configuration, it could be "b%2525r", right?
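To make my reading concrete, here is a rough Java sketch of the three steps as I have interpreted them. It models my interpretation only, not what the specification or any container actually does. The class name, the method name, and both boolean parameters are hypothetical names I made up for illustration.

```java
final class SegmentReencodingSketch {

    /**
     * Hypothetical helper: applies my reading of the re-encoding step to one
     * decoded segment. Neither this method nor its flags exist in any container.
     *
     * @param segment             the decoded segment S
     * @param permitsDecodedSlash assumed container setting: "/" is allowed in a decoded segment
     * @param followsShould       assumed container choice: it performs the SHOULD re-encoding
     */
    static String reencode(String segment, boolean permitsDecodedSlash, boolean followsShould) {
        // Step 1: S contains neither "/" nor "%", so there is nothing to do.
        if (segment.indexOf('/') < 0 && segment.indexOf('%') < 0) {
            return segment;
        }
        // Step 2: the container permits "/" in decoded segments, so no further action.
        if (permitsDecodedSlash) {
            return segment;
        }
        // Step 3, but the container opts out of the SHOULD: S is left as-is.
        if (!followsShould) {
            return segment;
        }
        StringBuilder result = new StringBuilder();
        for (int i = 0; i < segment.length(); i++) {
            char c = segment.charAt(i);
            if (c == '/' || c == '%') {
                // SHOULD: replace c with its %nn form.
                String encoded = (c == '/') ? "%2F" : "%25";
                // MUST: the replacement's leading "%" is itself re-encoded to "%25".
                result.append("%25").append(encoded, 1, encoded.length());
            } else {
                result.append(c);
            }
        }
        return result.toString();
    }

    public static void main(String[] args) {
        System.out.println(reencode("b%r", true, true));   // prints b%r
        System.out.println(reencode("b%r", false, true));  // prints b%2525r
    }
}
```

Under this reading, the same decoded segment yields either "b%r" or "b%2525r" depending solely on the two assumed configuration flags, which is the discrepancy I am asking about.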
Thanks,
Laird