
[servlet-dev] Spec section 3.5.2 item 8

I have a question about (logical) URI path canonicalization, particularly about the near-final step of joining the segments back together.

Section 3.5.2 item 8 of the Servlet 6.1 specification says, in part:

"If a segment contains the "/" or "%" characters, and the container is configured to not reject the request for containing an encoded "/", then the container should re-encode those characters to the %nn form. If any characters are re-encoded, then the "%" must also be re-encoded."
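One literal reading of the quoted sentence, applied to a single decoded segment, can be sketched in Python as below. It assumes the container's configuration reduces to one hypothetical boolean, `rejects_encoded_slash` (not a real Servlet setting), and it takes "the "%" must also be re-encoded" to mean that every "%" already in the segment is encoded exactly once, before any "/" is, so the "%" introduced by "%2F" is not encoded a second time. It is an illustration of that reading, not a reference implementation.

```python
def join_segment(segment: str, rejects_encoded_slash: bool) -> str:
    """Re-encode one decoded path segment before the segments are joined.

    Hypothetical sketch: `rejects_encoded_slash` stands in for however the
    container is configured; it is not a real Servlet API setting.
    """
    if rejects_encoded_slash:
        # Container rejects encoded "/": the quoted rule asks for no re-encoding.
        return segment
    if "/" not in segment and "%" not in segment:
        # Nothing that could be confused with a delimiter or an escape.
        return segment
    # Encode "%" first so the "%" introduced by "%2F" is not encoded again.
    return segment.replace("%", "%25").replace("/", "%2F")


segments = ["foo", "a/b", "b%r"]
print("/" + "/".join(join_segment(s, rejects_encoded_slash=False) for s in segments))
# /foo/a%2Fb/b%25r
```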

I had trouble parsing this, particularly the very dense "configured to not reject the request for containing an encoded '/'" clause, and I want to make sure I understand it properly.

Does this excerpt actually mean the following, for a given (decoded) segment S at this step of the logical canonicalization process described by the specification?
  1. If S contains no occurrences of "/" or "%", the container MUST take no further action.
  2. (S contains occurrences of "/" and/or "%".) If the container permits any (decoded) segment to contain "/" characters, whether or not S does, the container MUST take no further action.
  3. (The container forbids "/" characters in any (decoded) segment.) For each occurrence in S of a character c that is either "/" or "%":
    1. If c is "/", the request will ultimately be rejected. Nevertheless, processing continues.
    2. The container SHOULD replace c in S with its %nn form (%2F or %25), resulting in S'.
    3. If the container performs the prior re-encoding step, then the replacement of c in S' will start with "%". The container MUST replace this "%" in S' with its %nn form (%25).
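A minimal Python sketch of the reading above follows, so each step can be checked mechanically. The flags `permits_slash` (the container allows "/" in decoded segments) and `performs_reencode` (the container chooses to follow the SHOULD step) are hypothetical names for configuration; the function returns the resulting segment together with whether the "/" check flagged the request for rejection.

```python
def canonicalize_segment(s: str, permits_slash: bool, performs_reencode: bool) -> tuple[str, bool]:
    """Apply steps 1 to 3.3 of the reading above to one decoded segment S.

    Returns (resulting segment, whether the request will be rejected).
    `permits_slash` and `performs_reencode` are hypothetical configuration flags.
    """
    # Step 1: no "/" or "%" means nothing to do.
    if "/" not in s and "%" not in s:
        return s, False
    # Step 2: the container permits "/" in decoded segments.
    if permits_slash:
        return s, False
    # Step 3.1: a "/" dooms the request, but processing continues.
    rejected = "/" in s
    # Step 3.2 is only a SHOULD, so a container may skip re-encoding.
    if not performs_reencode:
        return s, rejected
    out = []
    for c in s:
        if c in "/%":
            encoded = "%{:02X}".format(ord(c))  # Step 3.2: "%2F" or "%25"
            encoded = "%25" + encoded[1:]  # Step 3.3: re-encode its leading "%"
            out.append(encoded)
        else:
            out.append(c)
    return "".join(out), rejected


print(canonicalize_segment("b%r", permits_slash=False, performs_reencode=True))
# ('b%2525r', False)
```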
Section 3.5.3 (a table of valid and rejected decoded paths) indicates (in its 12th row) that "/foo/b%25r" will (always?) be decoded to "/foo/b%r".
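As a quick check of that table row, percent-decoding each segment of "/foo/b%25r" with Python's standard `urllib.parse.unquote` (used here only as a stand-in for the container's decoding step) and rejoining the segments with "/" gives the decoded path shown in the table:

```python
from urllib.parse import unquote

encoded_path = "/foo/b%25r"
# Split on "/" before decoding so that a decoded "/" could not act as a delimiter.
decoded_segments = [unquote(segment) for segment in encoded_path.split("/")]
decoded_path = "/".join(decoded_segments)
print(decoded_path)  # /foo/b%r
```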

However, walking through it, if I have understood it properly:
  • The encoded segment is "b%25r", so the decoded segment S is "b%r".
  • Step 1 does not apply (S contains a "%" character).
  • Step 2 might or might not apply. If and only if it does, the result will be "b%r", the example's decoded path.
    • If Step 2 does not apply, Step 3.2 might or might not apply, since it is only a SHOULD and depends on what the container chooses to do.
      • If Step 3.2 does apply, then Step 3.3 applies as well, and the result will be "b%2525r".
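The branches of this walkthrough can be enumerated mechanically. The sketch below assumes the decoded segment is "b%r" and models only the two hypothetical configuration choices involved (whether "/" is permitted in decoded segments, and whether the SHOULD re-encoding is performed); the segment contains no "/", so the rejection check never triggers.

```python
from itertools import product


def joined_segment(decoded: str, permits_slash: bool, performs_reencode: bool) -> str:
    """Outcome for one segment under the reading above (hypothetical flags)."""
    if ("%" not in decoded and "/" not in decoded) or permits_slash or not performs_reencode:
        return decoded  # Step 1, Step 2, or the SHOULD re-encoding skipped
    # "%" -> "%25" (SHOULD), then that leading "%" -> "%25" again (MUST): "%2525".
    # Replace "%" before "/" so the "%" inside "%252F" is not rewritten.
    return decoded.replace("%", "%2525").replace("/", "%252F")


outcomes = {
    (permits, reencode): joined_segment("b%r", permits, reencode)
    for permits, reencode in product((True, False), repeat=2)
}
for config, result in outcomes.items():
    print(config, result)
print(sorted(set(outcomes.values())))  # ['b%2525r', 'b%r']
```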
The example seems to imply that the result will always be "b%r". But couldn't it also be "b%2525r", depending on the container's configuration?

Thanks,
Laird
