[servlet-dev] Spec section 3.5.2 item 8


From: Laird Nelson <ljnelson@xxxxxxxxx>
Date: Fri, 25 Oct 2024 13:00:03 -0700

I have a question about (logical) URI path canonicalization, specifically the near-final step of joining segments back together again.

Section 3.5.2 item 8 of the Servlet 6.1 specification says, in part:

"If a segment contains the "/" or "%" characters, and the container is configured to not reject the request for containing an encoded "/", then the container should re-encode those characters to the %nn form. If any characters are re-encoded, then the "%" must also be re-encoded."

I had trouble reading this, particularly the very dense "configured to not reject the request for containing an encoded '/'" part. I wanted to make sure I understood it properly.

Does this excerpt actually mean, for a given (decoded) segment, S, at this step in the logical canonicalization process as described by the specification:

1. If S contains no occurrences of either "/" or "%" characters, the container MUST take no further action.
2. (S contains occurrences of "/" and/or "%" characters.) If the container permits any (decoded) segment to contain "/" characters, whether S does or not, the container MUST take no further action.
3. (The container forbids "/" characters in any (decoded) segment.) For all occurrences in S of a character, c, that is either "/" or "%":

   If c is "/", the request will ultimately be rejected. Nevertheless, processing continues.

   3.1. The container SHOULD encode c to its %nn form and replace c in S with its %nn form (%2F or %25), resulting in S'.
   3.2. If the container performs the prior re-encoding step, then the replacement of c in S' will start with "%". The container MUST encode this "%" to its %nn form (%25) and replace this "%" in S' with its %nn form.
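
To make the reading above concrete, here is a minimal Java sketch of it. Everything in it is hypothetical: the class name, the method name, and the two boolean flags (one for whether the container permits "/" in a decoded segment, one for whether it chooses to perform the SHOULD re-encoding) are illustrative only and come from neither the specification nor any container. It encodes my interpretation, which may well be wrong.

```java
// Hypothetical sketch of the interpretation described above; not taken from
// the Servlet specification or from any container implementation.
final class SegmentReencodeSketch {

    static String reencode(String s, boolean permitsSlash, boolean performsShould) {
        // First case: no "/" and no "%" in the decoded segment, so nothing to do.
        if (s.indexOf('/') < 0 && s.indexOf('%') < 0) {
            return s;
        }
        // Second case: the container permits "/" in decoded segments, so nothing to do.
        if (permitsSlash) {
            return s;
        }
        // Third case: the container forbids "/". The re-encoding is only a SHOULD.
        if (!performsShould) {
            return s;
        }
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c == '/' || c == '%') {
                // SHOULD step: replace c with its %nn form.
                // (A "/" here means the request is rejected later anyway.)
                String nn = (c == '/') ? "%2F" : "%25";
                // MUST step: the "%" that starts the %nn form is itself encoded.
                out.append("%25").append(nn.substring(1));
            } else {
                out.append(c);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(reencode("b%r", true, true));   // b%r
        System.out.println(reencode("b%r", false, true));  // b%2525r
    }
}
```

Under this sketch the same decoded segment yields different results depending only on the two flags.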

Section 3.5.3 (a table of valid and rejected decoded paths) indicates that (in the 12th row) "/foo/b%25r" will (always?) be decoded to "/foo/b%r".

However, walking through it, if I have understood it properly:

S is "b%r" (the raw segment "b%25r", decoded).
Step 1 does not apply (S contains a "%" character).
Step 2 might apply, or might not. If and only if it does, the result will be "b%r", the example's decoded path.

If Step 2 does not apply, Step 3.1 might or might not apply, depending on what the container wants to do.

If Step 3.1 did apply, then the result will be "b%2525r".

The example seems to imply that the result will always be "b%r". But depending on the container's configuration, it could be "b%2525r", right?
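
The two outcomes can be traced with plain string replacement. This is only a sketch of the walk-through for this one segment, which contains "%" but no "/"; the class and method names are hypothetical, and applying the two replacements back to back reflects my reading of the excerpt rather than anything the specification states.

```java
// Hypothetical trace of the "%" case only; names are illustrative.
final class PercentTraceSketch {

    // SHOULD step as I read it: each "%" becomes its %nn form, "%25".
    static String shouldStep(String decodedSegment) {
        return decodedSegment.replace("%", "%25");
    }

    // MUST step as I read it: the "%" introduced above is itself encoded to "%25".
    static String mustStep(String afterShould) {
        return afterShould.replace("%", "%25");
    }

    public static void main(String[] args) {
        String decoded = "b%r";                       // decoded form of "b%25r"
        String afterShould = shouldStep(decoded);     // b%25r
        String afterMust = mustStep(afterShould);     // b%2525r
        System.out.println(decoded + " -> " + afterShould + " -> " + afterMust);
    }
}
```

If neither step runs, the segment stays "b%r", which is the value shown in the table.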

Thanks,

Laird

Follow-Ups:
- Re: [servlet-dev] Spec section 3.5.2 item 8
  - From: Greg Wilkins
