Core, REST: Fix path segment encoding to use RFC 3986 percent-encoding #15989

Open
adutra wants to merge 1 commit into apache:main from adutra:fix-path-segment-encoding

adutra commented Apr 15, 2026

This PR introduces new methods in `RESTUtil`:

- `encodePathSegment`/`decodePathSegment`
- `encodeNamespaceAsPathSegment`/`decodeNamespaceAsPathSegment`

They use RFC 3986 percent-encoding (spaces as `%20`) instead of `application/x-www-form-urlencoded` (spaces as `+`).
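
To make the difference concrete, here is a minimal sketch of the two styles. It is not the actual `RESTUtil` code: the class `PathSegmentEncodingSketch` and the method `encodePathSegmentSketch` are hypothetical names, and the sketch simply assumes that everything outside the RFC 3986 unreserved set is percent-encoded.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

// Hypothetical illustration only; the real RESTUtil implementation may differ.
class PathSegmentEncodingSketch {

  // RFC 3986 "unreserved" characters, which never need percent-encoding.
  private static final String UNRESERVED =
      "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789-._~";

  // Percent-encodes every UTF-8 byte that is not an unreserved character.
  static String encodePathSegmentSketch(String segment) {
    StringBuilder out = new StringBuilder();
    for (byte b : segment.getBytes(StandardCharsets.UTF_8)) {
      int unsigned = b & 0xFF;
      if (unsigned < 0x80 && UNRESERVED.indexOf((char) unsigned) >= 0) {
        out.append((char) unsigned);
      } else {
        out.append('%').append(String.format("%02X", unsigned));
      }
    }
    return out.toString();
  }

  public static void main(String[] args) {
    String name = "my table+v1";
    // application/x-www-form-urlencoded style: the space becomes '+'
    System.out.println(URLEncoder.encode(name, StandardCharsets.UTF_8)); // my+table%2Bv1
    // RFC 3986 percent-encoding style: the space becomes '%20'
    System.out.println(encodePathSegmentSketch(name)); // my%20table%2Bv1
  }
}
```

In both styles a literal `+` is encoded as `%2B`; only the treatment of the space differs in this example.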

This PR also switches `ResourcePaths` to use the new path-segment methods, thus fixing the encoding issue on the client (encoding) side.

The decoding methods are available for use on the server side; servers should switch to them once they are able and willing to support the new, correct encoding. The REST TCK harness has already been updated.

adutra commented Apr 16, 2026

A few data points:

In the range of 0 to 32767, there are only 3 characters that encodeString and encodePathSegment encode differently:

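| Character | Code | encodeString | encodePathSegment |
|-----------|------|--------------|-------------------|
| (space)   | 32   | `+`          | `%20`             |
| `*`       | 42   | `*`          | `%2A`             |
| `~`       | 126  | `%7E`        | `~`               |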

In particular the + sign is encoded equally:

| Character | Code | encodeString | encodePathSegment |
|-----------|------|--------------|-------------------|
| `+`       | 43   | `%2B`        | `%2B`             |

And only one character, the space (32), fails a cross round trip, that is, encoding with one method and decoding with the other:

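| Character | Code | Round Trip                                | Result |
|-----------|------|-------------------------------------------|--------|
| (space)   | 32   | `decodePathSegment(encodeString(' '))`    | `+`    |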

This is evidence that:

  1. There is no behavioral regression when a new client (encodePathSegment) talks to an old server (decodeString).
  2. There is a behavioral regression when an old client (encodeString) talks to a new server (decodePathSegment): any space will be interpreted by the server as the + sign.

That's why it's safe to upgrade clients right away, but servers should only upgrade when they feel it's safe to do so.
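
The cross round trips above can be reproduced with a small sketch. It does not call `RESTUtil`; the four helpers below are hypothetical stand-ins that assume the old methods behave like `java.net.URLEncoder`/`URLDecoder` and that the new ones differ only in the three characters listed in the first table.

```java
import java.net.URLDecoder;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-ins for the old and new methods; not the actual RESTUtil code.
class CrossRoundTripSketch {

  // Old style encoding: form-urlencoded, spaces become '+'.
  static String formEncode(String s) {
    return URLEncoder.encode(s, StandardCharsets.UTF_8);
  }

  // Old style decoding: '+' is read back as a space.
  static String formDecode(String s) {
    return URLDecoder.decode(s, StandardCharsets.UTF_8);
  }

  // New style encoding: patch the three characters that differ from the form encoding.
  static String pathEncode(String s) {
    return formEncode(s).replace("+", "%20").replace("*", "%2A").replace("%7E", "~");
  }

  // New style decoding: a literal '+' stays a '+', so protect it before decoding.
  static String pathDecode(String s) {
    return formDecode(s.replace("+", "%2B"));
  }

  // Returns the character codes in [0, 32767] that do not survive a cross round trip.
  static List<Integer> crossRoundTripFailures(boolean newClientOldServer) {
    List<Integer> failed = new ArrayList<>();
    for (int code = 0; code <= 32767; code++) {
      String original = String.valueOf((char) code);
      String roundTripped = newClientOldServer
          ? formDecode(pathEncode(original))   // new client, old server
          : pathDecode(formEncode(original));  // old client, new server
      if (!roundTripped.equals(original)) {
        failed.add(code);
      }
    }
    return failed;
  }

  public static void main(String[] args) {
    System.out.println("new client -> old server: " + crossRoundTripFailures(true));  // []
    System.out.println("old client -> new server: " + crossRoundTripFailures(false)); // [32]
  }
}
```

Under these assumptions, only the old-client, new-server direction loses information, and only for the space character, which matches the two conclusions above.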
