Skip to content

Conversation

@rousskov
Copy link
Contributor

In recent commit ff60b10, many AnyP::Uri::path() calls were replaced
with new absolutePath() and originForm() method calls, and those two
methods started to return percent-encoded URI suffixes1. Encoding URI
suffixes broke forwarding of requests that use ? characters (i.e.
requests that have a query part in their target). For example,
http://example.com/c?p=v request target was forwarded as
http://example.com/c%3Fp=v. Other cases were probably broken as well.

This change removes all encoding-related changes that were added in
commit ff60b10, effectively reducing that commit to differentiation of
many AnyP::Uri::path() use cases as originForm() or absolutePath()
cases. More percent-encoding work is needed, as hinted by an old TODO.

This change also removes Ftp::Gateway::decodedRequestUriPath() added in
recent commit ad365ed because the corresponding code was necessary to
decode the now-removed encoding added in problematic commit ff60b10.

This surgical fix does not solve all commit ff60b10 problems.

Footnotes

  1. Here, "URI suffix" is a path [ "?" query ] [ "#" fragment ]
    portion of a URI in RFC 3986 terms. The RFC does not name that portion.
    Many fixed cases involve ? that separates path from query.

In recent commit ff60b10, `AnyP::Uri::absolute()` and related methods
started to percent-encode URI paths and query parts. Encoding query
parts breaks forwarding of requests that use `?` characters (i.e.
requests that have a query part in their target). For example,
`http://example.com/c?p=v` is forwarded as
`http://example.com/c%3Fp=v`.

This change removes all encoding changes added in commit ff60b10. More
work is needed to percent-encode correctly, as hinted by an old TODO.
    Uri.cc:55: `const CharacterSet& PathChars()` defined but not used
... that were removed in the same problematic recent commit ff60b10.

That commit probably removed these unescaping calls to preserve code
logic after adding escaping to `AnyP::Uri::path()`. This branch removes
the latter, so it has to restore those unescaping calls to preserve code
logic that existed before that commit ff60b10.

I have not checked whether the following equality is always true. If it
is not true for some `x` values then commit ff60b10 probably also
changed some `urlpath_regex` ACL matches and FTP transactions.

    rfc1738_unescape(Uri::Encode(x, PathChars())) == x
The method was added in recent commit ad365ed to safely decode URI
paths that started to be encoded in problematic commit ff60b10. This
branch removes that problematic encoding, so we must remove the decoding
step as well to preserve code logic.
The (arguably misplaced) comment only applied to
decodedRequestUriPath() code that this branch removes. The restored
rfc1738_unescape() code does not decode %00 as NUL AFAICT.
rfc1738_unescape(esc_buf);
const auto result = data->match(esc_buf);
xfree(esc_buf);
return result;
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

const and AAA aside, these changes restore pre ff60b10 code. Polishing that code is out of this PR scope, and I will remove the remaining cosmetic improvements if requested.

ftpState->filepath = SBufToCstring(ftpState->decodedRequestUriPath().value());
const auto path = SBufToCstring(ftpState->request->url.absolutePath());
rfc1738_unescape(path);
ftpState->filepath = path;
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Empty lines removal, const/AAA in path declaration, and some comment movements aside, these changes restore pre ff60b10 code. Polishing this code is out of this PR scope, and I will remove the remaining cosmetic improvements if requested.

@rousskov rousskov added the S-waiting-for-more-reviewers needs a reviewer and/or a second opinion label Oct 29, 2025
Copy link
Contributor

@yadij yadij left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is not a good fix.

  1. code that needs to use the un-encoded URI path needs to call AnyP::Uri::path() instead of absolutePath()

  2. as a temporary fix for the encoding one should alter the implementation of absolutePath() to split the path and query sections then encode those sub-strings separately. Leaving the '?' between them.

@rousskov
Copy link
Contributor Author

This is not a good fix.

  1. code that needs to use the un-encoded URI path needs to call AnyP::Uri::path() instead of absolutePath()

If you insist on establishing or maintaining the above undocumented(?) invariant, then we should revert the entire problematic commit ff60b10. That reversal cannot be done cleanly (i.e. some manual work would be required to resolve conflicts), but it can be done safely. Do you want me to post the corresponding PR reversing ff60b10?

  1. as a temporary fix for the encoding one should alter the implementation of absolutePath() to split the path and query sections then encode those sub-strings separately. Leaving the '?' between them.

Such a "temporary fix" is not acceptable because it requires new dangerous parsing changes while leaving unaddressed serious dangers of other commit ff60b10 changes (e.g., that some of the two dozen absolutePath() callers introduced in that commit should be calling path() instead or vice versa).

We need to restore status quo before we can move forward. This PR does that while preserving path() callers split introduced by ff60b10 (because it is safe to do so when all methods return the same thing). Or we can revert that commit entirely, removing that split. What is your preference?

@rousskov rousskov requested a review from yadij October 30, 2025 14:04
@rousskov rousskov added S-waiting-for-reviewer ready for review: Set this when requesting a (re)review using GitHub PR Reviewers box and removed S-waiting-for-more-reviewers needs a reviewer and/or a second opinion labels Oct 30, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

S-waiting-for-reviewer ready for review: Set this when requesting a (re)review using GitHub PR Reviewers box

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants