-
Notifications
You must be signed in to change notification settings - Fork 597
Bug 5523: Do not percent-encode URI suffixes #2289
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Bug 5523: Do not percent-encode URI suffixes #2289
Conversation
In recent commit ff60b10, `AnyP::Uri::absolute()` and related methods started to percent-encode URI paths and query parts. Encoding query parts breaks forwarding of requests that use `?` characters (i.e. requests that have a query part in their target). For example, `http://example.com/c?p=v` is forwarded as `http://example.com/c%3Fp=v`. This change removes all encoding changes added in commit ff60b10. More work is needed to percent-encode correctly, as hinted by an old TODO.
Uri.cc:55: `const CharacterSet& PathChars()` defined but not used
... that were removed in the same problematic recent commit ff60b10. That commit probably removed these unescaping calls to preserve code logic after adding escaping to `AnyP::Uri::path()`. This branch removes the latter, so it has to restore those unescaping calls to preserve code logic that existed before that commit ff60b10. I have not checked whether the following equality is always true. If it is not true for some `x` values then commit ff60b10 probably also changed some `urlpath_regex` ACL matches and FTP transactions. rfc1738_unescape(Uri::Encode(x, PathChars())) == x
The (arguably misplaced) comment only applied to decodedRequestUriPath() code that this branch removes. The restored rfc1738_unescape() code does not decode %00 as NUL AFAICT.
| rfc1738_unescape(esc_buf); | ||
| const auto result = data->match(esc_buf); | ||
| xfree(esc_buf); | ||
| return result; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
const and AAA aside, these changes restore pre ff60b10 code. Polishing that code is out of this PR scope, and I will remove the remaining cosmetic improvements if requested.
| ftpState->filepath = SBufToCstring(ftpState->decodedRequestUriPath().value()); | ||
| const auto path = SBufToCstring(ftpState->request->url.absolutePath()); | ||
| rfc1738_unescape(path); | ||
| ftpState->filepath = path; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Empty lines removal, const/AAA in path declaration, and some comment movements aside, these changes restore pre ff60b10 code. Polishing this code is out of this PR scope, and I will remove the remaining cosmetic improvements if requested.
yadij
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is not a good fix.
-
code that needs to use the un-encoded URI path needs to call
AnyP::Uri::path()instead ofabsolutePath() -
as a temporary fix for the encoding one should alter the implementation of
absolutePath()to split the path and query sections then encode those sub-strings separately. Leaving the '?' between them.
If you insist on establishing or maintaining the above undocumented(?) invariant, then we should revert the entire problematic commit ff60b10. That reversal cannot be done cleanly (i.e. some manual work would be required to resolve conflicts), but it can be done safely. Do you want me to post the corresponding PR reversing ff60b10?
Such a "temporary fix" is not acceptable because it requires new dangerous parsing changes while leaving unaddressed serious dangers of other commit ff60b10 changes (e.g., that some of the two dozen absolutePath() callers introduced in that commit should be calling path() instead or vice versa). We need to restore status quo before we can move forward. This PR does that while preserving path() callers split introduced by ff60b10 (because it is safe to do so when all methods return the same thing). Or we can revert that commit entirely, removing that split. What is your preference? |
In recent commit ff60b10, many
AnyP::Uri::path()calls were replacedwith new
absolutePath()andoriginForm()method calls, and those twomethods started to return percent-encoded URI suffixes1. Encoding URI
suffixes broke forwarding of requests that use
?characters (i.e.requests that have a
querypart in their target). For example,http://example.com/c?p=vrequest target was forwarded ashttp://example.com/c%3Fp=v. Other cases were probably broken as well.This change removes all encoding-related changes that were added in
commit ff60b10, effectively reducing that commit to differentiation of
many AnyP::Uri::path() use cases as originForm() or absolutePath()
cases. More percent-encoding work is needed, as hinted by an old TODO.
This change also removes Ftp::Gateway::decodedRequestUriPath() added in
recent commit ad365ed because the corresponding code was necessary to
decode the now-removed encoding added in problematic commit ff60b10.
This surgical fix does not solve all commit ff60b10 problems.
Footnotes
Here, "URI suffix" is a
path [ "?" query ] [ "#" fragment ]portion of a URI in RFC 3986 terms. The RFC does not name that portion.
Many fixed cases involve
?that separatespathfromquery. ↩