(code is entirely generated by AI.) Preserve non-UTF-8 query values when cleaning fields #520
Open
dongfengweixiao wants to merge 1 commit into
Query parameter values that use a non-UTF-8 encoding (e.g. GBK on 1688.com, Big5, Shift-JIS) were being corrupted whenever ClearURLs rewrote a URL. Searching "usb转串口" on 1688 turned
keywords=usb%D7%AA%B4%AE%BF%DA into
keywords=usb%D7%AA%EF%BF%BD%EF%BF%BD%EF%BF%BD%EF%BF%BD.

Root cause: removeFieldsFormURL round-tripped every field through URLSearchParams, which decodes percent-encoded bytes as UTF-8. Bytes that are invalid UTF-8 (the high bytes of GBK-encoded text) became the U+FFFD replacement character, which then re-encodes as %EF%BF%BD.

Fix: filter the raw query string byte-for-byte via the new removeFieldsFromQuery(), which decodes only the parameter keys for rule matching and never decodes or re-encodes the values. Values that are not themselves removed tracking fields now survive with their exact original byte sequence, so GBK/Big5/Shift-JIS values are no longer damaged.

- Add percentDecodeBytes() and removeFieldsFromQuery() in core_js/tools.js
- Switch removeFieldsFormURL to operate on the raw query string
- Remove the now-unused urlSearchParamsToString()

Fragments, raw rules, redirections, logging, and change detection are unchanged.

Co-Authored-By: Claude <noreply@anthropic.com>
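The root cause and the shape of the fix can be reproduced in a few lines. The sketch below is illustrative only and is not the code from this PR: roundTripQuery and filterRawQuery are hypothetical names, the utm_ pattern stands in for a real ClearURLs rule, and keys are decoded with decodeURIComponent as a simplification rather than with the PR's percentDecodeBytes().

```javascript
// Hypothetical helpers for illustration; not the PR's implementation.

// Old behaviour in miniature: URLSearchParams decodes values as UTF-8
// and re-encodes them, so invalid UTF-8 bytes turn into U+FFFD.
function roundTripQuery(query) {
  return new URLSearchParams(query).toString();
}

// New behaviour in miniature: split the raw query, decode only the key
// for matching, and keep each surviving "key=value" pair verbatim.
function filterRawQuery(query, shouldRemove) {
  return query
    .split("&")
    .filter((pair) => {
      if (pair === "") return false; // drop empty segments such as "a=1&&b=2"
      const eq = pair.indexOf("=");
      const rawKey = eq === -1 ? pair : pair.slice(0, eq);
      let key;
      try {
        // Assumption: keys are UTF-8; "+" means space in form encoding.
        key = decodeURIComponent(rawKey.replace(/\+/g, " "));
      } catch (e) {
        key = rawKey; // malformed or non-UTF-8 key: match on the raw text
      }
      return !shouldRemove(key);
    })
    .join("&");
}

// GBK-encoded value from the report, plus a tracking field to strip.
const gbkQuery = "keywords=usb%D7%AA%B4%AE%BF%DA&utm_source=test";

const corrupted = roundTripQuery(gbkQuery);
const preserved = filterRawQuery(gbkQuery, (key) => /^utm_/.test(key));

console.log(corrupted); // value now contains %EF%BF%BD replacement sequences
console.log(preserved); // keywords=usb%D7%AA%B4%AE%BF%DA
```

Because the value is never decoded in the second helper, its encoding does not matter; only the key has to be readable for rule matching.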
@wxy If you have time, could you take a look at this bug and check whether the AI-generated code actually fixes it?



#398