XALANJ-2617 Fixed serializer for high-surrogate UTF-16 characters#4
XALANJ-2617 Fixed serializer for high-surrogate UTF-16 characters#4peterdemaeyer wants to merge 3 commits intoapache:trunkfrom
Conversation
…urrogate UTF-16 characters
|
Thanks for your pull request. I build serializer.jar with it and it works well with normal text. But I still have problems with text in attributes. XML (content of element): XSL: Output (method xml; encoding utf-8): Output (method html, encoding utf-8): With encoding utf-16 xml and html: |
|
I'll gladly investigate further and fix, but I'm afraid the description of your reproducible scenario is a bit too concise for me to understand it properly. I understand attributes in XML, but I don't understand what the XSL has to do with it. It would help if you could give me more detailed steps to reproduce. |
|
Discard my previous comment. In the meantime I was able to reproduce the bug and I fixed it for attribute values as well. The fix for attribute values was a bit more involved than that for element text, because in general the method |
…o for attribute values
|
Thanks for your response and fix. It now work well with attributes when the output is xml. When I change the OutputKeys.METHOD to 'html' text in attributes is still broken. The character is not a html entity and there is extra code '�' (�). Output of file xalan_emoji.txt (changed from .java to .txt) with your fixed xalan version: |
|
I tried to investigate the problem and found This seems to work, the output of my test (see comment above) is now: The character is not changed to an entity, but for me it is better than an extra and wrong character. |
|
Hi, Can we hope a release with this bug fixed or their is a non-official maven repo available with this issue fixed ? |
|
@peterdemaeyer thanks for this fix, do you know when this can be merged and released ? |
|
My pull request has been ready for several months. I didn't closely examine @nicolas-albert's additional fixes but assuming they are good, this can be merged as far as I'm concerned. I don't think I can merge it myself, so I was just waiting for someone with sufficient privileges to merge it. I'm also quite eager to get this fix in an official release, so if I can/need to do something additional myself, let me know. |
|
@peterdemaeyer I didn't customize the code, I only applied this patch to my project (and seems ok). |
|
Hi @peterdemaeyer with reference to https://issues.apache.org/jira/browse/XALANJ-2617 I have converted http://svn.apache.org/repos/asf/xalan/test/ to https://github.com/plutext/xalan-test I see you ran minitest and smoketest; those work for me as well :-) Do you recall whether you tried the conf and perf tests? With these I get some test failures, even with the published xalan and serializer jars from maven central. The same tests fail for me under Java 8 and 11, on both master and branch xalan-j_2_7_x. FYI, I have converted Xalan to a maven multi module project; minitest and smoketest pass using Java 8 and Java 11 . I have made module-info.java files for it. Once satisfied about the tests, I'll apply selected patches. |
|
I don't think It tried the conf and perf tests. |
|
@nicolas-albert convertigo/convertigo@bd37cff is 7,493 additions. Would you consider re-doing your patch without the noise? thanks! |
|
@plutext I just apply the PR of this conversation, there is your patch : https://github.com/apache/xalan-j/pull/4/commits |
|
Guys, you're losing me here. I only touched 1 file. I don't understand where all the rest of the 7493 additions come from, but I can't imagine that this patch is hard to merge. From the discussion above, it's not entirely clear to me what is happening, but I hope this fix can be merged soon. |
|
I think the confusion comes from your reference to "@nicolas-albert's additional fixes"; I didn't notice he later clarified that there are no additional fixes :-) so sorry for the noise. Regarding likelihood of merging, please see comments at https://issues.apache.org/jira/browse/XALANJ-2419 |
…omment) fix) and for text
Fixed serializer such that it correctly deals with high-surrogate UTF-16 characters. This pull request replaces an earlier one from Daniel Kec, see comments on https://issues.apache.org/jira/browse/XALANJ-2617.