@@ -64,11 +64,17 @@ The :mod:`!xml.parsers.expat` module contains two functions:
6464.. function:: ParserCreate(encoding=None, namespace_separator=None)
6565
6666 Creates and returns a new :class:`xmlparser` object. *encoding*, if specified,
67- must be a string naming the encoding used by the XML data. Expat doesn't
68- support as many encodings as Python does, and its repertoire of encodings can't
69- be extended; it supports UTF-8, UTF-16, ISO-8859-1 (Latin1), and ASCII. If
70- *encoding* [1]_ is given it will override the implicit or explicit encoding of the
71- document.
67+ must be a string naming the encoding used by the XML data.
68+ Expat natively understands and processes UTF-8, UTF-16, UTF-16BE, UTF-16LE,
69+ ISO-8859-1, and US-ASCII.
70+ For other encodings (including aliases like Latin1 and ASCII) it falls back
71+ to Python.
72+ It supports most 8-bit encodings and many multi-byte encodings like
73+ Shift_JIS, although only BMP characters (``U+0000-U+FFFF``) are supported
74+ with non-native encodings (this restriction also applies to aliases
75+ like UTF8).
76+ If *encoding* [1]_ is given, it will override the implicit or explicit
77+ encoding of the document and the restrictions mentioned above will not apply.
7278
7379 .. _xmlparser-non-root:
7480
@@ -113,6 +119,8 @@ The :mod:`!xml.parsers.expat` module contains two functions:
113119 XML document. Call ``ParserCreate `` for each document to provide unique
114120 parser instances.
115121
122+ .. versionchanged:: next
123+ Added support for multi-byte encodings.
116124
117125.. seealso::
118126
@@ -1083,9 +1091,11 @@ The ``errors`` module has the following attributes:
10831091
10841092.. rubric:: Footnotes
10851093
1086- .. [1] The encoding string included in XML output should conform to the
1087- appropriate standards. For example, "UTF-8" is valid, but "UTF8" is
1088- not. See https://www.w3.org/TR/2006/REC-xml11-20060816/#NT-EncodingDecl
1094+ .. [1] The encoding string included in XML output should conform to
1095+ the appropriate standards. For example, "UTF-8" is valid, but
1096+ "UTF8" is not valid in an XML document's declaration, even though
1097+ Python accepts it as an encoding name.
1098+ See https://www.w3.org/TR/2006/REC-xml11-20060816/#NT-EncodingDecl
10891099 and https://www.iana.org/assignments/character-sets/character-sets.xhtml.
10901100
10911101
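For readers of this change, the override behaviour described for ``ParserCreate`` can be illustrated with a small sketch. It is not part of the patch. The helper name ``parse_text`` and the sample document are made up for illustration, and the example uses only ISO-8859-1, one of the encodings the text lists as natively handled by Expat, so it does not rely on the new multi-byte support.

```python
from xml.parsers.expat import ParserCreate


def parse_text(data, encoding=None):
    """Return the concatenated character data of an XML document.

    Hypothetical helper for illustration; not part of the patch.
    """
    chunks = []
    parser = ParserCreate(encoding=encoding)
    # Character data may arrive in several pieces, so collect and join them.
    parser.CharacterDataHandler = chunks.append
    parser.Parse(data, True)
    return "".join(chunks)


# The declaration claims UTF-8, but the bytes are really ISO-8859-1.
doc = '<?xml version="1.0" encoding="UTF-8"?><a>caf\u00e9</a>'.encode("iso-8859-1")

# Passing *encoding* overrides the encoding named in the declaration.
text = parse_text(doc, encoding="ISO-8859-1")
print(text)
```

Without the ``encoding`` argument, the same bytes would be read as UTF-8 per the declaration, and the lone ``0xE9`` byte should be rejected as not well-formed.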