Commit 25c8b75

Update docs.
1 parent 445c125 commit 25c8b75

1 file changed

Lines changed: 18 additions & 8 deletions

Doc/library/pyexpat.rst

@@ -64,11 +64,17 @@ The :mod:`!xml.parsers.expat` module contains two functions:
 .. function:: ParserCreate(encoding=None, namespace_separator=None)

 Creates and returns a new :class:`xmlparser` object. *encoding*, if specified,
-must be a string naming the encoding used by the XML data. Expat doesn't
-support as many encodings as Python does, and its repertoire of encodings can't
-be extended; it supports UTF-8, UTF-16, ISO-8859-1 (Latin1), and ASCII. If
-*encoding* [1]_ is given it will override the implicit or explicit encoding of the
-document.
+must be a string naming the encoding used by the XML data.
+Expat natively understands and processes UTF-8, UTF-16, UTF-16BE, UTF-16LE,
+ISO-8859-1, and US-ASCII.
+For other encodings (including aliases like Latin1 and ASCII) it falls back
+to Python.
+It supports most of 8-bit encodings and many multi-byte encodings like
+Shift_JIS, although only BMP characters (``U+0000-U+FFFF``) are supported
+with non-native encodings (this restriction is also applied to aliases
+like UTF8).
+If *encoding* [1]_ is given it will override the implicit or explicit
+encoding of the document and the restrictions mentioned above will not apply.

 .. _xmlparser-non-root:

@@ -113,6 +119,8 @@ The :mod:`!xml.parsers.expat` module contains two functions:
 XML document. Call ``ParserCreate`` for each document to provide unique
 parser instances.

+.. versionchanged:: next
+Added support for multi-byte encodings.

 .. seealso::

@@ -1083,9 +1091,11 @@ The ``errors`` module has the following attributes:

 .. rubric:: Footnotes

-.. [1] The encoding string included in XML output should conform to the
-appropriate standards. For example, "UTF-8" is valid, but "UTF8" is
-not. See https://www.w3.org/TR/2006/REC-xml11-20060816/#NT-EncodingDecl
+.. [1] The encoding string included in XML output should conform to
+the appropriate standards. For example, "UTF-8" is valid, but
+"UTF8" is not valid in an XML document's declaration, even though
+Python accepts it as an encoding name.
+See https://www.w3.org/TR/2006/REC-xml11-20060816/#NT-EncodingDecl
 and https://www.iana.org/assignments/character-sets/character-sets.xhtml.

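
A minimal sketch of the *encoding* override described in the first hunk, assuming a byte string that carries no XML declaration. The helper name `collect_text` and the sample document are illustrative and not part of the commit. The example uses ISO-8859-1, one of the encodings the updated text lists as natively handled by Expat, so it should not depend on the new fallback path:

```python
import xml.parsers.expat


def collect_text(data, encoding=None):
    """Parse *data* (bytes) and return all character data as one string.

    Hypothetical helper for illustration only.
    """
    parser = xml.parsers.expat.ParserCreate(encoding)
    chunks = []
    # Expat may deliver character data in several pieces, so collect them all.
    parser.CharacterDataHandler = chunks.append
    parser.Parse(data, True)  # True marks this as the final chunk of input
    return "".join(chunks)


# Bytes encoded as ISO-8859-1, with no encoding declaration in the document.
latin1_bytes = "<greeting>caf\u00e9</greeting>".encode("iso-8859-1")

# Passing the encoding explicitly tells the parser how to decode the bytes.
text = collect_text(latin1_bytes, "ISO-8859-1")
print(text)
```

Without the explicit *encoding* argument, the same bytes would be read as UTF-8 by default, and the lone ``0xE9`` byte would be expected to make the parser reject the document as not well-formed.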
