Description
While working on my project, I noticed that I had accidentally commented out a line in a JSON file, and yet pyjson5 still loaded that file without error in one code path, letting invalid data into my script.
I did a little digging, and it looks like decode_io() specifically fails to raise a parsing error on an extra closing }, as it should (and as decode_utf8() does).
Here's a minimal example.
Save this as pyjson5_bug.json5:
{
  "ROOT_ELEMENT": {
    //"SOME_ENTRY": { Oops! Commented this out
      "SOME_SUBKEY": {
        "data": 1
      },
    },
    "SOME_OTHER_ENTRY": {
      "SOME_SUBKEY": {
        "data": 1
      },
    }
  }
}
Then run:
python -c "import pyjson5; import json; decoded = pyjson5.decode_io(open('pyjson5_bug.json5', 'rb')); print(json.dumps(decoded, indent=2))"
I'd expect this to produce an error, but instead it prints:
{
  "ROOT_ELEMENT": {
    "SOME_SUBKEY": {
      "data": 1
    }
  },
  "SOME_OTHER_ENTRY": {
    "SOME_SUBKEY": {
      "data": 1
    }
  }
}
which is obviously not the data I wanted!
If I instead use decode_utf8, it works as expected:
$ python -c "import pyjson5; import pathlib; pyjson5.decode_utf8(pathlib.Path('pyjson5_bug.json5').read_bytes())"
Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "src/_exports.pyx", line 120, in pyjson5.pyjson5.decode_utf8
  File "src/_exports.pyx", line 173, in pyjson5.pyjson5.decode_buffer
  File "src/_decoder.pyx", line 902, in pyjson5.pyjson5._decode_buffer
  File "src/_decoder.pyx", line 851, in pyjson5.pyjson5._decode_utf8
  File "src/_decoder.pyx", line 815, in pyjson5.pyjson5._decode_all
pyjson5.pyjson5.Json5ExtraData: ('Extra data U+007D near 186', {'ROOT_ELEMENT': {'SOME_SUBKEY': {'data': 1}}, 'SOME_OTHER_ENTRY': {'SOME_SUBKEY': {'data': 1}}}, '}')
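Since decode_utf8() does reject the stray brace, one possible stopgap is to read the whole file into memory and decode the bytes in a single call instead of handing the open file to decode_io(). The sketch below is only an illustration: the helper name load_json5_strict and its decode_bytes parameter are hypothetical and not part of pyjson5; the only pyjson5 call it relies on is decode_utf8(), used the same way as in the command above.

```python
from pathlib import Path


def load_json5_strict(path, decode_bytes=None):
    """Read the whole file, then decode it in one call.

    decode_bytes is a hypothetical hook (not part of pyjson5) that lets the
    decoder be swapped out; when omitted, pyjson5.decode_utf8 is used.
    """
    if decode_bytes is None:
        # Imported lazily so the helper also works with another decoder.
        import pyjson5
        decode_bytes = pyjson5.decode_utf8
    # The decoder sees the complete file contents, so anything after the
    # top-level value is visible to it and can be reported as an error.
    return decode_bytes(Path(path).read_bytes())
```

With the example file above, load_json5_strict('pyjson5_bug.json5') should surface the same Json5ExtraData error as the decode_utf8() command, at the cost of holding the whole file in memory.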
Without having looked at the code at all, I wonder whether decode_io() stops reading as soon as it has seen the expected number of closing braces, i.e. once the top-level value is complete. Maybe it should instead read until EOF, at least by default, and reject any trailing data; otherwise errors like this one can slip through silently.
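To make that guess concrete, here is a small sketch of the two behaviours using only the standard library's json module. It is not pyjson5's code and says nothing about how decode_io() is actually implemented; the function names decode_first_value and decode_whole_document are made up for the illustration. json.JSONDecoder.raw_decode() returns the first complete value plus the index where it ended, which makes it easy to show a decoder that stops early next to one that checks the rest of the input.

```python
import json


def decode_first_value(text):
    """Decode one value and stop, ignoring whatever follows it."""
    value, _end = json.JSONDecoder().raw_decode(text)
    return value


def decode_whole_document(text):
    """Decode one value, then insist that only whitespace remains."""
    value, end = json.JSONDecoder().raw_decode(text)
    rest = text[end:].strip()
    if rest:
        raise ValueError(f"extra data after document: {rest!r}")
    return value


doc = '{"ROOT_ELEMENT": {"data": 1}}}'  # note the stray closing brace

# The early-stopping decoder silently accepts the input.
print(decode_first_value(doc))

# The whole-document decoder notices the leftover brace.
try:
    decode_whole_document(doc)
except ValueError as exc:
    print("rejected:", exc)
```

The second variant is the behaviour I would expect by default: a document is only valid if nothing but whitespace follows the top-level value.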