Using decode_io() causes extra closing } to not trigger error #131

@multiplemonomials

While working on my project, I accidentally commented out a line in a JSON5 file, and yet pyjson5 still loaded the file without error in one code path, letting invalid data into my script.

I did a little digging, and it looks like using the decode_io() function specifically causes an extra closing } to not trigger a parsing error as it should (and does with decode_utf8()).

Here's a minimal example.

Save this as pyjson5_bug.json5:

{
  "ROOT_ELEMENT": {
    //"SOME_ENTRY": {  Oops! Commented this out
      "SOME_SUBKEY": {
        "data": 1
      },
    },
    "SOME_OTHER_ENTRY": {
      "SOME_SUBKEY": {
        "data": 1
      },
    }
  }
}

Then run:

python -c "import pyjson5; import json; decoded = pyjson5.decode_io(open('pyjson5_bug.json5', 'rb')); print(json.dumps(decoded, indent=2))"

I'd expect this to produce an error, but instead it prints:

{
  "ROOT_ELEMENT": {
    "SOME_SUBKEY": {
      "data": 1
    }
  },
  "SOME_OTHER_ENTRY": {
    "SOME_SUBKEY": {
      "data": 1
    }
  }
}

which is obviously not the data I wanted!

If I instead use decode_utf8, it works as expected:

$ python -c "import pyjson5; import pathlib; pyjson5.decode_utf8(pathlib.Path('pyjson5_bug.json5').read_bytes())"
Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "src/_exports.pyx", line 120, in pyjson5.pyjson5.decode_utf8
  File "src/_exports.pyx", line 173, in pyjson5.pyjson5.decode_buffer
  File "src/_decoder.pyx", line 902, in pyjson5.pyjson5._decode_buffer
  File "src/_decoder.pyx", line 851, in pyjson5.pyjson5._decode_utf8
  File "src/_decoder.pyx", line 815, in pyjson5.pyjson5._decode_all
pyjson5.pyjson5.Json5ExtraData: ('Extra data U+007D near 186', {'ROOT_ELEMENT': {'SOME_SUBKEY': {'data': 1}}, 'SOME_OTHER_ENTRY': {'SOME_SUBKEY': {'data': 1}}}, '}')
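
As a stopgap, here is a minimal sketch of a workaround, assuming it is acceptable to read the whole file into memory: read the stream to EOF and hand the full buffer to a decoder that rejects trailing data. The helper name `decode_strict` and its `decode_bytes` parameter are hypothetical, and the stdlib `json.loads` is used only as a stand-in for `pyjson5.decode_utf8` so the snippet runs without pyjson5 installed.

```python
import io
import json


def decode_strict(fp, decode_bytes):
    """Read the binary stream to EOF and decode the whole buffer at once.

    decode_bytes is any callable that takes bytes and rejects trailing
    data, such as pyjson5.decode_utf8 in the traceback above.
    """
    return decode_bytes(fp.read())


# Stand-in demo: json.loads also rejects trailing data.
# The input below has one closing brace too many.
stream = io.BytesIO(b'{"ROOT_ELEMENT": {"data": 1}}}')
try:
    decode_strict(stream, json.loads)
    rejected = False
except ValueError:  # json.JSONDecodeError is a subclass of ValueError
    rejected = True

print(rejected)
```

With pyjson5, the equivalent call would be `decode_strict(open('pyjson5_bug.json5', 'rb'), pyjson5.decode_utf8)`. The trade-off is that this gives up streaming, so it is only a workaround for files that fit in memory.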

Without having looked at the code, I wonder whether this happens because decode_io() returns as soon as it has seen the expected number of closing braces. Maybe it should instead read until EOF, at least by default; otherwise errors like this can slip through silently.
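
To illustrate the kind of behavior I am guessing at (this is an analogy using the stdlib `json` module, not pyjson5's actual implementation): `json.JSONDecoder.raw_decode` parses one value and stops, leaving any trailing input unexamined, so the caller has to check the remainder itself. The helper name `decode_one_then_check` is hypothetical.

```python
import json


def decode_one_then_check(text):
    """Decode one JSON value, then insist that only whitespace follows."""
    text = text.lstrip()  # raw_decode does not skip leading whitespace
    decoder = json.JSONDecoder()
    # raw_decode returns the value and the index where parsing stopped.
    value, end = decoder.raw_decode(text)
    rest = text[end:].strip()
    if rest:
        raise ValueError(f"extra data after first value: {rest!r}")
    return value


doc = '{"ROOT_ELEMENT": {"data": 1}}}'  # one closing brace too many

# Stopping after the first complete value silently accepts the input...
value, end = json.JSONDecoder().raw_decode(doc)

# ...while checking the remainder up to EOF rejects it.
try:
    decode_one_then_check(doc)
    rejected = False
except ValueError:
    rejected = True
```

If decode_io() behaves like the first call, a "read to EOF and reject anything but whitespace" step like the one in the helper is what I would expect by default.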
