Skip to content

Conversation

@nicoddemus
Copy link
Member

Convert all the values of SubtestContext.kwargs to strings using saferepr.

This complies with the requirement that the returned dict from pytest_report_to_serializable is serializable to JSON, at the cost of losing type information for objects that are natively supported by JSON.

Fixes pytest-dev/pytest-xdist#1273

Convert all the values of `SubtestContext.kwargs` to strings using `saferepr`.

This complies with the requirement that the returned dict from `pytest_report_to_serializable` is serializable to JSON, at the cost of losing type information for objects that are natively supported by JSON.

Fixes pytest-dev/pytest-xdist#1273
@nicoddemus nicoddemus force-pushed the xdist-subtests-serialization branch from a3cf834 to 7585f45 Compare November 13, 2025 22:35
@psf-chronographer psf-chronographer bot added the bot:chronographer:provided (automation) changelog entry is part of PR label Nov 13, 2025
@nicoddemus nicoddemus added the backport 9.0.x apply to PRs at any point; backports the changes to the 9.0.x branch label Nov 13, 2025
Copy link
Member

@bluetech bluetech left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A couple of comments for your consideration.

return dataclasses.asdict(self)
result = dataclasses.asdict(self)
# Brute-force the returned kwargs dict to be JSON serializable (pytest-dev/pytest-xdist#1273).
result["kwargs"] = {k: saferepr(v) for (k, v) in result["kwargs"].items()}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IMO we should strive to have the property that from_json(to_json(report)) == report. In this case it means that we should have self.kwargs itself should be serializable, not just in to_json.

Copy link
Member Author

@nicoddemus nicoddemus Nov 21, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IMO we should strive to have the property that from_json(to_json(report)) == report.

I agree this is desirable, but I don't see how we can accomplish this in a general manner, for any type of object.

In unittest subtests, kwargs is permitted to contain any type of object, because it is not serialized to anything and used only for reporting.

I don't see how we can support that in functions that serialize to JSON.

Perhaps I'm missing something, could you elaborate a bit?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What I'm thinking is that kwargs itself contain strings, not the objects themselves, that would of course make it serializable. It is not exposed in the API so should be fine? Unless the original objects are needed for some reason?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well once they are serialized and come back unserialized on the other side, unfortunately they might not be useful depending on the intent. For reporting this should be fine, but if a plugin is using the serialization hooks and expecting the actual objects to be returned this would be a problem.

We can perhaps try to keep the concrete types for objects we know are safe (int, float, etc.), but for other objects I don't see another solution.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can perhaps try to keep the concrete types for objects we know are safe (int, float, etc.)

Implemented that, take a look.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, I defer to your judgment on the representation.


However, I still think that from_json(to_json(report)) == report is important.

Well once they are serialized and come back unserialized on the other side, unfortunately they might not be useful depending on the intent. For reporting this should be fine, but if a plugin is using the serialization hooks and expecting the actual objects to be returned this would be a problem.

If the report itself keeps the kwargs as dict[str, json-serializable] (i.e. performs the conversion in init, not in to_serializable), then the property holds. Is there a need to keep the "raw" objects?

Copy link
Member Author

@nicoddemus nicoddemus Nov 25, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If the report itself keeps the kwargs as dict[str, json-serializable] (i.e. performs the conversion in init, not in to_serializable), then the property holds. Is there a need to keep the "raw" objects?

I think we should keep the original objects, because doesn't seem right to me for us to arbitrarily convert the values to something else during __init__ just on the off-chance to_serializable might be called eventually. We lose data this way. Nothing prevents a custom plugin to expect the report objects to be intact in their own pytest_runtest_logreport implementation.

But something occurred to me: there's nothing in pytest_report_to_serializable that says the data returned needs to be JSON-native, only JSON-compatible. This means we can convert the data in the report in whenever format we want, as long as the resulting data can be converted into JSON.

With this in mind, I think we can simply use pickle to convert the kwargs into a str, which we can then easily deserialize during pytest_report_from_serializable. This will hold the from_serializable(to_serializable(report)) == report property, and should support most data types. pickle has its problems, but is the standard way to serialize/deserialize data in Python.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done, please take a look.

One downside that did not occur to me is that local objects (such as MyEnum used in the test) normally cannot be pickled. I don't think this will be a problem in production, however.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We lose data this way. Nothing prevents a custom plugin to expect the report objects to be intact in their own pytest_runtest_logreport implementation.

I somewhat think that an arbitrary Python value would be harder for a plugin to handle than a JSON-native value. AFAIK the main/intended use case for the pytest_runtest_logreport hook is to report the outcome to some other place (terminal, junitxml, etc.) and so what the plugin mostly cares about is the nodeid and outcome and metadata. In case of subtests, I'm not sure how many plugins would want to do something special with the kwargs formatting beyond repring it.

Regarding pickle, I never really used it, but I assumed it wasn't used in pytest for a reason. E.g. execnet doesn't use it for serialization, and pytest also went with json-like for report serialization. Maybe because it's Python specific, or binary, or has multiple versions, or not secure? For this reason my vague sense is that we should avoid it but I don't have a concrete argument.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I somewhat think that an arbitrary Python value would be harder for a plugin to handle than a JSON-native value.

Plugins are supposed to be "looking" at the value returned by pytest_report_to_serializable, the value is only meant to be JSON serializable so plugins have some guidance on how it is supposed to be implemented. Ultimately, plugins are expected to pass the returned value over to pytest_report_from_serializable and then use the returned object.

AFAIK the main/intended use case for the pytest_runtest_logreport hook is to report the outcome to some other place (terminal, junitxml, etc.) and so what the plugin mostly cares about is the nodeid and outcome and metadata. In case of subtests, I'm not sure how many plugins would want to do something special with the kwargs formatting beyond repring it.

Perhaps, but I wouldn't just assume that: there is nothing in the docs or the design that says that we will convert the kwargs using saferepr (and lose the original information). I can envision a plugin saving that data to some other logging system, and attempting to recover it later using pytest_report_from_serializable. If we convert things using saferepr from under it we lose the original metadata.

Regarding pickle, I never really used it, but I assumed it wasn't used in pytest for a reason. E.g. execnet doesn't use it for serialization, and pytest also went with json-like for report serialization. Maybe because it's Python specific, or binary, or has multiple versions, or not secure? For this reason my vague sense is that we should avoid it but I don't have a concrete argument.

The reasons are:

  • It is Python specific; but here we are ultimately implementing hooks in Python, so I don't see this as an issue.
  • It has multiple versions, and does not play well with Python 2 -> 3 objects; I believe this was the main motivator for execnet using a custom system.
  • It is not secure, because it can execute arbitrary code (like __init__ of some classes if they are pickled, __setstate__, etc). But ultimately we are executing custom Python code anyway directly (the hooks), so this point is moot (we are already executing arbitrary code).

I think we are running in circles here:

  • We want from_json(to_json(kwargs)) == kwargs and I agree with that.
  • You seem to suggest to use saferepr when creating* the SubTestReport, which I disagree with: we shouldn't be changing the data like that just because eventually we might call a separate hook for serialization.

Perhaps @RonnyPfannschmidt @Pierre-Sassoulas @The-Compiler want to chime in and have other suggestions or share their opinions on what a correct approach would look like?

@nicoddemus nicoddemus force-pushed the xdist-subtests-serialization branch 2 times, most recently from 53f50a5 to f8da2cb Compare November 22, 2025 12:23
@nicoddemus nicoddemus force-pushed the xdist-subtests-serialization branch from f8da2cb to fdfa22f Compare November 22, 2025 14:47
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

backport 9.0.x apply to PRs at any point; backports the changes to the 9.0.x branch bot:chronographer:provided (automation) changelog entry is part of PR

Projects

None yet

Development

Successfully merging this pull request may close these issues.

execnet.gateway_base.DumpError: can't serialize <enum 'SortBy'> with pytest==9.0.0 and while using pytest-xdist

2 participants