
Add config schema updater & docs #2056

Open
brynpickering wants to merge 8 commits into PyPSA:master from
open-energy-transition:feat/pydantic-model-updater

Conversation

@brynpickering
Contributor

Closes #2014

I spent a while thinking about where best to place the importers that would update the base config whilst not interfering too much with other files (avoiding future merge conflicts + import recursion issues).
They can't go in the snakefile as you need access to them to update the config before you call snakemake.
I settled on an otherwise unused space for __init__.py in the scripts.validation directory.

Changes proposed in this Pull Request

  • Config schema update base class that can be subclassed to update / add config to the existing schema
  • Rearranged validation config scripts to mitigate import recursion errors
  • Added docs to explain how to use it (maybe should come first?)

Checklist

  • I tested my contribution locally and it works as intended.
  • Code and workflow changes are sufficiently documented.
  • Changed dependencies are added to pixi.toml (using pixi add <dependency-name>).
  • Changes in configuration options are added in config/config.default.yaml.
  • Changes in configuration options are documented in doc/configtables/*.csv.
  • For new data sources or versions, these instructions have been followed.
  • A release note doc/release_notes.rst is added.

fneum requested a review from lkstrp February 13, 2026 07:34
Member

lkstrp left a comment


I settled on an otherwise unused space for __init__.py in the scripts.validation directory.

Why not rename it to something like config_updates.py? __init__.py is always annoying to switch to.

Works like a charm, with a very clean additive interface. My main concern is that it's now a bit too clean. If you hide away the config updates for each script in ./scripts, it becomes complicated to understand which entries have been updated already and which came from the main config schema. Chained updates are currently not handled, and just the last update survives. This means just the import order in config_update (currently __init__.py) matters, which is alphabetical. Any ideas how to solve this? It could just raise an error if a key was already updated.

This is sufficient for both updates to be imported.
Several separate update scripts can exist and be imported into the schema as desired.

.. autoclass:: lib.validation.config._base::ConfigUpdater
Member


I guess the reference can be removed, since it will not help anyone reading this.

Contributor Author


It is the base class they will need to subclass, so it seems reasonable to expose the methods they have available to them for that.

@brynpickering
Contributor Author

If you hide away the config updates for each script in ./scripts, it becomes complicated to understand which entries have been updated already and which came from the main config schema.

The idea is that these updaters could be anywhere. If using submodules, you might be pulling in an updater from that submodule. If using a soft-fork, you can choose where to put your updater. The source of truth for which updaters are being used will then be scripts.lib.validators.__init__.py / scripts.lib.validators.config_updates.py.

Chained updates are currently not handled, and just the last update survives.

They are handled just fine. They will be handled in the order they are imported in __init__.py (soon to be config_updates.py). They will always be applied to the latest state of the config, after applying all previous updates.

This means just the import order in config_update (currently __init__.py) matters, which is alphabetical.

That's true: chained updates are handled fine, but the order in which they appear in config_updates.py needs to be fixed. This can be achieved by having ruff skip isort for that file.

Why not rename it to something like config_updates.py? __init__.py is always annoying to switch to.

Sure. We'd just have to import config_updates.py in that __init__.py to force the import resolution.

@lkstrp
Member

lkstrp commented Feb 13, 2026

What about

Any ideas how to solve this? It could just raise an error if a key was already updated.

An already updated config entry could just be disallowed to get updated again
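A minimal sketch of that proposal, assuming each updater reports the dotted config keys it touches before it modifies them. `UpdateTracker`, `claim` and `DuplicateUpdateError` are hypothetical names used only for illustration; the PR does not implement this.

```python
class DuplicateUpdateError(ValueError):
    """Raised when a second updater tries to modify an already-updated key."""


class UpdateTracker:
    """Hypothetical helper recording which updater first touched each config key."""

    def __init__(self) -> None:
        self._owners: dict[str, str] = {}

    def claim(self, updater_name: str, key: str) -> None:
        owner = self._owners.get(key)
        # The same updater may touch its own key repeatedly; anyone else may not.
        if owner is not None and owner != updater_name:
            raise DuplicateUpdateError(
                f"Config key '{key}' was already updated by '{owner}'; "
                f"'{updater_name}' may not update it again."
            )
        self._owners[key] = updater_name


tracker = UpdateTracker()
tracker.claim("ClusteringConfigUpdater", "clustering.mode")
tracker.claim("MyProjectUpdater", "sector.new_option")
try:
    tracker.claim("SubmoduleUpdater", "clustering.mode")
except DuplicateUpdateError as err:
    print(err)
```

The trade-off of such a strict rule is that a downstream project can no longer deliberately extend a key that an upstream updater has already changed.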

@euronion
Contributor

What about

Any ideas how to solve this? It could just raise an error if a key was already updated.

An already updated config entry could just be disallowed to get updated again

What would we do in forks of forks of forks in that case?

@brynpickering
Contributor Author

brynpickering commented Feb 13, 2026

@lkstrp I don't see that it's an issue if you update it again. You can get the state of the config item and append to it so that you are merging the two:

```python
from typing import Literal

from pydantic import Field

# ConfigUpdater and ConfigSchema are provided by the project's validation package.


class ClusteringConfigUpdater(ConfigUpdater):
    """
    Update an existing config item (in this case, to add `foobar` as an option to the `clustering.mode` key).
    """

    NAME: str = "default.clustering.yaml"  # Currently has no effect

    def update(self) -> type[ConfigSchema]:
        # This represents the version of the clustering schema including any previous changes
        clustering_config = self.base_config().clustering.__class__
        current_description = clustering_config.model_fields["mode"].description or ""

        new_description = current_description + " (extra) foobar: new item."
        # Assuming `mode` is annotated with a Literal, nesting it extends the allowed values
        new_list = Literal[clustering_config.model_fields["mode"].annotation, "foobar"]
        # You could feasibly add a custom validator at this point, too,
        # using `new_list = typing.Annotated[new_list, pydantic.AfterValidator(func)]`

        clustering_schema = self._apply_updates(
            __base__=clustering_config,
            mode=(new_list, Field("busmap", description=new_description)),
        )
        new_schema = self._apply_updates(
            clustering=(clustering_schema, Field(default_factory=clustering_schema))
        )
        return new_schema
```

As a user, you might prefer not to merge if someone has already changed it, just in case:

```python
from typing import Literal

from pydantic import Field

# ConfigUpdater, ConfigSchema and ClusteringConfig are provided by the project's
# validation package.


class ClusteringConfigUpdater(ConfigUpdater):
    """
    Update an existing config item (in this case, to add `foobar` as an option to the `clustering.mode` key).
    """

    NAME: str = "default.clustering.yaml"  # Currently has no effect

    def update(self) -> type[ConfigSchema]:
        # This represents the version of the clustering schema including any previous changes
        clustering_config = self.base_config().clustering.__class__
        # Refuse to merge if another updater has already replaced the clustering schema
        if clustering_config is not ClusteringConfig:
            raise ValueError(
                "Expected clustering config to be of type ClusteringConfig, but got "
                f"{clustering_config}. This likely means another config updater has "
                "already modified the clustering config in an unexpected way."
            )
        current_description = clustering_config.model_fields["mode"].description or ""

        new_description = current_description + " (extra) foobar: new item."
        new_list = Literal[clustering_config.model_fields["mode"].annotation, "foobar"]
        # You could feasibly add a custom validator at this point, too,
        # using `new_list = typing.Annotated[new_list, pydantic.AfterValidator(func)]`

        clustering_schema = self._apply_updates(
            __base__=clustering_config,
            mode=(new_list, Field("busmap", description=new_description)),
        )
        new_schema = self._apply_updates(
            clustering=(clustering_schema, Field(default_factory=clustering_schema))
        )
        return new_schema
```

Then building the schema will fail with that error message.

So, it is user-configurable.
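Both updaters above extend `mode` by wrapping its existing annotation in a new `Literal` together with the new value. This only gives the intended result if the existing annotation is itself a `Literal` (an assumption here); since Python 3.9.1, nested `Literal`s are flattened and de-duplicated, so the outcome is one flat set of allowed values. A stdlib-only check, where `ExistingModes` and `"another_mode"` are hypothetical stand-ins for the real annotation:

```python
from typing import Literal, get_args

# Hypothetical stand-in for clustering_config.model_fields["mode"].annotation
ExistingModes = Literal["busmap", "another_mode"]

# Wrapping the existing Literal in a new one adds an option; nested Literals are
# flattened (Python >= 3.9.1), so the result is a single flat set of allowed values.
ExtendedModes = Literal[ExistingModes, "foobar"]

print(get_args(ExtendedModes))  # ('busmap', 'another_mode', 'foobar')
```

If `mode` were annotated with a plain `str` or an `Enum`, this nesting would not produce a meaningful set of choices, so it is worth checking the annotation before reusing the pattern.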

@brynpickering
Contributor Author

I've added examples of how that check can be added to a user's updater method.

Member

lkstrp left a comment


Aye, sounds good! One more comment, but looks good to me.


```python
@property
@abstractmethod
def NAME(self) -> str:
```
Member


If you have plans for this, it's fine to leave it in. But shouldn't this be lowercase? I see you want to define a class-attribute constant per subclass, but this is not a constant.
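For context on the `NAME` question: a plain class attribute in a subclass is enough to satisfy an abstract property on an `ABC`, which is how the examples earlier in this thread set `NAME: str = "default.clustering.yaml"`. A minimal stdlib sketch; `Base`, `WithAttribute` and `Missing` are hypothetical names:

```python
from abc import ABC, abstractmethod


class Base(ABC):
    @property
    @abstractmethod
    def NAME(self) -> str:
        """Identifier each subclass must provide."""


class WithAttribute(Base):
    # A class attribute overrides the abstract property, so this class is concrete.
    NAME = "default.clustering.yaml"


class Missing(Base):
    # NAME is never provided, so NAME stays abstract and instantiation fails.
    pass


print(WithAttribute().NAME)
try:
    Missing()
except TypeError as err:
    print(err)
```

Spelling it `NAME` or `name` makes no difference to this mechanism; the choice is purely one of naming convention.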

Contributor Author


@lkstrp I realised if I didn't do something with it now then I never would. The name is now used when generating the config (and schema). This allows a parallel default config to be generated for a project, again to avoid conflicts with files that are pulled in from upstream in soft forks.
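The comment does not show how the name is used. One possible reading of "a parallel default config" is sketched below, assuming each updater carries its own default values and that updaters sharing a `NAME` contribute to the same output file, so project-specific defaults never touch the upstream default config. `DEFAULTS`, `group_defaults_by_name` and the `myproject` entries are hypothetical; serialising each document to YAML is left out.

```python
from collections import defaultdict
from typing import Any


# Hypothetical updaters: each declares the file it targets (NAME) and its defaults.
class ClusteringUpdater:
    NAME = "default.clustering.yaml"
    DEFAULTS = {"clustering": {"mode": "busmap"}}


class ProjectUpdater:
    NAME = "default.myproject.yaml"
    DEFAULTS = {"myproject": {"enable": True}}


class ProjectExtrasUpdater:
    NAME = "default.myproject.yaml"
    DEFAULTS = {"myproject_extras": {"threshold": 0.5}}


def group_defaults_by_name(updaters: list[type]) -> dict[str, dict[str, Any]]:
    """Collect each updater's defaults into one document per NAME (shallow merge)."""
    grouped: dict[str, dict[str, Any]] = defaultdict(dict)
    for updater in updaters:
        grouped[updater.NAME].update(updater.DEFAULTS)
    return dict(grouped)


docs = group_defaults_by_name([ClusteringUpdater, ProjectUpdater, ProjectExtrasUpdater])
for filename, content in docs.items():
    print(filename, content)
```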

brynpickering requested a review from lkstrp February 13, 2026 14:23
@brynpickering
Contributor Author

@lkstrp requesting re-review as I decided to implement a use for the name property and don't want to just sneak it through with your pre-approval.


Development

Successfully merging this pull request may close these issues.

Make pydantic model easily extendable with a configurable plugin option

3 participants