feat(config): recursively convert parsed dicts to typed dataclasses in loader #5269

Open
MikeGoldsmith wants to merge 12 commits into open-telemetry:main from MikeGoldsmith:mike/config-recursive-dict-conversion

@MikeGoldsmith

Member

Description

Closes the gap between load_config_file() and the factory functions: YAML/JSON config → SDK objects now works end-to-end through the typed model tree.

Previously, the loader's _dict_to_model called OpenTelemetryConfiguration(**data), which constructed only the top-level dataclass; nested fields stayed as raw dicts. As a result, factory functions like create_tracer_provider(config: TracerProviderConfig) would fail when accessing config.sampler as an attribute, because it was actually a dict.

Approach

Added _dict_to_dataclass in a new _conversion.py module. It walks each field's type annotation via typing.get_type_hints and recursively converts:

  • Nested dicts → typed dataclass instances (e.g. dict → TracerProvider → SpanProcessor → BatchSpanProcessor → ...)
  • Lists of dicts → lists of typed dataclasses (list[SpanProcessor])
  • String/value → Enum members (e.g. log_level: info → SeverityNumber.info)
  • Unknown keys → captured by the @_additional_properties decorator (so user-defined plugin names still flow through)

Optional[X] / X | None is unwrapped before checking the inner type. ClassVar fields are skipped (the additional_properties annotation on decorated classes is correctly ignored).

Verified end-to-end

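import yaml

# _dict_to_model is the loader helper and create_tracer_provider is the factory
# function discussed above; their imports are omitted here.
yaml_data = '''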
file_format: '1.0-rc.1'
tracer_provider:
  processors:
    - simple:
        exporter:
          console: {}
  sampler:
    parent_based:
      root:
        trace_id_ratio_based: {ratio: 0.5}
  limits:
    attribute_count_limit: 10
'''
config = _dict_to_model(yaml.safe_load(yaml_data))
provider = create_tracer_provider(config.tracer_provider)
# → TracerProvider with ParentBased sampler, 10 attribute limit, etc.

User-defined plugins continue to work — unknown sampler/propagator/exporter names land in additional_properties and are loaded via entry points.

Tests

11 new tests in test_conversion.py covering: flat dicts, nested dataclasses, lists, optionals, missing fields, unknown keys (additional_properties), enum coercion, primitive pass-through.

Closes #5127

Adds `_dict_to_dataclass` in `_conversion.py` which walks each field's
type annotation and converts:
- nested dicts → typed dataclass instances
- lists of dicts → lists of typed dataclasses
- string/value → Enum members (e.g. log_level: info)
- unknown keys → routed to the @_additional_properties decorator

The loader's `_dict_to_model` now produces a fully-typed
OpenTelemetryConfiguration tree end-to-end. Factory functions can rely
on typed attribute access (config.tracer_provider.processors[0].batch
.exporter.otlp_http.endpoint) instead of failing on raw dicts.

This closes the gap between load_config_file() and the factory
functions — YAML/JSON config → SDK objects now works end-to-end.

Closes open-telemetry#5127

Assisted-by: Claude Opus 4.6
@MikeGoldsmith MikeGoldsmith requested a review from a team as a code owner June 3, 2026 09:55
- Use TypeVar for _dict_to_dataclass return — callers now get the
  correct type instead of Any
- Use collections.abc.Mapping for input (more permissive than dict)
- Add explicit is_dataclass check at entry — raises TypeError with a
  descriptive message instead of failing later in dataclasses.fields
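
The three changes in this commit can be sketched roughly as follows. This is only an illustration under stated assumptions: `build` and `Limits` are hypothetical names, the body is reduced to a flat `cls(**data)` call, and the real `_dict_to_dataclass` signature may differ.

```python
import dataclasses
from collections.abc import Mapping
from dataclasses import dataclass
from types import MappingProxyType
from typing import Any, TypeVar

T = TypeVar("T")


def build(cls: type[T], data: Mapping[str, Any]) -> T:
    """Hypothetical stand-in: callers get back T rather than Any."""
    # is_dataclass() is also true for instances, so require a class explicitly.
    if not (isinstance(cls, type) and dataclasses.is_dataclass(cls)):
        raise TypeError(f"expected a dataclass type, got {cls!r}")
    return cls(**data)


@dataclass
class Limits:
    attribute_count_limit: int = 128


# Any Mapping is accepted, not just dict (here a read-only proxy).
limits = build(Limits, MappingProxyType({"attribute_count_limit": 10}))

# A non-dataclass target fails immediately with a descriptive TypeError.
try:
    build(dict, {})
    error_message = None
except TypeError as exc:
    error_message = str(exc)
```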

Assisted-by: Claude Opus 4.6
Astroid 3.x (used by pylint 3.x) follows typing.get_type_hints into
Python 3.14's annotationlib, which contains t-string literals it can't
parse and crashes with AttributeError on 'visit_templatestr'. Wrapping
the call in a helper that returns dict[str, Any] stops the inference at
the declared return type.

Assisted-by: Claude Opus 4.7
Same effect as the prior helper — declaring the local as ``dict[str, Any]``
stops astroid's inference at the annotation rather than tracing into the
typing internals.
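
A minimal sketch of the annotated-local pattern described here, assuming a hypothetical helper name `field_types` and an illustrative `Limits` dataclass; the effect on astroid's inference is as reported in the commit message and is not demonstrated by this snippet.

```python
import typing
from dataclasses import dataclass
from typing import Any


@dataclass
class Limits:
    attribute_count_limit: int = 128


def field_types(cls: type) -> dict[str, Any]:
    """Hypothetical helper: resolve a class's annotations to real types."""
    # The explicit annotation on the local gives static tools a declared type
    # to use instead of inferring through typing.get_type_hints itself.
    hints: dict[str, Any] = typing.get_type_hints(cls)
    return hints


resolved = field_types(Limits)
```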

Assisted-by: Claude Opus 4.7
… codespell

Replace the bespoke _Level enum (which violated pylint's invalid-name on
lowercase members) with the real ExemplarFilter enum from models.py — the
generated models use lowercase values verbatim from the JSON schema, so
using one of them avoids fighting the linter and exercises the same code
path with real data shapes.

Add 'astroid' to codespell's ignore-words-list; the prior commit's
explanatory comment mentions the library by name and codespell flagged it
as a misspelling of 'asteroid'.

Assisted-by: Claude Opus 4.7
@MikeGoldsmith MikeGoldsmith marked this pull request as draft June 3, 2026 14:29
@github-project-automation github-project-automation Bot moved this from Ready for review to Approved PRs in Python PR digest Jun 3, 2026
@DylanRussell

Contributor

Looks good. Should we have a full e2e test like the one you described in your comment? That seems useful.

@MikeGoldsmith MikeGoldsmith marked this pull request as ready for review June 4, 2026 18:37
The conversion module has unit tests that exercise _dict_to_dataclass
in isolation, but nothing verified the full pipeline: load a real
YAML file, get back fully-typed nested dataclasses, and feed the
result into a downstream factory function.

Adds two checks built on a representative nested fixture (tracer
provider with a parent-based / trace-id-ratio sampler and a batch
processor with console exporter):

  - nested fields (sampler, processors[*].batch) come back as the
    expected typed dataclasses, not raw dicts
  - the typed result is accepted by ``create_tracer_provider`` and
    produces an SDK ``TracerProvider``

This is the integration coverage requested in PR review feedback;
the inline example in the PR description is now an actual regression
test.

Assisted-by: Claude Opus 4.7
Comment thread opentelemetry-sdk/tests/_configuration/file/test_loader.py
Reach into the SDK private fields the same way other tests in this file
already do (pylint protected-access disabled at the class level on the
similar test_meter_provider.py) so we confirm the YAML actually flowed
through into the right Sampler/SpanProcessor/Exporter structure rather
than just landing as some TracerProvider instance.

Addresses Dylan's nit on open-telemetry#5269.

Assisted-by: Claude Opus 4.7 (1M context)
…rsion' into mike/config-recursive-dict-conversion

Development

Successfully merging this pull request may close these issues.

feat(config): convert factory functions to accept raw dicts from YAML loader

3 participants