Skip to content

chore(prefect-plugin): upgrade Prefect v2→v3 and clean up pydantic v1 usage#17536

Open
dinesh-verma-datahub wants to merge 3 commits into
masterfrom
prefect-v2-to-v3-upgrade-and-pydantic-v1-cleanup
Open

chore(prefect-plugin): upgrade Prefect v2→v3 and clean up pydantic v1 usage#17536
dinesh-verma-datahub wants to merge 3 commits into
masterfrom
prefect-v2-to-v3-upgrade-and-pydantic-v1-cleanup

Conversation

@dinesh-verma-datahub
Copy link
Copy Markdown
Contributor

@dinesh-verma-datahub dinesh-verma-datahub commented May 21, 2026

Summary

  • Prefect v2 → v3: Updated prefect dependency pin from >=2.0.0,<3.0.0 to >=3.0.0,<4.0.0. Replaced all
    asyncio.run(...) calls with run_coro_as_sync(...). Updated entry point group from prefect.block to
    prefect.collections and module reference from prefect_datahub.prefect_datahub:DatahubEmitter to
    prefect_datahub.datahub_emitter. Fixed _block_type_name to use ClassVar.
  • Task emission fix: The previous code iterated over flow_run_ctx.task_run_futures to build the task key
    map, which was removed in Prefect v3. Replaced with read_task_runs API calls, adding offset pagination
    (Prefect server returns at most 200 results per request) and stable-count polling to handle Prefect's
    asynchronous task run state persistence.
  • Pydantic v1 → v2: Removed pydantic.v1 shim import (from pydantic.v1 import SecretStr → from pydantic
    import SecretStr). Bumped dev dependency from pydantic>=1.10 to pydantic>=2.0.0. Removed
    V1RootValidator/V1Validator type references and dead TYPE_CHECKING guards across 8 files
    (connection_resolver.py, import_resolver.py, validate_field_deprecation.py, validate_field_removal.py,
    validate_field_rename.py, validate_multiline_string.py, snowflake_lineage_v2.py,
    entity_removal_state.py). Replaced FlowRun.parse_obj() and TaskRun.parse_obj() (pydantic v1 methods)
    with direct await client.read_flow_run/read_task_run calls.

@github-actions github-actions Bot added the ingestion PR or Issue related to the ingestion of metadata label May 21, 2026
@codecov
Copy link
Copy Markdown

codecov Bot commented May 21, 2026

Codecov Report

❌ Patch coverage is 91.76471% with 7 lines in your changes missing coverage. Please review.
✅ All tests successful. No failed tests found.

Files with missing lines Patch % Lines
...fect-plugin/src/prefect_datahub/datahub_emitter.py 90.54% 7 Missing ⚠️

📢 Thoughts on this report? Let us know!

… usage

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@dinesh-verma-datahub dinesh-verma-datahub force-pushed the prefect-v2-to-v3-upgrade-and-pydantic-v1-cleanup branch from 57012fd to 56322cc Compare May 21, 2026 06:54
@datahub-connector-tests
Copy link
Copy Markdown

datahub-connector-tests Bot commented May 21, 2026

Connector Tests Results

All connector tests passed for commit a4c1d70

View full test logs →

To skip connector tests, add the skip-connector-tests label (org members only).

Autogenerated by the connector-tests CI pipeline.

@dinesh-verma-datahub dinesh-verma-datahub marked this pull request as ready for review May 22, 2026 07:18
@github-actions
Copy link
Copy Markdown
Contributor

Linear: ING-2733

@github-actions
Copy link
Copy Markdown
Contributor

Your PR has been assigned to @sgomezvillamor (sergio.gomez) for review (ING-2733).

Copy link
Copy Markdown
Contributor

@sgomezvillamor sgomezvillamor left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great work on the Prefect v2→v3 and pydantic v1→v2 migration! The asyncio.run()run_coro_as_sync() approach is exactly right for Prefect v3's event loop model, and replacing task_run_futures with paginated read_task_runs calls is a clean, correct fix. A few items before merge:


🔴 BLOCKER — Breaking change not documented in updating-datahub.md
This PR drops Prefect v2 support (>=2.0.0,<3.0.0>=3.0.0,<4.0.0) and changes the entry point group from prefect.block to prefect.collections. Both are user-facing breaking changes. Please add an entry to docs/how/updating-datahub.md before merge.


🟡 WARNING — Silent timeout in _fetch_all_task_runs_stable
The 50-iteration × 0.1 s polling loop exits silently when exhausted. If Prefect's state persistence is slow, tasks could be silently dropped from DataHub lineage. Please add a warning log when the loop ends without stabilising.


🟡 WARNING — # type: ignore[arg-type] on FlowRunFilterId
flow_run_id is typed as str but FlowRunFilterId.any_ likely expects List[UUID]. Worth fixing the type rather than suppressing — e.g. any_=[UUID(flow_run_id)].


ℹ️ SUGGESTION — import asyncio inside method body
asyncio was removed from the top-level imports but re-introduced locally inside _fetch_all_task_runs_stable. It should stay at the module level per project conventions.


❓ QUESTION — Can -> Any be narrowed for the validator factory functions?
In files like connection_resolver.py, validate_field_*.py, etc. the return type changed from V1RootValidator/V1ValidatorAny. In pydantic v2, model_validator(mode="before")(fn) wraps validators as classmethod descriptors — would -> classmethod be a viable narrower annotation here?


Generated by Claude Code

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

ingestion PR or Issue related to the ingestion of metadata pending-submitter-merge

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants