Skip to content

governance/drift_detector: clarify PSI drifted threshold semantics + minor cleanups #55

@Goldokpa

Description

@Goldokpa

Follow-up from review of #53.

Main concern — PSI drifted flag fires on "moderate" drift.

In detect_drift (PSI branch):

drifted=psi >= PSI_STABLE,  # PSI_STABLE = 0.10

Combined with _psi_severity, a PSI of e.g. 0.12 will be marked drifted=True with severity="moderate". Since report.any_drifted is what consumers will gate on (CI, alerts, etc.), this means "moderate" drift triggers alerts — which conflicts with the docstring's canonical guidance that >0.25 is the significant-drift threshold.

The existing tests pass either way because they only assert severity in {"moderate", "severe"} and any_drifted is True.

Proposal:

  • Expose a configurable threshold parameter on detect_drift, e.g. psi_drift_threshold: float = PSI_MODERATE.
  • Default to psi >= PSI_MODERATE so any_drifted only fires on significant drift, while still reporting severity for the moderate range.
  • Update / add tests to lock in both branches (moderate-but-not-drifted, severe-and-drifted).

Smaller items (can fold into the same PR or split out):

  1. KS p-value with ties. The asymptotic Kolmogorov formula assumes continuous distributions; with heavy ties (e.g. constant-vs-constant or coarse discrete data) p-values can be biased. Add a docstring note or short comment in kolmogorov_smirnov calling this out.
  2. Magic constant in KS severity. severe = p_value < (ks_significance / 5) — promote 5 (or the resulting factor) to a named constant like KS_SEVERE_FACTOR = 5 with a brief justification, or expose it as a parameter.
  3. Double-sort micro-opt in KS. kolmogorov_smirnov sorts ref and cur twice each (once via np.searchsorted(np.sort(ref), ...), once implicitly via combined). Hoist ref_sorted = np.sort(ref) / cur_sorted = np.sort(cur) once. Cosmetic only.

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type
    No fields configured for issues without a type.

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions