Add data-quality helpers: row schema validation, field extraction, masking #227

Merged
JE-Chen merged 1 commit into dev from feat/data-quality-batch on Jun 19, 2026

JE-Chen commented on Jun 19, 2026

Round-4 multi-agent web-research follow-up, batch 9: three pure-stdlib data-quality helpers that act as the gate between ingestion and downstream entry. Each helper is wired through all five integration layers (including the facade, AC_*, MCP, and Script Builder) and ships with headless tests, EN/Zh v19 docs, and README sections.

Features (utils/data_quality)

  • Row schema validation: validate_rows(rows, schema) applies declarative per-field rules (type/required/regex/min/max/min_len/max_len/allowed/unique) and returns {ok, valid, invalid, errors}. Exposed as AC_validate_rows and ac_validate_rows.
  • Field extraction: extract_fields(text, fields, patterns) pulls values out of text using named regex presets (email/url/ipv4/phone/date_iso/amount/hashtag) plus custom patterns. Exposed as AC_extract_fields and ac_extract_fields.
  • Row masking: mask_rows(rows, rules) applies redact, hash (SHA-256), or partial (keep last 4) per column before export. Exposed as AC_mask_rows and ac_mask_rows.
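
For readers who want the shape of validate_rows without opening the diff, here is a minimal sketch of how rules like these could be evaluated. It is not the code merged in this PR: the rule-dict layout, the type-name strings ("str", "int", "float", "bool"), the error-dict fields (row/field/message), and the helpers _field_error, _number_range_error, and _length_range_error are assumptions. The last two only mirror the number/length split mentioned under Verification.

```python
from __future__ import annotations

import re
from typing import Any

# Assumed type names; the merged helper may accept different spellings.
_TYPES = {"str": str, "int": int, "float": (int, float), "bool": bool}


def _number_range_error(value: float, rule: dict) -> str | None:
    """min/max checks for numeric values (hypothetical helper)."""
    if "min" in rule and value < rule["min"]:
        return f"below min {rule['min']}"
    if "max" in rule and value > rule["max"]:
        return f"above max {rule['max']}"
    return None


def _length_range_error(value: Any, rule: dict) -> str | None:
    """min_len/max_len checks for sized values such as strings and lists."""
    if "min_len" in rule and len(value) < rule["min_len"]:
        return f"shorter than min_len {rule['min_len']}"
    if "max_len" in rule and len(value) > rule["max_len"]:
        return f"longer than max_len {rule['max_len']}"
    return None


def _field_error(value: Any, rule: dict) -> str | None:
    """Return the first rule violation for one value, or None."""
    if value is None:
        return "required" if rule.get("required") else None
    expected = _TYPES.get(rule.get("type", ""))
    # bool subclasses int, so reject it explicitly unless "bool" was asked for.
    wrong_bool = isinstance(value, bool) and expected is not bool
    if expected is not None and (wrong_bool or not isinstance(value, expected)):
        return f"expected {rule['type']}"
    if "regex" in rule and not re.fullmatch(rule["regex"], str(value)):
        return "regex mismatch"
    if "allowed" in rule and value not in rule["allowed"]:
        return "not in allowed values"
    if isinstance(value, (int, float)) and not isinstance(value, bool):
        return _number_range_error(value, rule)
    if hasattr(value, "__len__"):
        return _length_range_error(value, rule)
    return None


def validate_rows(rows: list[dict], schema: dict[str, dict]) -> dict:
    """Split rows into valid/invalid and collect per-field errors."""
    valid, invalid, errors = [], [], []
    # One "seen" set per field marked unique (values assumed hashable).
    seen = {field: set() for field, rule in schema.items() if rule.get("unique")}
    for index, row in enumerate(rows):
        row_errors = []
        for field, rule in schema.items():
            value = row.get(field)
            message = _field_error(value, rule)
            if message is None and field in seen and value is not None:
                if value in seen[field]:
                    message = "duplicate value"
                seen[field].add(value)
            if message:
                row_errors.append({"row": index, "field": field, "message": message})
        (invalid if row_errors else valid).append(row)
        errors.extend(row_errors)
    return {"ok": not errors, "valid": valid, "invalid": invalid, "errors": errors}


schema = {
    "id": {"type": "int", "required": True, "unique": True, "min": 1},
    "email": {"type": "str", "regex": r"[^@\s]+@[^@\s]+\.\w+"},
    "status": {"allowed": ["active", "closed"]},
}
rows = [
    {"id": 1, "email": "a@example.com", "status": "active"},
    {"id": 1, "email": "not-an-email", "status": "pending"},
]
result = validate_rows(rows, schema)
print([(e["field"], e["message"]) for e in result["errors"]])
# [('id', 'duplicate value'), ('email', 'regex mismatch'), ('status', 'not in allowed values')]
```

In this sketch a missing, non-required field skips every other check, and unique is enforced only among values that already passed their other rules; the merged helper may order these checks differently.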

Why

The round-4 data/validation agent verified that nothing validates, extracts, or masks data between load_rows/OCR and downstream entry, and flagged this as the #1 data-quality gap. The helpers are pure-stdlib (re/hashlib), so there are no new dependencies.
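
Because the helpers lean only on re and hashlib, extraction and masking stay small. The sketch below is illustrative, not the PR's implementation: the preset regexes are rough approximations (phone, for instance, only accepts numbers with a leading +), and the return shape of extract_fields (preset name to list of matches), the "[REDACTED]" placeholder, and the helper _mask_value are all assumptions.

```python
from __future__ import annotations

import hashlib
import re

# Assumed preset regexes; rough approximations, not the PR's actual patterns.
_PRESETS = {
    "email": r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+",
    "url": r"https?://\S+",
    "ipv4": r"\b(?:\d{1,3}\.){3}\d{1,3}\b",  # does not range-check octets
    "phone": r"\+\d[\d -]{7,}\d",  # international form with a leading +
    "date_iso": r"\b\d{4}-\d{2}-\d{2}\b",
    "amount": r"[$€£]\d+(?:\.\d{2})?",
    "hashtag": r"#\w+",
}


def extract_fields(text: str, fields: list[str],
                   patterns: dict[str, str] | None = None) -> dict[str, list[str]]:
    """Return every match for each requested preset or custom pattern."""
    table = {**_PRESETS, **(patterns or {})}  # custom patterns override presets
    found = {}
    for name in fields:
        if name not in table:
            raise ValueError(f"unknown field: {name}")
        # finditer + group(0) keeps whole matches even if a pattern has groups.
        found[name] = [m.group(0) for m in re.finditer(table[name], text)]
    return found


def _mask_value(value: object, mode: str) -> str:
    """Mask one value according to mode (hypothetical helper)."""
    text = str(value)
    if mode == "redact":
        return "[REDACTED]"
    if mode == "hash":
        return hashlib.sha256(text.encode("utf-8")).hexdigest()
    if mode == "partial":
        # Keep the last 4 characters; values of 4 chars or fewer stay visible.
        return "*" * max(len(text) - 4, 0) + text[-4:]
    raise ValueError(f"unknown mask mode: {mode}")


def mask_rows(rows: list[dict], rules: dict[str, str]) -> list[dict]:
    """Return masked copies of rows; the input rows are not modified."""
    return [
        {key: _mask_value(val, rules[key]) if key in rules and val is not None else val
         for key, val in row.items()}
        for row in rows
    ]


print(extract_fields("Mail ops@example.com from 10.0.0.7 on 2026-06-19 #urgent",
                     ["email", "ipv4", "date_iso", "hashtag"]))
# {'email': ['ops@example.com'], 'ipv4': ['10.0.0.7'], 'date_iso': ['2026-06-19'], 'hashtag': ['#urgent']}
print(mask_rows([{"name": "Ada", "card": "4111111111111111"}],
                {"name": "redact", "card": "partial"}))
```

One design note: a plain, unsalted SHA-256 of a low-entropy value such as a phone number can be reversed by brute force, so a keyed hash (hmac with a secret) is the safer choice when masked exports leave a trusted boundary.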

Verification

  • test/unit_test/headless/test_data_quality_batch.py — 8 tests pass.
  • ruff clean; radon reports no function at cyclomatic-complexity grade C or worse (_range_error was split into number/length helpers to get there); bandit clean; importing je_auto_control still does not require PySide6.

@codacy-production

Up to standards ✅

🟢 Issues: 0 new issues
🟢 Metrics: complexity 84, duplication 0

JE-Chen merged commit b61f321 into dev on Jun 19, 2026
16 checks passed
JE-Chen deleted the feat/data-quality-batch branch on June 19, 2026 at 04:06