Skip to content

Harden NFD write-normalization against dropped fields#2322

Merged
myieye merged 7 commits into
developfrom
nfd-normalize-test-coverage
Jun 2, 2026
Merged

Harden NFD write-normalization against dropped fields#2322
myieye merged 7 commits into
developfrom
nfd-normalize-test-coverage

Conversation

@myieye

@myieye myieye commented Jun 1, 2026

Copy link
Copy Markdown
Collaborator

Hardens the NFD write-normalization path from #2146 so a model field can't silently slip through dropped or un-normalized.

The write-normalization wrapper rebuilt each model with new X { ... } initializers, re-listing every field — so a field added to a model later but missed there would be reset to its default on write, uncaught unless test data also happened to set it non-default. The seven non-record normalizers now copy-then-overwrite only the text fields (records already get this from with), so non-text fields, including ones added in future, are preserved by each type's Copy() — whose completeness is already enforced by EntityCopyMethodTests. Behaviour is unchanged for every current field, and Publication.Copy() is now strongly typed, dropping two redundant casts.

On the test side: factory data uses non-default scalars and genuinely-decomposable (non-ASCII) NFC content so a dropped or no-op'd field actually fails an assertion; the empty-MultiString negative test is isolated; a vacuous WritingSystem assertion is removed; and NormalizationAssert's per-leaf check is now a single named StringCheck (failure label + validator) instead of a threaded NormalizationForm.

Build clean for MiniLcm/LcmCrdt; WriteNormalizationTests, NormalizationAssertTests, and EntityCopyMethodTests pass.

🤖 Generated with Claude Code

@coderabbitai

coderabbitai Bot commented Jun 1, 2026

Copy link
Copy Markdown

Review Change Stack

📝 Walkthrough

Walkthrough

This PR refactors the Unicode NFD normalization strategy and test infrastructure. The wrapper now uses Copy()-and-overwrite to preserve all model fields automatically instead of explicit object construction. Test assertions are unified via a StringCheck abstraction that replaces form-specific helpers with decomposed/decomposable validators. All entity tests and data factories are updated accordingly.

Changes

NFD Normalization Strategy and Test Infrastructure

Layer / File(s) Summary
Assertion framework refactoring: StringCheck abstraction
backend/FwLite/MiniLcm.Tests/Helpers/NormalizationAssert.cs
NormalizationAssert refactored from NormalizationForm-based visitor to a unified StringCheck (label + validator) abstraction. New public assertions: AssertAllDecomposed (NFD validation), AssertAllDecomposable (non-trivial NFC), AssertNormalizedToNfd (both normalization and equivalence). Old form-specific helpers and error formatting removed. New NormalizationEquivalency.NormalizedStrings<T>() enables FluentAssertions comparisons with FormD-normalized string fields.
Model interface and Copy() pattern enablement
backend/FwLite/MiniLcm/Models/Publication.cs, backend/FwLite/MiniLcm/Models/Entry.cs
Publication now implements IObjectWithId<Publication> and Copy() returns concrete type instead of interface. Entry.Copy() removes explicit cast on Publication.Copy() results, enabling type-safe Copy-based wrapper refactoring.
Wrapper Copy()-and-overwrite normalization strategy
backend/FwLite/MiniLcm/Normalization/MiniLcmApiWriteNormalizationWrapper.cs
Each normalizer now starts from model Copy() and overwrites only text-bearing fields (Name, Code, Definition, Gloss, Sentence, Text, etc.), automatically preserving all other fields including newly added ones. Applied across PartOfSpeech, Publication, SemanticDomain, MorphType, Sense, ExampleSentence, Translation. Design notes updated to document Copy()-and-overwrite approach and test enforcement of Copy() completeness.
Test data factory enrichment for assertion coverage
backend/FwLite/MiniLcm.Tests/Helpers/NfcTestData.cs
NfcTestData factories enhanced to populate non-default scalar/id fields for assertion coverage: Predefined=true for PartOfSpeech/SemanticDomain; Order and SenseId for ExampleSentence; Order, EntryId, PartOfSpeechId for Sense; reused PartOfSpeech in SenseWithExamples; Order and foreign keys for ComplexFormComponent; HomographNumber=3 and MorphType=Root for Entry variants.
Entity normalization test suite migration to new assertions
backend/FwLite/MiniLcm.Tests/WriteNormalizationTests.cs
WriteNormalizationTests updated across all entity types to validate captured inputs via AssertNormalizedToNfd instead of AssertAllNfd. Bulk-create operations switch to AssertAllDecomposed with strict ordering. JSON Patch tests replace AssertAllPatchValuesNfd with AssertAllPatchValuesDecomposed. MorphType patch validates MultiString.Subject decomposed structure.
Assertion validation and decomposed/decomposable test coverage
backend/FwLite/MiniLcm.Tests/WriteNormalizationTests.cs
NormalizationAssertTests rewritten to validate new assertion behavior: AllNfcFactories_ProduceNfcData renamed to AllNfcFactories_ProduceDecomposableData; new tests assert decomposed data passes, composed data throws, and decomposable checks fail for empty MultiString, nested NFD values, and ASCII/trivially-decomposed shapes with targeted property-path error messages.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes


Possibly related PRs


Suggested reviewers

  • rmunn
  • imnasnainaec

Poem

🐰 The wrapper hops with Copy and grace,
No fields dropped in this normalization space,
StringCheck validates with decomposed care,
All entities preserved, their data laid bare!
Tests assert what forms should be,
Unicode NFD, now tests agree!

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 7.58% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (4 passed)
Check name Status Explanation
Title check ✅ Passed The title clearly summarizes the main change: hardening NFD write-normalization to prevent dropped fields.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.
Description check ✅ Passed The PR description accurately describes the changes: hardening NFD write-normalization to prevent silent field drops, refactoring normalizers to use Copy()-then-overwrite, and improving test coverage with non-default values and decomposable content.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
📝 Generate docstrings
  • Create stacked PR
  • Commit on current branch
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch nfd-normalize-test-coverage

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@github-actions github-actions Bot added the 💻 FW Lite issues related to the fw lite application, not miniLcm or crdt related label Jun 1, 2026
myieye and others added 4 commits June 1, 2026 15:49
Previously NormalizationAssert only verified strings were in the
expected form. ASCII satisfies both NFC and NFD, so a factory or
test fixture using ASCII content silently bypassed the normalizer —
the assertion passed without exercising any normalization work.

New `requireNonTrivial` parameter asserts each string differs from
its conversion to the opposite form, ensuring the input actually
contains content the normalizer would have to act on. Applied to
`AllNfcFactories_ProduceNfcData`, which now enforces non-triviality
for every NfcTestData factory in one place.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Add BeEquivalentTo(input) with an NFC≡NFD string comparer alongside
the existing AssertAllNfd(captured). The pure form-check ignores
primitives, enums, Guids, and dates, so a normalizer that drops a
non-string field (the HomographNumber regression) passes silently —
this commit closes that hole.

- New `NormalizedStrings()` equivalency option treats NFC-equal-to-NFD
  as equal so the assertion accepts the normalizer's string changes
  while requiring everything else to match exactly.
- New `AssertNormalized` helper combines AssertAllNfd + BeEquivalentTo.
- NfcTestData factories now populate non-default values for non-string
  scalars (HomographNumber, Order, EntryId, etc.) so the identity
  assertion has something to compare. A default-valued field can't
  catch a normalizer that drops it.

Verified by reintroducing the Sense.Order drop: previously 0 tests
caught it; now 5 fail with `Expected property Senses[0].Order to be
1.5, but found 0`.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The write-normalization wrapper rebuilt each model with object initializers, duplicating the field list, so a field added to a model later but missed here would be silently reset to default on write. The seven non-record normalizers now copy-then-overwrite text (records already get this from with-expressions); non-text fields, including future ones, ride along via the per-type Copy() method, whose completeness is enforced by EntityCopyMethodTests. Behaviour is unchanged for every current field.

Also make Publication.Copy() return Publication instead of IObjectWithId (it implemented only the non-generic IPossibility), dropping two now-redundant casts.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Remove a vacuous AssertAllDecomposable(WritingSystem) call (all its WS string fields are skipped, so it asserted nothing); isolate the empty-MultiString negative test so it cannot pass on an unrelated "no values" message; and replace the NormalizationAssert (form, requireNonTrivial) flag pair, two params encoding one bit threaded through five methods, with a single named leaf-check delegate.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@myieye myieye force-pushed the nfd-normalize-test-coverage branch from 6b0024d to f91c0a8 Compare June 1, 2026 13:51
@github-actions

github-actions Bot commented Jun 1, 2026

Copy link
Copy Markdown
Contributor

UI unit Tests

  1 files   59 suites   28s ⏱️
177 tests 177 ✅ 0 💤 0 ❌
246 runs  246 ✅ 0 💤 0 ❌

Results for commit 30ac45e.

♻️ This comment has been updated with latest results.

@argos-ci

argos-ci Bot commented Jun 1, 2026

Copy link
Copy Markdown

The latest updates on your projects. Learn more about Argos notifications ↗︎

Build Status Details Updated (UTC)
default (Inspect) ✅ No changes detected - Jun 2, 2026, 9:16 AM

myieye and others added 3 commits June 1, 2026 17:35
Comments now describe the guard as catching data byte-identical in NFC and NFD (ASCII being one example), not ASCII specifically. Rename AssertNormalized to AssertNormalizedToNfd since it is NFD-specific.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Replaces the (description, Action<string, string, List<string>>) pair threaded through the traversal — two loosely coupled params a caller could mismatch — with a single StringCheck record bundling the failure-header label and a validator that maps a string to an issue message or null. Removes the mutable-list delegate and the mismatch risk; behaviour and failure messages are unchanged.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
The "ASCII bypasses the normalizer" reason was stated three times: in
the AssertAllDecomposable docstring, again above the validators, and in
the DecomposableNfc failure message. Keep the docstring and the failure
message; remove the redundant block comment.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@github-actions

github-actions Bot commented Jun 2, 2026

Copy link
Copy Markdown
Contributor

C# FwHeadless Unit Tests

48 tests   48 ✅  16s ⏱️
 5 suites   0 💤
 1 files     0 ❌

Results for commit 30ac45e.

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@backend/FwLite/MiniLcm/Normalization/MiniLcmApiWriteNormalizationWrapper.cs`:
- Around line 37-39: EntityCopyMethodTests currently enumerates object types
from LcmCrdtKernel.ConfigureCrdt and misses Translation, so Translation.Copy()
completeness isn't enforced while
MiniLcmApiWriteNormalizationWrapper.NormalizeTranslation relies on
translation.Copy(); add coverage by either registering Translation in
LcmCrdtKernel.ConfigureCrdt's .Add<Translation>() list used by
EntityCopyMethodTests or add a focused unit test that calls Translation.Copy()
and asserts field preservation (i.e., create a Translation instance with all
fields set, call Copy(), and verify no text-bearing fields are lost) so
Translation.Copy() is exercised alongside the other Copy()-based normalizers.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

Run ID: 886db3be-e4e5-44e7-9f45-457cadd8d3c3

📥 Commits

Reviewing files that changed from the base of the PR and between 5e7c6e4 and 30ac45e.

📒 Files selected for processing (6)
  • backend/FwLite/MiniLcm.Tests/Helpers/NfcTestData.cs
  • backend/FwLite/MiniLcm.Tests/Helpers/NormalizationAssert.cs
  • backend/FwLite/MiniLcm.Tests/WriteNormalizationTests.cs
  • backend/FwLite/MiniLcm/Models/Entry.cs
  • backend/FwLite/MiniLcm/Models/Publication.cs
  • backend/FwLite/MiniLcm/Normalization/MiniLcmApiWriteNormalizationWrapper.cs

@myieye myieye merged commit 79a9816 into develop Jun 2, 2026
20 checks passed
@myieye myieye deleted the nfd-normalize-test-coverage branch June 2, 2026 12:02
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

💻 FW Lite issues related to the fw lite application, not miniLcm or crdt related

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant